by Mike Shea on 7 October 2018
I've become a huge statistics nerd over the past few years, much to the dismay of coworkers, friends, and family. Somewhere in all those numbers it feels like we're getting closer to a model for the world and how it works. I love dealing with stats. I don't want to hear stories. I want to hear four hundred stories and figure out what the median and standard deviations look like.
I've also been fascinated by one major aspect of all of these statistical predictions we see every day: uncertainty.
We don't really know the world around us. All the time we're building models that help us explain what we're seeing. We form a hypothesis, gather evidence, and develop a theory, and then we filter all of the results through our own cognitive biases. We'll never reach "truth". Newton's law of gravity turned out to be wrong, though it's good enough to get around in the world for our whole lives. We can't really know the world around us perfectly. All we have are our models and how we improve them.
Among all of these models, one of the most powerful and least intuitive is the role of variance. Variance is all around us all the time. Randomness is hugely powerful in our lives, and yet we are so wrapped up in needing causality that we don't pay any attention to it at all.
A lot of us build our models on anecdotes, which isn't terrible if you have no other evidence. A single data point can tell you something, but it carries a wide range of uncertainty. Unfortunately, we'll often focus on anecdotes despite evidence, often lots of evidence, to the contrary.
If we can gather a lot of evidence (my go-to is around five hundred pieces of randomly selected evidence) we can build a more accurate model. Doing this perfectly is largely impossible because truly random selection is itself nearly impossible, but we can often get closer than a pure guess.
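To put a rough number on what five hundred pieces of evidence buys you, here's a minimal sketch using the textbook margin of error for a proportion. The function name and the 1.96 multiplier (the usual 95% confidence value) are assumptions for illustration, and the formula assumes the truly random selection that we rarely get in practice.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n random samples."""
    return z * math.sqrt(p * (1 - p) / n)

# Compare a handful of anecdotes with larger random samples.
for n in (5, 50, 500, 5000):
    print(f"n={n:>5}: +/- {margin_of_error(n):.1%}")
```

With five hundred samples the margin comes out a bit over four percentage points. With five anecdotes it is wide enough to be close to useless.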
Many times, however, our best estimate, based on the variance, is ‾\_(ツ)_/‾ and anything else is likely bullshit.
Around 1970 my mother and father worked at Playboy. My father was an editor there and my mother worked for the advertising department. One day my father and his best friend walked by and saw my mother, a new attractive female employee (this was the 70s; I can only imagine it being a lot like Mad Men). There and then my father and his friend flipped a coin to decide who would ask her out on a date. My father won.
Those were the best odds I've likely had in my life.
Even being born who we are, when we are, to the parents we have starts to get into astronomically small probabilities. This Business Insider article puts it at roughly one in ten to the 2.5 millionth power.
The mere fact that you are reading this at all is highly highly unlikely just given the probabilities of humanity, much less the overall universe.
Uncertainty lies all around us. I like to semi-jokingly refer to it as the black sun: the sun that controls all of our lives but is completely invisible to us. It pulls on us as surely as gravity, maybe much more so, and yet we try to both ignore it and fight it all the time.
During the 2016 election, polling and statistics took a big hit. We had an unlikely situation occur and thus everyone pointed at the math geeks and said "you got it wrong!". In fact, they didn't. Near the end of the election, FiveThirtyEight gave Trump roughly a 30% chance of winning. That means that, in one hundred simulated elections, he would have won 30 of them. That's no small number. You wouldn't drive if there were a 30% chance you'd be killed in a car wreck. You probably wouldn't bet your life savings on a 70% chance of doubling it, even though that's a better deal than you'd ever find in Vegas.
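Here's a rough sketch of what "30 out of one hundred simulated elections" looks like when you actually roll the dice. This is a deliberately crude stand-in, assuming each simulated election is an independent weighted coin flip. It is not how FiveThirtyEight's model works, and the function name and seeds are made up for illustration.

```python
import random

def simulate_elections(p_win=0.30, n_elections=100, seed=None):
    """Count how many of n_elections the underdog wins, given probability p_win each time."""
    rng = random.Random(seed)
    return sum(rng.random() < p_win for _ in range(n_elections))

# Run several batches of one hundred simulated elections.
for batch in range(5):
    wins = simulate_elections(seed=batch)
    print(f"batch {batch}: underdog wins {wins} of 100")
```

The count bounces around from batch to batch, but the underdog keeps winning a sizable share. A 30% outcome happening once is not evidence that the forecast was broken.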
The fact is, there is a tremendous amount of uncertainty in the world, and in the statistics about the world. Much of our ability to predict what is going to happen is mired in probabilities, and we hate them. We want sure things. We want predictability.
We're not going to get it.
Data visualization professor Edward Tufte has a wonderful term, which I first saw on Twitter, called "insignificant digits". When we're measuring out probabilities, math nerds like to go down to many decimal places, even if there's a high standard deviation. We'll see things like a 42.9472% chance of rain (plus or minus 15%). If your standard deviation is wide, who the hell cares if you go down to four decimal places? In fact, 42.9472% (plus or minus 15%) rounded down to its actual significance might as well be ‾\_(ツ)_/‾.
When we see some number down to five decimal places, we put a false sense of certainty in the number. The math and data nerds like to justify the work it takes to build these forecasts, but the reality is that ‾\_(ツ)_/‾ is the real answer. We get irate when there's a 42% chance of rain and it rains, but we could have just flipped a coin.
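As a small illustration of trimming insignificant digits, here's a sketch that rounds a value to the decimal place of the leading digit of its uncertainty. The function name is hypothetical, and the rounding rule is one common convention that I'm assuming here, not something Tufte prescribes.

```python
import math

def drop_insignificant_digits(value, uncertainty):
    """Round value to the decimal place of the uncertainty's leading digit."""
    if uncertainty <= 0:
        return value
    # Position of the uncertainty's leading digit: 15 -> 1 (tens), 0.03 -> -2 (hundredths).
    exponent = math.floor(math.log10(uncertainty))
    return round(value, -exponent)

print(drop_insignificant_digits(42.9472, 15))    # rounds to the tens place
print(drop_insignificant_digits(42.9472, 0.03))  # rounds to the hundredths place
```

With an uncertainty of 15 points, 42.9472 collapses to 40. None of those decimal places were telling us anything.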
When it comes to rounding off insignificant digits, my hat goes off to FiveThirtyEight, which runs a whole pile of analytics on hundreds of polls but boils the number down to a simple ratio like 1 in 4 or 2 in 9. Frankly, if that ratio gets to around 1 in 3, you might as well use ‾\_(ツ)_/‾.
Most of the time, in fact, we can use ‾\_(ツ)_/‾ to recognize the uncertainty that lies around us. If the chance is anywhere between one quarter and three quarters, ‾\_(ツ)_/‾ is a completely appropriate quantitative visualization. You might want some better analytics if you're planning to put money or health behind it, but most of the time ‾\_(ツ)_/‾ is a perfectly acceptable statistical measure. We just don't know.
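The one-quarter to three-quarters rule is simple enough to write down. This is a tongue-in-cheek sketch; the function name and the exact cutoffs as parameters are assumptions for illustration.

```python
SHRUG = "‾\\_(ツ)_/‾"

def describe_chance(p, low=0.25, high=0.75):
    """Return a shrug for probabilities in the murky middle, else a rounded percent."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if low <= p <= high:
        return SHRUG
    return f"about {round(p * 100)}%"

for p in (0.05, 0.30, 0.429472, 0.70, 0.95):
    print(p, describe_chance(p))
```

Only the values near the extremes earn a number. Everything in the middle gets the shrug.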
Just as we put too much faith in insignificant digits, we're likewise terrible at understanding the difference between orders of magnitude. We might think something is ‾\_(ツ)_/‾ when it's actually 10x, 100x, or even 1000x more likely on one side than another. Take the likelihood of getting killed in a car crash compared to a plane crash, for example.
It's just as bad, probably worse, to assume something is ‾\_(ツ)_/‾ when it's actually 10x, 100x, or 1000x in one direction over the other.
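One way to check whether two risks really are in shrug territory is to count the powers of ten between them. Here's a minimal sketch; the function name is hypothetical and the two rates are placeholder numbers chosen for illustration, not real crash statistics.

```python
import math

def orders_of_magnitude_apart(risk_a, risk_b):
    """Return how many powers of ten separate two positive rates or probabilities."""
    return math.log10(risk_a / risk_b)

# Placeholder numbers for illustration only, not real crash statistics.
risk_a = 1 / 1_000
risk_b = 1 / 1_000_000

gap = orders_of_magnitude_apart(risk_a, risk_b)
print(f"risk_a is roughly 10^{gap:.0f} times more likely than risk_b")
```

A gap near zero is a shrug. A gap of two or three is a completely different world, and treating it like a coin flip is a mistake.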
Embrace uncertainty. It's all around us and we have very little control over it most of the time. The further out we go from now, the more variables fall into the picture, the more likely ‾\_(ツ)_/‾ is the best estimate we're going to get.
Special note: I had to hack my own blog software to allow for HTML entities in the title just so I could render the shrug emoji. That's what happens when you roll your own blog software. ‾\_(ツ)_/‾