Ask Not, “Will it Snow?”; Ask “What’s the Distribution of Your Forecast?”
A lot of New Yorkers are upset this morning because the 18–24 inches of snow or more that had been forecast did not happen. Instead, New York seems to have gotten a meager 5–6 inches of snow. Not worth shutting the subway over, let alone canceling everything that got cancelled—including Louis CK’s Madison Square Garden show.
Well, that’s a bummer, indeed, not having life-threatening conditions blanket your region in a wall of (admittedly very pretty) snow and ice.
But remember what they say about hindsight. Public officials trying to decide how to react to a coming storm have to weigh two important, related factors that we, too, weigh every day in most of our decisions about the future, even the decision to get out of the house in the morning:
How to handle the false positives and false negatives that lurk everywhere, and what the distribution of the forecast we are considering looks like.
Wait, wait, you say, all I want to know is: Will it snow? And how much? And that’s what forecasts give us:
Here’s what happened instead:
What went wrong? Nothing, actually. The problem is that what we are given as a forecast (an expected amount of snow) is woefully inadequate for making an informed decision. Presumably public officials understand this (I hope) and have access to more than that, but let’s run down the moving parts and discuss why.
For the moment, assume that your event outcome is binary; i.e., for all practical purposes, it will happen or it won’t. Take cancer: you either have it, or not. Or pregnancy (for women): it’s true or it’s not. (No need for continuumsplaining, folks; I know that even these have tiny gray areas.)
Here are the questions to weigh when evaluating a forecast, deciding whether to test, and deciding how to proceed:
1- How bad is a false positive?
A false positive is when you think, or are told, that something will happen or has happened, when it won’t or hasn’t. In other words, what’s the cost of being told you might have cancer when you do not? That’s the decision facing the U.S. Preventive Services Task Force, which is grappling with whether to stop recommending regular mammograms because too many women who are told they might have cancer, and who undergo stress and unnecessary biopsies, do not have cancer. (In other words, false positives.) What if false positives result in unnecessary surgery, a major event that always carries risks? Even hospitalization, which can save your life, has to be weighed against necessity (and the possibility of a false positive) because it, too, always comes with risks, like hospital infections, which tend to be very hard to treat.
2- How bad is a false negative?
A false negative is when you are told you are cancer-free but you actually have cancer, or that you are not pregnant when you are. For most events worth forecasting, false negatives are considered the bigger problem: if you are trying to forecast something, it’s probably something you really, really need to know about. Missing a cancer or a pregnancy diagnosis is obviously a significant error with huge downsides.
And here’s the kicker: almost always, you have to risk one or the other. In other words, there is no free lunch in making decisions based on forecasts, which almost always come with some uncertainty and thus with false positives and false negatives. Some of the time, you will think something will happen, and it won’t. Sometimes you will think it won’t happen, and it will. To add to the complexity, your false-positive and false-negative rates are linked: if you make your forecast more sensitive so that it misses fewer real cases (fewer false negatives), it will usually flag more cases that turn out to be nothing (more false positives), and vice versa.
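To make the trade-off concrete, here is a minimal sketch with made-up test scores; the lists `positives` and `negatives` and the function `error_rates` are hypothetical and do not come from any real screening test. Each case gets a score, and anything at or above a chosen threshold is flagged as positive. Lowering the threshold catches more true cases but flags more healthy ones; raising it does the reverse.

```python
# Hypothetical test scores: higher means "more likely positive".
# These numbers are made up purely for illustration.
positives = [0.35, 0.55, 0.62, 0.70, 0.81, 0.90]  # cases that truly have the condition
negatives = [0.10, 0.20, 0.30, 0.45, 0.58, 0.66]  # cases that truly do not


def error_rates(threshold):
    """Flag a case as positive when its score is at or above the threshold.

    Returns (false-positive rate, false-negative rate).
    """
    false_positives = sum(1 for s in negatives if s >= threshold)
    false_negatives = sum(1 for s in positives if s < threshold)
    return false_positives / len(negatives), false_negatives / len(positives)


for threshold in (0.3, 0.5, 0.7):
    fp_rate, fn_rate = error_rates(threshold)
    print(f"threshold {threshold:.1f}: "
          f"false-positive rate {fp_rate:.2f}, false-negative rate {fn_rate:.2f}")
```

With these numbers, moving the threshold from 0.3 to 0.7 drives the false-positive rate from 0.67 down to 0 while the false-negative rate climbs from 0 to 0.5. Because the two groups' scores overlap, no threshold gets both rates to zero: that is the missing free lunch.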
Pick your poison, or risk, in other words.
If this sounds like no fun, it is not. But we live with this every day. We make such decisions. Do you carry an umbrella when you see the rain clouds? False positive, and you carried an umbrella for no reason. False negative, and you get wet. Do you wait to buy an item that is low in supply, but that you think will go on sale? False positive (it never goes on sale and sells out), and you are without the item at any price. False negative (you didn’t think it would go on sale and bought at full price), and you paid more than you had to. On and on…
Most things in the world are not binary: think snow totals. They can range from 0 to, say, 75 inches; apparently the US record fell at “Silver Lake, Colorado in 24 hours on April 14–15, 1921.” Forecasts of events that have a range also have distributions, whether stated or implied. Weather forecasts (very complex models) have distributions too; they are a mix of statistical and numerical analysis. Here’s where I confess I don’t know much about how weather forecasts work in detail, but that’s okay, because the point about distributions is a general one. (Anyone want to recommend a good book about weather forecasting? Comment here. :-))
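The umbrella decision can be written as a small expected-cost comparison. This is a standard decision-theory sketch rather than anything from the post itself, and the cost numbers below are hypothetical. If the event has probability p, acting is wasted with probability 1 - p, and not acting backfires with probability p, so acting pays off once p exceeds C_FP / (C_FP + C_FN).

```python
def should_act(p_event, cost_false_positive, cost_false_negative):
    """Act (e.g. carry the umbrella) when acting has the lower expected cost.

    Acting costs cost_false_positive if the event does not happen.
    Not acting costs cost_false_negative if the event does happen.
    """
    expected_cost_if_act = (1 - p_event) * cost_false_positive
    expected_cost_if_wait = p_event * cost_false_negative
    return expected_cost_if_act < expected_cost_if_wait


def break_even_probability(cost_false_positive, cost_false_negative):
    """Probability of the event above which acting is the cheaper bet."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


# Hypothetical costs: lugging an umbrella needlessly = 1, getting soaked = 9.
print(break_even_probability(1, 9))  # 0.1
print(should_act(0.2, 1, 9))         # True
print(should_act(0.05, 1, 9))        # False
```

Under these made-up costs, you would carry the umbrella whenever you think the chance of rain is above 10%. Make getting wet cheaper, or carrying the umbrella more annoying, and the break-even point moves up.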
Consider two very different probability distributions.
Here’s what the Gaussian (sometimes called the normal) distribution looks like:
There are a lot of details here, but just note these things. In the Gaussian distribution (the first one above), the mid-range value happens to be the average, and it is also the value that occurs most often. (The Y, or vertical, axis is the probability that x will take a particular value, denoted P(x), the probability function of x; it peaks at the value that occurs most often.)
To understand the importance of the distribution, consider one that is not Gaussian at all: you have a universe of, say, 10 people, and one of them makes $10,000 a week, while the rest make $100. The average is $1,090 (($10,000 + $900)/10), even though it is not a representative value: nine of the ten people make far less. What the average means, in other words, depends on your distribution.
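The income example in a few lines of Python, using only the standard library’s `statistics` module. The median is added here as a comparison point; it is not part of the original example.

```python
from statistics import mean, median

# One person earns $10,000 a week; the other nine earn $100 each.
weekly_income = [10_000] + [100] * 9

print(mean(weekly_income))    # 1090
print(median(weekly_income))  # 100.0
```

The mean is pulled far above what nine of the ten people earn, while the median stays at the typical value.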
In everyday life, the normal distribution, also called the Gaussian or the bell curve, is how many things in the world are distributed. Think height: most people are neither tall nor short, a few are very short, and a few are very tall.
But not everything is Gaussian. Look at the exponential distribution (the exact shape depends on the rate parameter, but you can ignore that for this discussion). Small values are the most common, and larger values get rarer and rarer, leaving a long, long tail in which a few extreme values occur only rarely. If height were exponentially distributed, we’d live in a world with maybe one very very very very tall person, one very very tall person, one very tall person, a few mid-range-height people, and lots and lots of fairly short people. (We’d probably change our definitions of short and tall, but you get the picture.)
Many things of consequence in the world do look like the exponential distribution. This includes World Wide Web audiences: a few sites get lots of clicks, and most of the rest get few to none. There are a few YouTube stars with millions of views, and they get to interview the president, while your video of your kittens is seen by 127 folks. And the transition from one kind of distribution to the other is painful. Income used to be distributed more normally (meaning more like a Gaussian) in the United States during a period known as the “Great Compression,” in which inequality fell. Now it looks more and more like an exponential distribution, meaning a few people make extreme amounts of money and the amount made per person falls off very rapidly, rather than hanging around in the middle to form that nice bulge we like to call “the middle class.”
This all ties back to false negatives, false positives, and weather predictions.
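A quick simulation can show the difference in shape. This sketch draws hypothetical samples from a Gaussian and an exponential distribution with the same mean (10, an arbitrary choice) and compares their summaries; the `summarize` helper, the seed, and all parameters are illustrative assumptions.

```python
import math
import random
from statistics import median

rng = random.Random(42)  # fixed seed so the run is repeatable
N = 100_000
MEAN = 10.0  # arbitrary shared mean for both samples

# Gaussian: values bunch symmetrically around the mean (sd of 2 is arbitrary).
gaussian = [rng.gauss(MEAN, 2.0) for _ in range(N)]
# Exponential with rate 1/MEAN, so its mean is also MEAN.
exponential = [rng.expovariate(1 / MEAN) for _ in range(N)]


def summarize(sample):
    """Return the mean, median, max, and share of values below the mean."""
    m = math.fsum(sample) / len(sample)
    return {
        "mean": m,
        "median": median(sample),
        "max": max(sample),
        "share_below_mean": sum(x < m for x in sample) / len(sample),
    }


results = {"gaussian": summarize(gaussian), "exponential": summarize(exponential)}
for name, stats in results.items():
    print(f"{name:11s} mean={stats['mean']:6.2f} median={stats['median']:6.2f} "
          f"max={stats['max']:7.2f} below-mean share={stats['share_below_mean']:.2f}")
```

For the Gaussian sample, the median sits on top of the mean and about half the values fall below it. For the exponential sample, the median is only about 7 (theory says 10 × ln 2 ≈ 6.9), roughly 63% of values fall below the mean (1 - 1/e), and the maximum is many times the Gaussian’s: most values are small and a few are huge.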
When you hear that snowpocalyptic forecast, what the forecasters are actually saying is that they’ve run their models and the most likely outcome is (OMGAMOUNTOFSNOW). That means OMGAMOUNTOFSNOW is the peak of the probability distribution their models produce. Here’s what it really means: if you had a gabazillion earths, each starting from the data you have (with the uncertainty you have), AND you let those gabazillion earths run the scenario a gabazillion times in alternate universes and times, the single most common result would be OMGAMOUNTOFSNOW in NYC on the night of January 26, 2015.
But that means there are some universes in which you get #snowperbole instead: little snow where much was predicted.
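The “gabazillion earths” thought experiment is essentially an ensemble simulation. Below is a toy version; the two-scenario mixture (the storm hits head-on about 70% of the time and tracks offshore the rest), the `simulate_one_earth` function, and every number in it are made up for illustration and make no claim about how real forecast models work.

```python
import random
from collections import Counter

rng = random.Random(2015)  # fixed seed so the run is repeatable
N_EARTHS = 100_000


def simulate_one_earth():
    """Return one hypothetical snowfall total (inches) for NYC.

    Made-up mixture: about 70% of earths get a head-on storm (~20 in),
    the rest see it track offshore (~6 in). Totals are clipped at zero.
    """
    if rng.random() < 0.7:
        return max(0.0, rng.gauss(20, 4))
    return max(0.0, rng.gauss(6, 3))


snow = [simulate_one_earth() for _ in range(N_EARTHS)]

# Bin totals to whole inches; the most common bin is the "peak" a headline reports.
counts = Counter(int(total) for total in snow)
peak_inches = counts.most_common(1)[0][0]
share_under_8 = sum(total < 8 for total in snow) / N_EARTHS

print(f"peak of the distribution: about {peak_inches} inches")
print(f"share of earths with under 8 inches: {share_under_8:.0%}")
```

With these made-up settings the peak lands around 19 to 20 inches, which is what a headline would report, yet a bit over a fifth of the simulated earths get under 8 inches. Reporting only the peak hides that whole group of universes.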
You got a false positive.
But, aha, you are a public official in one of those universes, and it is the morning of January 26, not the evening. You do not know which universe and timeline your city is in, and you have to make a decision. The decision to weigh is this: how likely is a false negative, how likely is a false positive, and what are the harms associated with each?
And that depends on how those gabazillion earths in gabazillion universes are distributed. Is it Gaussian? Exponential? Or something else? (There are many other probability distributions.)
In the end, the city officials in NYC decided that the dangers of a false negative (assuming little snow but getting slammed) outweighed the harms of a false positive (much snow predicted, but very little fell). It makes sense to me, to be honest, as lost revenue seems like a small cost compared to lives lost. But obviously that, too, depends on how likely it was that lives would be lost, and lost revenue, events, and opportunities are a burden, too.
And for that, we’d need to know the distribution of the forecasts and have a sense of the false positives and negatives (but remember, this is a continuum rather than a binary, so it depends on how much snow you think is too much).
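Here is why the distribution, and not just the peak, drives the decision. This sketch compares two hypothetical forecasts that both peak at 18 inches but differ in spread, then applies the same expected-cost rule to each. The 24-inch danger line, the 1-to-20 cost ratio, and the use of normal curves are all assumptions for illustration; `statistics.NormalDist` is part of Python’s standard library (3.8+).

```python
from statistics import NormalDist

DANGER_INCHES = 24          # hypothetical: above this, people get stranded
COST_FALSE_POSITIVE = 1.0   # hypothetical: shut down, but little snow falls
COST_FALSE_NEGATIVE = 20.0  # hypothetical: stay open, and a dangerous storm hits

# Two hypothetical forecasts with the same peak (18 in) but different spread.
forecasts = {
    "confident (sd 2 in)": NormalDist(mu=18, sigma=2),
    "uncertain (sd 8 in)": NormalDist(mu=18, sigma=8),
}


def decide(forecast):
    """Return P(dangerous snow) and the action with the lower expected cost."""
    p_danger = 1 - forecast.cdf(DANGER_INCHES)
    # Shutting down only costs something if the danger does not materialize.
    cost_if_shut = (1 - p_danger) * COST_FALSE_POSITIVE
    # Staying open only costs something if it does.
    cost_if_open = p_danger * COST_FALSE_NEGATIVE
    action = "shut down" if cost_if_shut < cost_if_open else "stay open"
    return p_danger, action


for name, forecast in forecasts.items():
    p_danger, action = decide(forecast)
    # For a normal curve the peak (mode) equals the mean.
    print(f"{name}: peak {forecast.mean:.0f} in, "
          f"P(> {DANGER_INCHES} in) = {p_danger:.1%} -> {action}")
```

Under these assumptions, the confident forecast puts only about a 0.1% chance on more than 24 inches and says stay open, while the uncertain forecast puts about 23% there and says shut down. Same headline number, opposite decisions.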
Communicating merely the forecast (the peak of the probability distribution) falls far short of conveying the full complexity of these decisions. I wish weather forecasts came as charts instead of single numbers, showing distributions and other crucial information.
Oh, wait, you say, who would understand all that? And here’s my pitch: we should be teaching probability and statistics in K-12 because they are essential to everyday decisions. I’d even say instead of calculus, if necessary: while calculus is needed to understand the full math behind probability distributions, you can get a reasonable grasp of how they work without it. And while few of us are going to be engineers who need calculus to do their jobs, all of us deal with probabilistic decision making every day.
In the end, I think Louis CK, who cancelled his Madison Square Garden show, had it just right, and you can see in this paragraph that he’s wrestling with false positives versus false negatives. Here’s his email to his ticket-holding fans:
The yellow is the forecasting uncertainty, grappling with alternate universes and the gabazillion earths. The green is the cost of a false positive: you cancel, there is little snow, and you lose your deposit (and upset your fans). The red is the universe in which you act as if the big snow is unlikely, don’t cancel, and the snow strands thousands of people in dangerous conditions, like missing a cancer diagnosis with a false negative.
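As one hypothetical alternative to a single number, a forecast could report a range. This sketch takes a made-up ensemble of 12 model runs and summarizes it with the 10th, 50th, and 90th percentiles using `statistics.quantiles` (Python 3.8+); the `forecast_summary` helper and the wording of the summary line are just illustrations.

```python
from statistics import quantiles

# Hypothetical ensemble: snowfall totals (inches) from 12 model runs. Made-up numbers.
ensemble = [4, 6, 7, 9, 14, 17, 18, 19, 20, 21, 23, 27]


def forecast_summary(members):
    """Summarize an ensemble by its 10th, 50th, and 90th percentiles."""
    deciles = quantiles(members, n=10)  # 9 cut points: 10th through 90th percentile
    return {"p10": deciles[0], "median": deciles[4], "p90": deciles[8]}


summary = forecast_summary(ensemble)
print(f"Snow: median {summary['median']:.0f} in; "
      f"1 in 10 chance under {summary['p10']:.0f} in; "
      f"1 in 10 chance over {summary['p90']:.0f} in")
```

For these made-up runs, the summary reads roughly “median 18 in; 1 in 10 chance under 5 in; 1 in 10 chance over 26 in,” which tells a decision-maker far more than “18 inches.”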
It seems like the right call to me, but if only we also had the distribution.