Nate Silver, founder of FiveThirtyEight and poster boy of data-driven election predictions, is on fire. He’s always right, even when he’s wrong.
In 2008, Silver gave Obama a 94% probability of winning. He got it right.
In 2012, Silver gave Obama a 91% probability of winning. He got it right.
In 2016, Silver gave Trump a 28% probability of winning. He got it right.
That’s right. Silver had Clinton at 72%, Trump at 28%. Trump won. And Silver was right?! That’s right.
And I suppose Silver would have also been right if Clinton had won instead? That’s right.
Can he ever be wrong? Do his (or anyone else’s) probabilities mean anything at all?
We’re told that they do. Statistics and data science programs are booming at the top universities; data scientists are a hot commodity in Silicon Valley and on Wall Street; and academic jargon like ‘big data’, ‘machine learning’, and ‘analytics’ has infiltrated our everyday vernacular. Not to mention the hundreds of millions, perhaps more, of federal taxpayer dollars spent on data-driven research each year.
All of these trends empower modern data zealotry, which casts anyone skeptical of data-driven methods into the same (deplorable) basket as the tin-foil-hat wearers, moon-landing skeptics and Illuminati alarmists.
How dare you question something as objective, as cut and dried, as the data, as the statistics!
Based on an argument from the theory of option pricing and stochastic calculus, Nassim Taleb dared to call FiveThirtyEight “quite incompetent (and aggressively so).” But contrary to Taleb’s criticism, Silver isn’t incompetent. He’s just good. Really good.
He’s good precisely because he’s not pricing or selling options or, in Taleb’s parlance, putting any “skin in the game” to back up his predictions. He’s good because he has turned probabilities into a commodity, a consumer good, a media empire. And he’s really good because even when his predictions seem to fail, he doesn’t change a thing. He justifies why they failed — or why they didn’t — with the same logic that led him astray in the first place. And gets away with it.
As if that weren’t enough, Silver also has the public backing of some well-regarded academic statisticians. The Simply Statistics blog shilled for Silver in its November 9 entry: “Statistically speaking, Nate Silver, once again, got it right,” wrote bloggers Rafa Irizarry, Roger Peng and Jeff Leek. (Unfortunately, the Simply Statistics argument seems to be largely based on the assumption that Silver’s credible intervals are 95% when they are in fact 80%. So, statistically speaking, Irizarry, Peng and Leek are wrong. Or, by their same logic, maybe that makes them right, you know, statistically speaking? I digress.)
With no skin in the game, Silver’s forecasts might still be evaluated, not based on dollars and cents, but rather based on how “calibrated” they are. Roughly, a forecaster is well-calibrated if his 10% predictions occur 10% of the time, his 20% predictions occur 20% of the time, and so on. This same idea of calibration is related to Andrew Gelman’s “slightly sophisticated view” of the election outcome and also the admonishment by Simply Statistics, “Remember, if in ten election cycles you call it for someone with a 70% chance, you should get it wrong 3 times.”
By this metric, a well-calibrated forecaster is a credible forecaster since when he forecasts an event at, say, 30%, the event occurs 30% of the time in the long run. Seems reasonable? At the very least, this gives the forecasts some meaning, right?
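To make the notion concrete, here is a minimal sketch of a calibration check (my own illustration, not anything from FiveThirtyEight’s methodology): group forecasts into bins and compare each bin’s average forecast with the empirical frequency of the event among those predictions.

```python
def calibration_table(forecasts, outcomes, n_bins=10):
    """Bin probability forecasts and compare each bin's average forecast
    with the empirical frequency of the event in that bin.  A forecaster
    is well-calibrated when the two columns roughly agree."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # clamp p = 1.0 into top bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            avg_forecast = sum(p for p, _ in b) / len(b)
            event_freq = sum(y for _, y in b) / len(b)
            rows.append((round(avg_forecast, 3), round(event_freq, 3), len(b)))
    return rows

# Ten 30% forecasts whose event occurred 3 times out of 10: well-calibrated
calibration_table([0.3] * 10, [1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
# -> [(0.3, 0.3, 10)]
```

Note that the check says nothing about any individual forecast; it only scores the long-run bookkeeping.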
Not so fast.
In their work on game-theoretic probability, Vladimir Vovk, Akimichi Takemura and Glenn Shafer prove that a forecaster can be well-calibrated, regardless of what happens in reality and without any knowledge of future events, by using a strategy called “defensive forecasting”. To assess calibration, forecasts are compared against certain test functions. For example, the accuracy of 20% predictions is evaluated based on the performance of not just the 20% forecasts but also the 19% forecasts, 18% forecasts, 21% forecasts, and so on, with forecasts closer to 20% given a higher weight. As long as these test functions are continuous — meaning the weights do not change too abruptly as forecasts vary — defensive forecasting ensures calibration in the long run by simply choosing probabilities that correct for past deviations.
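The following toy sketch is loosely in the spirit of their kernel-based algorithm; the class name, the Gaussian kernel, the grid search, and the adversary are my own simplifications, not their exact procedure. The forecaster evaluates the kernel-weighted sum of past residuals, S(p) = Σ K(p, pᵢ)(yᵢ − pᵢ), and forecasts at a near-zero of S. Because S is continuous in p, such a point exists whenever S changes sign on [0, 1]. Even against an adversary who sees each forecast before choosing the outcome, the average residual shrinks.

```python
import math

class DefensiveForecaster:
    """Toy defensive forecaster, loosely in the spirit of the kernel-based
    algorithm of Vovk, Takemura and Shafer (a simplified sketch, not their
    exact procedure).  Each round it forecasts at a near-zero of the
    kernel-weighted sum of past residuals S(p) = sum_i K(p, p_i)(y_i - p_i)."""

    def __init__(self, sigma=0.1, n_grid=101):
        self.sigma = sigma
        self.grid = [i / (n_grid - 1) for i in range(n_grid)]
        self.past = []  # (forecast, outcome) pairs

    def _S(self, p):
        return sum(math.exp(-((p - q) ** 2) / (2 * self.sigma ** 2)) * (y - q)
                   for q, y in self.past)

    def forecast(self):
        if not self.past:
            return 0.5
        vals = [self._S(p) for p in self.grid]
        if all(v > 0 for v in vals):   # outcomes ran above all past forecasts
            return 1.0
        if all(v < 0 for v in vals):   # outcomes ran below all past forecasts
            return 0.0
        # S changes sign, so forecast where |S| is smallest on the grid
        return min(zip(self.grid, vals), key=lambda pv: abs(pv[1]))[0]

    def update(self, p, y):
        self.past.append((p, y))

# An adversary that sees each forecast before choosing the outcome:
f = DefensiveForecaster()
residual = 0.0
for _ in range(200):
    p = f.forecast()
    y = 1 if p < 0.5 else 0        # always bet against the forecaster
    f.update(p, y)
    residual += y - p
# |residual| / 200 stays small: calibrated, yet the forecasts know nothing
```

The point of the demonstration is exactly the paradox above: the forecaster never looks at anything resembling reality, yet its bookkeeping comes out well-calibrated.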
In guaranteeing well-calibrated forecasts without any knowledge of future events, defensive forecasting seems to explain how predictions, such as FiveThirtyEight’s 28% Trump probability, can be both “correct” — in terms of calibration — and meaningless — offering no substantive insight about the real world — at the same time.
To overcome this criticism, a forecaster with guts (not Silver) might offer bets based on his predictions. (Silver’s 72% probability for Clinton translates to a bet on Clinton at 0.39 to 1 odds, for instance.) But even “skin in the game” may not be enough to sniff out a charlatan forecaster. In this context, the test functions from the above game-theoretic framework become gambling strategies against the forecaster, and the requirement that these strategies are continuous acts as a constraint against erratic betting behavior. Overall, the Vovk-Takemura-Shafer theory suggests that as long as the gambler is tame — if he’s predictable, if he follows the herd, if he’s a sheep — the forecaster can exploit that tameness without any clue about the likelihood of future outcomes and, in fact, without even knowing what event is being bet on. All the forecaster needs to do is offer bets that prevent the gambler from making money, which is possible against any continuous strategy. To have any chance at making money, the gambler can’t be docile. He needs to be a little wild, a bit irregular, even volatile. Make the forecaster uncomfortable. Throw him off balance.
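For reference, the odds conversion used above is just arithmetic (a one-liner of my own, assuming fair payout odds with no bookmaker margin): backing an event of probability p at fair odds pays (1 − p)/p per unit staked.

```python
def prob_to_odds(p):
    """Fair payout odds ("x to 1") for backing an event with probability p:
    the bettor stakes 1 and wins (1 - p) / p if the event occurs.
    Assumes fair odds with no bookmaker margin."""
    return (1 - p) / p

prob_to_odds(0.72)  # Clinton at 72% -> roughly 0.39 to 1
prob_to_odds(0.28)  # Trump at 28%  -> roughly 2.57 to 1
```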
So, returning to my rude question: Do Silver’s probabilities mean anything at all?
Both common sense and basic mathematical intuition suggest not.
Since ancient times, people have relied on prophets, seers, fortune tellers, and psychics to foretell the future. How naïve and unsophisticated those cultures were! We’re so much more advanced now. We have Nate Silver. Instead of looking into a crystal ball, Silver looks into the data. Instead of consulting with the gods, he consults with mathematical models. He doesn’t sell snake oil, he sells probabilities. His predictions are often misinterpreted, but never wrong.
We’re left with a story comical and tragic all at once: Nate Silver is our version of the Oracle of Delphi, of Nostradamus, of The Amazing Criswell.
Prove me wrong.
Or, even better, prove Silver wrong.