Bernie Woulda Won: The Statistical Case

Several months ago, I came across a tweet by David Shor, a data scientist at Civis Analytics. The tweet showed the results from a linear regression (more on what that means below), with Shor implying that if Hillary Clinton had had higher favorability ratings, she would have defeated Donald Trump in the 2016 Presidential election:

At the time, debate was raging over whether or not Bernie Sanders, had he won the Democratic nomination, would have defeated Trump. The debate was going around in circles, with both sides offering rhetorical arguments ranging from well-thought-out and convincing to outright bad-faith assertions. The debate has yet to be resolved. Shor’s tweet offered the most compelling evidence I’ve seen yet that, yeah, Bernie very, very, very probably would have won.
I’ve been half-heartedly meaning to write this up for a while, but I had also been hoping that people would stop talking about the election. That hasn’t happened yet, so the time has come for me to walk through the very best evidence we have on whether or not Bernie woulda won.
1. How the statistical model works
I’m not a statistician. Never took a stats class even. A few years ago, I had no idea what a linear regression even was. But I’ve learned a little bit since, and I’m going to try to simply convey what people who don’t know anything about stats need to know to understand this model.
Basically, linear regression looks at relationships. In our case, we’re going to take an outcome (the results of the Presidential election), and examine two variables (GDP growth for the year, and candidates’ favorability ratings) that might have a causal effect on the outcome. We train the model by entering data from previous years. From this, we can tell how closely the variables are related to the outcome, how sure we are of this, and then — if the variables are related to the outcome, and we’re reasonably sure — we can use the model to predict future outcomes.
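If it helps to see it concretely, here’s a minimal sketch of this kind of two-variable regression in Python, using statsmodels. The numbers are made-up placeholders, not the real election data (we’ll build that below); the point is just the shape of the procedure:

```python
# A minimal sketch of a two-variable linear regression, in the spirit of
# the model described above. The data here are made-up placeholders,
# NOT the real 1980-2012 election numbers.
import numpy as np
import statsmodels.api as sm

# Hypothetical training data: one row per past election.
fav_gap   = np.array([20.0, -5.0, 15.0,  2.0, -10.0,  8.0, -3.0, 12.0,  1.0])  # incumbent-party favorability gap
gdp       = np.array([ 4.0,  1.5,  3.2,  2.1,   0.5,  3.8,  2.5,  1.0,  2.2])  # election-year GDP growth, %
inc_share = np.array([55.0, 46.0, 53.5, 50.2,  44.8, 52.0, 49.0, 53.0, 50.5])  # incumbent two-party vote share, %

X = sm.add_constant(np.column_stack([fav_gap, gdp]))  # adds the intercept term
model = sm.OLS(inc_share, X).fit()                    # "training" = fitting the line

print(model.summary())  # R-squared, coefficients, p-values -- the same info as Excel's readout
```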
Here is what the data looks like entered into Microsoft Excel, before the regression is run:

- The “year” column is just the year the election took place in. It’s there to keep track of where we are and isn’t part of the equation.
- The “fav gap” column is the difference in favorability ratings between the two general election candidates. “Favorability” is simply when a poll asks something like, “how favorably do you view candidate X?” Favorability rating is then the percentage of people who view a candidate favorably, minus the percentage who don’t. The reason the model only goes back to 1980 is that that’s the year when polls started asking this question often enough for reliable data to be gathered. The data is taken from a 538 article.
The reason we’re using this variable is because… well, it makes sense, doesn’t it? It makes logical sense that how favorably the public views each candidate will have an effect on the outcome of the election. It should go without saying that what determines whether or not the public views a candidate favorably is complicated, and outside the scope of this analysis.
You’ll notice that some of these numbers are negative and some are positive. That’s because the model needs some standardization. We’re using the favorability gap from the point of view of the *incumbent party’s* candidate. (So, for example, in 2016, Hillary Clinton was the incumbent party candidate, since Democrats controlled the White House at the time of the election.) Why exactly we do this will be clearer when we look at the next column.
- The “gdp” column measures GDP growth for the year of the election. This has been a mainstay in political science Presidential election modeling, and is usually a large part of what’s meant when we talk about “the fundamentals.” The idea is that in a year where the economy is growing a lot, the incumbent party has an advantage, because people should be happy with how things are going; and when growth is slow or negative, the incumbent party should theoretically suffer. That’s why things are standardized here as all being from the incumbent party’s perspective.
- The “inc share” column measures the incumbent party’s “share of the two-party vote.” This, again, is to standardize things, this time by ignoring third-party votes. For example, say that in one year, the incumbent Republican candidate gets 50% of the total popular vote, the Democrat gets 40%, and third parties get 10%. We ignore the 10%, and calculate the incumbent share as 50%/(50% + 40%), or about 55.6% of the two-party vote. If I recall correctly, I did all of these calculations by looking at Wikipedia results for each election.
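Since that two-party calculation comes up again later, here’s the arithmetic pinned down as a tiny Python helper (the function name is mine):

```python
def two_party_share(inc_pct: float, challenger_pct: float) -> float:
    """Incumbent party's share of the two-party vote, ignoring third parties."""
    return 100 * inc_pct / (inc_pct + challenger_pct)

print(two_party_share(50.0, 40.0))   # the example above: ~55.6
print(two_party_share(48.2, 46.1))   # Clinton in 2016: ~51.1
```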
2. How do we know if it’s a good model?
We run the regression. There’s a formula you can use to do it by hand, or you can just run it in a program. It’s very easy to do in Excel, though if I remember right, you have to enable the “Analysis ToolPak” add-in.
When we do, we get results that look like this:

There’s a lot of info here, but the two things to look at right away are the “R square” and the p-values, highlighted here in blue. (Note that my results differ very slightly from Shor’s tweet, maybe because we got our GDP numbers from different sources.)
- “R Square”: This basically tells you how much your variables “explain” your outcomes. The closer to 1 this number is, the more fully the outcome is explained. (With a one-variable regression, this number is just the square of the correlation coefficient.) Here we see an adjusted R square of 0.94… this is very good, and tells us that the variables almost fully explain the outcome. It’s of course possible for spurious correlations to show up, but it’s a good sign that our variables pass the “common sense” test. It makes *sense* that GDP growth and favorability ratings together would explain the outcome of a presidential election.
- “p-value”: Where R-square is supposed to give us a rough idea of how valid the total model is, the p-value is part of how we decide how valid each individual variable is. The lower the p-value, the less likely it is that we’d see a relationship this strong by mere chance, if the variable really had no effect at all. If the p-value is below a certain threshold, usually 0.05, we conclude that the variable in question is indeed having the measured effect on the outcome; it has “statistical significance.”
We see that, in our model, the p-value for both variables is not only below the 0.05 threshold, but below the even more stringent 0.005 threshold.
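For what it’s worth, if you’re following along in Python rather than Excel, both numbers fall straight out of the fitted model from the earlier sketch:

```python
# Continuing the statsmodels sketch from section 1:
print(model.rsquared_adj)  # adjusted R-squared
print(model.pvalues)       # one p-value per term: [intercept, fav_gap, gdp]
```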
3. Using the model to predict
Of course, the ultimate test of how good a model is is how well it predicts future outcomes. The difference between training a model and using it to predict is that training generates the variable coefficients (seen in the readout above), while predicting plugs those coefficients into an equation along with new variable values.
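In other words, once trained, the model boils down to one equation: predicted vote share = intercept + (coefficient × fav gap) + (coefficient × GDP growth). As a sketch (the 0.188 fav-gap coefficient is the one from the readout; the intercept and GDP coefficient here are placeholders, not the real fitted values):

```python
# "Predicting" is just plugging numbers into the fitted line.
INTERCEPT = 47.0   # placeholder -- not the real fitted intercept
B_FAV     = 0.188  # fav-gap coefficient from the regression readout
B_GDP     = 1.5    # placeholder -- not the real fitted GDP coefficient

def predict_inc_share(fav_gap: float, gdp: float) -> float:
    """Predicted incumbent-party share of the two-party vote, in percent."""
    return INTERCEPT + B_FAV * fav_gap + B_GDP * gdp
```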
Here, the model is trained on 9 elections, and can be used to predict the 2016 election. But by cheating a little bit, we can take the coefficients that the model generates and go back to “predict” those 9 elections. What we do is take the coefficients, multiply each by its respective variable value, and then add them all together with what’s called “the intercept.” Here are the results of the “predictions,” including 2016, vs. what actually happened:

For a comparison, let’s turn to a 2012 article by Nate Silver that looks at how good various Presidential election models have been at predicting actual results: https://fivethirtyeight.com/features/models-based-on-fundamentals-have-failed-at-predicting-presidential-elections/
We see that our model compares extremely favorably to the professional models that Silver looks at. There are several ways of comparing them. For example, note that, in the past 10 elections, our model calls 8 of them within 1 point; only 9 of the 58 total models that Silver looks at call the vote share within 1 point in any given election. The worst our model does is miss a result by 1.8 points; the worst those models do is miss a result by 10 points.
The point is, it’s a very good model. It calls the 2016 vote *exactly dead-on*. The model has Clinton winning 51.1% of the vote share, while Wikipedia has Clinton winning 51.1% of the vote share (remember, we’re talking 2-party vote share, so 48.2/(48.2+46.1)).
This, of course, is the popular vote share, and as everyone knows, Clinton won the popular vote while losing the Electoral College. What would it have taken to win the Electoral College? We can probably safely say that a 1-point increase in popular vote share would have done it, since she lost by less than that margin in the crucial states that flipped to Trump. And a 5-point favorability advantage translates into almost a full percentage point in the popular vote (hence the 0.188 coefficient for “fav gap” in the regression results: 5 × 0.188 ≈ 0.94 points).
4. So where does Bernie fit in here?
We saw that, according to the model, Hillary would have won if she had had just 5 more favorability points. Of course, we also saw that, while the model was perfect in predicting this year, it can be a bit off in other years. The average simple error is 0.67 points. Let’s go with the “standard error,” though, about 1.1 points, and let’s apply it in the cautious direction. Recalling that, for this model, five favorability points translate to one popular vote point, that means that somebody who ended up with about 10 more favorability points than Hillary would almost certainly have won the election, even accounting for error in the model.
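Back-of-the-envelope, that reasoning looks like this (using the 0.188 coefficient and the error figures above):

```python
B_FAV    = 0.188  # vote-share points per favorability point (from the regression)
STD_ERR  = 1.1    # the model's standard error, in vote-share points
EC_SWING = 1.0    # rough popular-vote gain Clinton needed to flip the Electoral College

gain_from_10_points = 10 * B_FAV          # ~1.9 vote-share points
cautious_bar        = EC_SWING + STD_ERR  # ~2.1 points, erring on the cautious side
print(gain_from_10_points, cautious_bar)  # 1.88 vs 2.1 -- ten points gets you right to the bar
```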
We know where Bernie’s favorability stands now, and we know where it stood at the end of the primary; we don’t know what it *would have been* had he been in a general election with Donald Trump. However, rather than throwing around unquantified claims (“The GOP would have hammered him with the ‘socialist’ label!” “Like they did with Obama?!” “But he calls himself that!” “I know! And people still like him!”), we can take a somewhat scientific approach here.
What we’ll want to do is to look at how candidates’ favorability ratings change from the end of the primary race to the end of the general election. Compiled here is just that data, according to 538 (here and here):

There are a few things to note here. First, most candidates, including both Trump and Clinton, *gained* favorability from the late primaries to the late general election. 15 of 19 candidates gained points. The overall average, including those few who lost points, was an 8.4 point gain. Of those who lost points, the *worst* performance was Walter Mondale in 1984, who lost 10 points from primary to general election.
If we treat Sanders as a statistical cog like any other, it would be reasonable to assume that he would have gained 8.4 favorability points in a general election. After all, the unquantifiable arguments apply to every other candidate throughout history, too. They’re all attacked and they all fight back. Donald Trump himself gained 16.4 favorability points, while still ending up with the lowest favorability rating by a large margin.
Looking at Sanders’ actual favorability numbers, he went from +10 in late April, to +15.8 in late September, a gain of 5.8 points, which would put him a few points below the average. Of course, we can’t know for sure how he would have fared in a general election, but this might be a reasonable number to use. Or it might be reasonable to use the average of all candidates, +8.4.
But let’s be really cautious, and use the lowest estimate we can, which is Mondale’s -10. It’s possible that Sanders could have lost even more than this, of course, just like it’s possible that he could have gained more than the 25.2 points that Al Gore did in 2000. But using the lowest number from our dataset seems like a reasonable way to approach it, if we’re being very cautious.
So we might start with Bernie’s late April rating of +10, apply our very cautious 10-point loss, and end up with zero. We’ll say that Bernie Sanders would have had an even zero favorability rating had he been in the general election.
Recall that we worked this out above: even accounting for error in the model, we figured that somebody would have needed about 10 more favorability points than Hillary had in order to confidently beat Trump. Since Hillary’s favorability ended up at -13.8, hypothetical Bernie’s 0 is 13.8 points higher than that. Which means that, even in this very cautious scenario, Bernie woulda won.
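Spelled out as arithmetic (every number here comes from the discussion above):

```python
bernie_late_april  = 10.0   # Bernie's net favorability in late April
worst_case_change  = -10.0  # Mondale's 1984 primary-to-general drop, the worst in the dataset
clinton_final      = -13.8  # Clinton's final net favorability

bernie_hypothetical = bernie_late_april + worst_case_change   # 0.0
edge_over_clinton   = bernie_hypothetical - clinton_final     # 13.8 points
print(edge_over_clinton > 10)  # True: clears the cautious ~10-point bar worked out above
```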
But just for fun, or torture, let’s try it a different way. Let’s take Bernie’s *actual* favorability in late Sept., and run it through the model, taking the model at face value this time. What do we get? Now Bernie gets 56.7% of the two-party vote, a legitimate blow-out, the biggest one since Reagan trounced Mondale.
