Bernie was robbed by early registration deadlines and the media
Senator Bernie Sanders could have won the 2016 Democratic presidential primary with just two small changes in the proceedings. If all states had same-day voter registration, and if the media had not prematurely called the nomination for Clinton on June 6th, Bernie most likely would have won. A statistical analysis of the vote share by state, adjusting for a number of demographic variables, postdicts that the Senator would have won a majority of the pledged delegates with just these two changes. If superdelegates had then behaved as they did in 2008, switching their support to the candidate with more pledged delegates, Bernie would have easily secured the nomination.
Many states have early voter-registration deadlines. Since voter participation in primaries is already quite low, and Sanders drew much of his support from young and first-time voters, these early deadlines were a real barrier to many of his supporters. If all states had had same-day registration, he would have come very close to tying Clinton in pledged delegates.
The media prematurely called the contest on June 6th, stating that Clinton had “clinched” the nomination, the day before California, New Jersey, and several other states voted. This call was based on an anonymous survey of superdelegates who, if the media are to be believed, responded that they planned to cast their votes for Clinton almost two months later at the Democratic Convention. In fact, if every state had had same-day voter registration, that improvement alone probably would have prevented this premature call. So in a sense only one small change, not two, would have been sufficient for Bernie to win.
In fact, my analysis is more conservative than I have stated to this point. Since previous analyses by FiveThirtyEight and The Upshot have claimed to show that Bernie benefited from the “caucus effect,” my postdictions also change every caucus to a primary, which takes away any advantage he gained from caucuses.
Although my thesis here is that only a small change would have resulted in a different outcome, it helps to step back and think about the bigger picture. Early voter-registration deadlines were not the only structural barrier to participation by Sanders voters. Low voter turnout among young people is driven by other things as well, for example the Republican voter-ID laws in South Carolina and Wisconsin. The media also influenced the outcome in far more ways than just their premature call on June 6th, as we know from leaked DNC emails.
The rest of this post discusses technical aspects of the analysis. Full details are available, including data and code.
No statistical analysis would be honest without some discussion of its most important limitations. Almost all of these are related to collinearity.
- Regression analysis cannot show causal effects without some additional (and usually untestable) assumptions. It is possible that some important effect not in my dataset caused the observed association between voter registration deadlines and the outcome.
- The same-day voter registration variable is highly correlated with the caucus variable. It is essentially impossible to disentangle them. However, my postdictions assume all caucuses are changed to primaries, so any issue due to this correlation should be at least somewhat mitigated.
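- The media’s early call effect is highly correlated with the time effect. Only contests held after the call could have been affected by it, so the early-call variable and the date of the contest are hard to separate.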
I believe these limitations are not fatal. The main conclusion, that Bernie could have won a majority of pledged delegates, holds for all 3 models, including the simplest and most conservative one.
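To make the collinearity concern concrete, here is a minimal sketch of how the overlap between two indicator variables can be measured. It is written in Python rather than the R of earlycall.R, it is not the code from my repository, and the ten contests below are made-up values, not rows from DemPrimaryData.csv. The names `pearson`, `vif_two_predictors`, `same_day`, and `caucus` are hypothetical. With only two predictors, the variance inflation factor reduces to 1 / (1 - r^2), where r is their correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """Variance inflation factor when a model has exactly two predictors."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)


# ASSUMPTION: made-up 0/1 indicators for ten hypothetical contests.
# These are NOT the real primary data.
same_day = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = same-day registration
caucus = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]    # 1 = caucus, 0 = primary

r = pearson(same_day, caucus)
vif = vif_two_predictors(same_day, caucus)
print(f"correlation = {r:.3f}, VIF = {vif:.3f}")
```

The closer the correlation gets to 1, the faster the VIF grows, and the less the data can say about which of the two variables is responsible for an observed difference in vote share.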
Details: I did a fairly standard kind of statistical analysis (linear regression) using essentially the same models as Nate Silver and Harry Enten at FiveThirtyEight and Nate Cohn at the NYTimes/Upshot. They are journalists and I’m a statistician, so I have more formal statistical training than they do and felt obligated to do a lot of due diligence. Basically this means I tried various combinations of slightly different models, left out possible outliers (Vermont, Washington, D.C., Puerto Rico), and checked various model diagnostics and comparisons. The results above are robust to these manipulations, meaning I did not just choose a special model that gave the result I wanted. My data (DemPrimaryData.csv) and code (earlycall.R) are all available in this GitHub repository.
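The leave-out check described above can be sketched roughly as follows. This is an illustration in Python, not the R code in earlycall.R, and it uses a single predictor instead of the full models. The contest labels and vote shares are invented for the example and do not come from DemPrimaryData.csv; `fit_line` and `refit_without` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def refit_without(rows, excluded):
    """Refit the line after dropping every row whose label is in `excluded`."""
    kept = [row for row in rows if row[0] not in excluded]
    return fit_line([row[1] for row in kept], [row[2] for row in kept])


# ASSUMPTION: invented rows of (label, same-day registration 0/1, vote share %).
# These are NOT real primary results.
rows = [
    ("contest_a", 0, 40.0),
    ("contest_b", 0, 44.0),
    ("contest_c", 0, 42.0),
    ("contest_d", 1, 50.0),
    ("contest_e", 1, 54.0),
    ("contest_f", 1, 86.0),  # deliberately extreme, standing in for an outlier
]

full_intercept, full_slope = refit_without(rows, set())
trim_intercept, trim_slope = refit_without(rows, {"contest_f"})
print(f"all contests:      slope = {full_slope:.2f}")
print(f"outlier left out:  slope = {trim_slope:.2f}")
```

If the estimated effect keeps its sign and rough size after the suspect contests are dropped, the conclusion does not hinge on them. In this toy example the slope shrinks but stays positive.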