Sanders’ Chances of Winning The Primaries: A Statistical Report (New York Update)
For those who just want the estimates, scroll to the bottom of this article.
Hey S4P! Based on suggestions from my previous post, I have made some changes to my model, namely replacing my linear prediction trend with a voter registration deadline trend, which seems to be yielding better results. Thanks to /u/DriftingSkies for the suggestion! I will also be running one-step-ahead forecast simulations in the near future as per the suggestion of /u/dmichelson_ma.
Now, before I go on with my predictions, I need to address something important regarding Wyoming:
Your Wyoming prediction was off by over 20 points! The result wasn’t even in your confidence intervals! What happened?
Before I answer this question, I first need to draw a distinction between the predictive power and the explanatory power of a statistical model. Predictive power is a model’s ability to guess the outcome of a primary before it happens; explanatory power is its ability to account for the outcome after it happens. If a model has high explanatory power but low predictive power, that would indicate that the variables driving out-of-sample predictions are off. But if a model has both low predictive and low explanatory power for a given primary outcome, then either the result is an outlier or the model has no significant value.
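To make the distinction concrete, here is a minimal sketch using made-up numbers, not my actual data or model. It assumes a one-variable least-squares fit as a stand-in: the "explanatory" error is the residual for a contest when the model is fit on every contest, and the "predictive" error is the miss when that contest is held out of the fit. The function names and the toy data are hypothetical.

```python
# Toy illustration only: the numbers below are invented, not the real data set.
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def explanatory_error(xs, ys, i):
    """Residual for point i when the model is fit on ALL points (in-sample)."""
    a, b = fit_line(xs, ys)
    return ys[i] - (a + b * xs[i])

def predictive_error(xs, ys, i):
    """Error for point i when the model is fit WITHOUT point i (out-of-sample)."""
    train_x = xs[:i] + xs[i + 1:]
    train_y = ys[:i] + ys[i + 1:]
    a, b = fit_line(train_x, train_y)
    return ys[i] - (a + b * xs[i])

# Hypothetical predictor (x) and vote share (y); the last point is a deliberate outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [30.0, 40.0, 50.0, 60.0, 70.0, 55.0]

last = len(x) - 1
print("in-sample residual: ", round(explanatory_error(x, y, last), 2))
print("out-of-sample error:", round(predictive_error(x, y, last), 2))
```

In this toy data the held-out miss is large, and the residual stays large even after the outlier is added to the fit, which is the pattern that points to an outlier rather than a fixable set of predictors.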
Having rerun my models with Wyoming added to my data set, I found that they were not only unable to predict the outcome of Wyoming, but also unable to explain it. Unlike with previous caucuses, the outcome of the Wyoming caucus could not be explained at all with either demographic or temporal information.
The result is even out of line with the outcomes of other nearby caucuses, including the Idaho caucus right next door! I’m sure other prognosticators, including Tyler Pedigo, who predicted a similar 74.6% for Sanders, are also scratching their heads over the result.
Here are some things that I believe may have contributed to the unexplained outcome of the Wyoming caucus:
- It was a caucus. Caucuses tend to have inherently low turnout because of the effort required to participate, so the sample of caucus-goers can often be small and biased relative to the overall population of the state. While this has previously benefited Sanders, since previous caucuses have drawn more passionate Sanders supporters than Clinton supporters, it may be that fewer passionate supporters came out to the Wyoming caucus. Which brings me to my next point…
- The Sanders campaign and S4P subreddit overlooked Wyoming. Given the (very well-founded) focus of this subreddit on the upcoming New York primary, phonebanking, canvassing, and GOTC efforts for the Wyoming caucus may not have been nearly as strong as they have been for past contests. I can also imagine that the Sanders campaign decided to focus its resources on New York, leaving Wyoming behind. Many potential Sanders supporters may have stayed home for this reason. Given that my temporal projections are probably contingent on continuing and evenly spread volunteer and campaign efforts, this probably explains the huge underperformance for Sanders in Wyoming.
- It’s a small state. There are a few reasons why this can cause issues. Given that the main drivers of my model are Facebook likes and the African American population of a state, Wyoming’s low population (both African American and overall) means that small differences in Facebook likes and African American population become amplified. The chart below illustrates this point; my model seems to perform acceptably for states that do not have an abnormally small population, unlike Wyoming, Vermont, and Alaska.
- Absentee ballots. While absentee ballots do not normally distort voting patterns this much, the state’s small population has probably allowed factors like high levels of surrogate absentee ballot voting, along with many other factors, to affect the result significantly.
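The small-sample point in the list above can be sketched with the textbook standard error of a proportion. This is only an illustration under a simple-random-sampling assumption: it captures random noise, not the bias or turnout effects described above, and the turnout figures and the 55% share are hypothetical, not real data.

```python
import math

def share_standard_error(share, voters):
    """Standard error of an observed vote share under simple random sampling."""
    return math.sqrt(share * (1.0 - share) / voters)

# Hypothetical turnouts (not real figures): the same underlying 55% support
# is measured far more noisily when only a few hundred people take part.
for voters in (500, 5_000, 500_000):
    se = share_standard_error(0.55, voters)
    print(f"{voters:>7} voters: +/- {100 * se:.2f} points (1 s.e.)")
```

The noise shrinks with the square root of turnout, so a contest with 100 times fewer participants has roughly 10 times the random wobble in its result.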
If I’m completely honest with myself, I didn’t even conceive that Sanders would do as poorly in Wyoming as he did, regardless of what my models predicted. It was not long ago that Sanders absolutely killed in nearby caucuses, namely Utah and Idaho, which are also mountain states. I don’t think that anyone would have believed me if I had told them that Sanders would only win Wyoming by 10 points. Either way, I’m willing to call Wyoming an outlier and move on. Now that the caucuses are finally out of the way for the most part, I have more confidence in my predictive models moving forward, especially for larger states like New York.
With that out of the way, here are my predictions for New York, with and without the Wyoming outlier added to my data set:
Note: the “chance” estimates represent the range of outcomes that my models predict for the primary. For example, my models, should they be true, predict there is a 50% chance the result will land between 44% and 57%.
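For anyone who wants to see how a range like this can be read, here is a rough sketch. It assumes the predicted result follows a normal distribution, which is purely an assumption for illustration and not necessarily how my models produce their ranges. Given a central 50% interval of 44% to 57%, it backs out the implied spread and then widens it to a 90% interval. The function names are hypothetical.

```python
from statistics import NormalDist

def normal_from_central_interval(low, high, coverage):
    """Normal distribution whose central `coverage` interval is [low, high].

    Assumes normality; that is an illustrative assumption, not a model fact.
    """
    center = (low + high) / 2.0
    # z-score that bounds the central `coverage` mass of a standard normal
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    sigma = (high - low) / 2.0 / z
    return NormalDist(mu=center, sigma=sigma)

def central_interval(dist, coverage):
    """Lower and upper bounds of the central `coverage` interval of `dist`."""
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

dist = normal_from_central_interval(44.0, 57.0, 0.50)
low90, high90 = central_interval(dist, 0.90)
print(f"implied standard deviation: {dist.stdev:.1f} points")
print(f"90% interval: {low90:.1f}% to {high90:.1f}%")
```

The wider the coverage you ask for, the wider the range gets, which is why a 50% range that straddles the 50% mark says very little about who actually wins.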
Although the upcoming primary is a closed one, the huge registration drive prior to the deadline, the aggressive campaigning by both Clinton and Sanders, and the long lead-up to the primary should make the race very competitive in the state. My results predict that the race will be too close to call quickly.
I am withholding my predictions of how likely Sanders is to win until after the New York primary, since Wyoming is skewing my predictions.
If you have any questions regarding my report, please let me know!