The Least Predictable Counties in the United States

Is Election Forecasting Harder in Different Parts of the Country?

The difference between a demographic prediction of the election and the actual election results. Colors represent an over- or under-prediction for Democrat Hillary Clinton (blue is an over-prediction, red is an under-prediction).

It’s no secret that predictions failed this election, but what if we look at things in hindsight? Could it be that our predictions were wrong because some areas of the country are simply more difficult to predict?

After our state-wide predictions failed this election, we were tasked with answering a slew of questions: What went wrong? Could we have done better with the data we had? Was Donald Trump just an undetectable super-candidate?

Although it’s certainly true that we could have had better polling data (indeed, state polling was off by anywhere from 1 to 15 percent, and national polling will end up missing by about 1 percent), we pollsters/forecasters/prognosticators should also evaluate the inherent differences among counties.

We can do so by estimating Hillary Clinton’s county-by-county vote shares based on the demographics in those counties. That is: if a county has a higher percentage of college-educated voters, was it more likely to vote for the former secretary? Does this also hold true for ethnic characteristics and income? Well, yeah. These facts are evident in both my Wisconsin Fraud Analysis and FiveThirtyEight’s post-election analysis penned by Nate Silver.

Yet, we need the particular data — aided by graphical representations — to really fill in the gaps in our knowledge:

Predicting Clinton’s vote share by county percent “educated”

Scatterplot 1: Predicting Clinton’s Vote Share by County Percent Holding a Bachelor’s Degree

We get a hint at the unpredictability of America when attempting to forecast Clinton’s vote share by using each county’s reported percentage of college graduates. The plot above, colored by the error made in the prediction, highlights that holding a Bachelor’s degree *does* make you significantly more likely to vote for Hillary Clinton, but that doesn’t get us very far. Points are still spread all around the plot, telling us that the predictions aren’t that great. For those more statistically inclined, the R-Squared of this trend line is 0.28. Essentially, knowing a county’s “educated” population tells us a little over a quarter of this election’s story.
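For readers who want to see what sits behind a trend line and its R-Squared, here is a minimal sketch in Python with NumPy. It is not the code used for the plot above: the function name `simple_r_squared` and the six toy counties are made up purely for illustration and are not real election data.

```python
import numpy as np

def simple_r_squared(x, y):
    """Fit y = a + b*x by least squares and return (a, b, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns coefficients from highest power down: slope, then intercept.
    b, a = np.polyfit(x, y, deg=1)
    predicted = a + b * x
    ss_res = np.sum((y - predicted) ** 2)  # variation the trend line misses
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation in vote share
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical toy counties (NOT real data): percent with a Bachelor's degree
# and Clinton's vote share, both in percentage points.
pct_college = [10, 15, 20, 25, 30, 40]
clinton_share = [22, 35, 25, 45, 30, 55]

intercept, slope, r2 = simple_r_squared(pct_college, clinton_share)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")
```

The R-Squared is simply one minus the share of the total variation that the line fails to capture, which is why a value of 0.28 leaves most of the story untold.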

Predicting Clinton’s vote share by county percent white

Scatterplot 2: Predicting Clinton’s Vote Share by County Percent White

Attempting to predict Clinton’s vote share based on a county’s white population tells us more than the education of the county does. In fact, with an R-Squared of 0.485, it tells us roughly 1.7 times as much. Still, we’re missing half the story of the electorate.

But here’s an idea: What if we predict Clinton’s county-by-county vote shares by percent white and percent college educated?

Predicting Clinton’s vote share by county percentage white and percentage college educated

As displayed, when we combine our two factors (county percent college educated and county percent white), the predictability of the election increases drastically. Here, our R-Squared is a very good 0.705.
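Combining two factors like this is a multiple regression. The sketch below shows one way to set it up with NumPy's least-squares solver. The helper `ols_r_squared` and the eight toy counties are my own illustrative assumptions, not the data or code behind the figures in this post.

```python
import numpy as np

def ols_r_squared(predictors, y):
    """Least-squares fit of y on the given predictor columns plus an intercept.

    Returns (coefficients, R^2); coefficients[0] is the intercept.
    """
    y = np.asarray(y, dtype=float)
    columns = [np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors]
    X = np.column_stack(columns)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return coefs, r2

# Hypothetical toy counties (NOT real data), all in percentage points.
pct_white = [95, 90, 60, 85, 40, 70, 55, 92]
pct_college = [12, 30, 18, 22, 25, 40, 15, 35]
clinton_share = [20, 38, 52, 30, 68, 55, 58, 36]

_, r2_white = ols_r_squared([pct_white], clinton_share)
_, r2_both = ols_r_squared([pct_white, pct_college], clinton_share)
print(f"percent white only: R^2={r2_white:.3f}")
print(f"white + college:    R^2={r2_both:.3f}")
```

Adding a predictor to a least-squares model can never lower the in-sample R-Squared, so the interesting question is how much it rises, not whether it does.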

But roughly 30 percent of the variation in county results still deviates from our expectations. What can we do to fix that?

Realistically? We can conduct pre-election polls — but there is absolutely no way to get a 99.9% accurate prediction. America is just too different.

Take our US Predictability map (above) again. Colored by our demographic model’s accuracy, the map shows that Clinton overperformed her expected vote share in orange counties (the Midwest and West Coast, mainly) and did worse in the blue counties (the western Great Plains and most of Texas).

You’ll notice that there aren’t many light gray counties, the counties where white voters and college-educated voters cast ballots as expected. This means that there is an untold story in America: the story of differences within demographic groups. Our analysis shows explicitly that all white voters are not the same, and, by proxy, that all college-educated voters are not the same.
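The quantity being mapped is just the gap between the model's predicted vote share and the actual one. Here is a small sketch of how each county could be labeled. The function name `prediction_error` and the one-point tolerance for "as expected" are assumptions I picked for illustration, not the cutoffs used to draw the map.

```python
import numpy as np

def prediction_error(predicted_share, actual_share, tolerance=1.0):
    """Return (error, labels) per county.

    error = predicted - actual, in percentage points. A positive error means
    the model over-predicted Clinton; a negative error means she beat the model.
    The tolerance (an assumed cutoff) defines the "as expected" band.
    """
    error = np.asarray(predicted_share, dtype=float) - np.asarray(actual_share, dtype=float)
    labels = np.where(
        error > tolerance,
        "over-predicted",
        np.where(error < -tolerance, "under-predicted", "as expected"),
    )
    return error, labels

# Hypothetical counties (NOT real data): model prediction vs. actual result.
predicted = [50.0, 40.0, 30.0]
actual = [45.0, 48.0, 30.5]

errors, labels = prediction_error(predicted, actual)
for e, label in zip(errors, labels):
    print(f"{e:+.1f} points: {label}")
```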

Put simply, America is not very predictable. But why?

Possible explanations

That’s a harder question to answer, and I can only posit a few explanations for the large and varying error in our nationwide demographic forecast.

1. White voters in the north are more Democratic than those elsewhere. In the Midwest, for example, the orange shading of many counties indicates that Clinton overperformed her demographic prediction. Given the high concentration of white voters in this area, a logical extrapolation says that white voters here are more Democratic than those in, say, West Virginia or rural New York.

2. Rural America is more conservative than demographics indicate. As the age-old story goes, rural America is becoming much more conservative as time goes on. A NYT Upshot article released in mid-November makes this evident: rural America has been growing more red over the years. Do Republicans speak more to these voters on gun rights? Farm subsidies?

3. Mormon voters are unpredictable. I would be remiss not to mention the elephant in the room: Utah and Idaho. With many Mormon elected officials and community leaders denouncing the Republican nominee (Republican being the usual party identification of Mormon voters) and endorsing the center-right Evan McMullin, it’s very possible that some would-be Democrats also fled Secretary Clinton.

4. Left-leaning Independents left the Democrats. Much of the map’s remaining blue is coming from NM and CO, states that gave relatively strong support to third-party candidate Gary Johnson. New Mexico probably did so because it is Governor Johnson’s home state. Colorado? Well, other reasons (legal marijuana, possible single-payer healthcare) could hint that voters who would otherwise be Democrats left the party for a more enticing third-party option.

These are just a few explanations, and we can’t be sure what caused the error. However…

What we do know now is that even our fancy statistical models don’t produce hyper-reliable estimates of candidate performance in United States elections. Intra-group differences are pervasive and often unpredictable. Although we could theoretically control for regional voting differences among white and college-educated voters, these solutions are likely to be ad hoc and unimpressive. Our best estimate is that standard demographic analysis tells us roughly 70 percent of the story this election.
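To make "controlling for regional differences" concrete, one common approach is to add a 0/1 indicator column for each region to the regression. The sketch below is only an illustration of that idea under my own assumptions: the helpers `fit_r_squared` and `region_dummies`, the region labels, and the toy numbers are all hypothetical, and I did not run this on the real county data.

```python
import numpy as np

def fit_r_squared(X, y):
    """R^2 of a least-squares fit of y on the columns of X plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coefs
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def region_dummies(regions):
    """One 0/1 column per region, dropping the first (alphabetical) region
    as the reference category so the columns are not collinear with the intercept."""
    names = sorted(set(regions))
    return np.array([[1.0 if r == name else 0.0 for name in names[1:]] for r in regions])

# Hypothetical toy counties (NOT real data), shares in percentage points.
pct_white = np.array([90.0, 85.0, 92.0, 88.0, 91.0, 86.0, 60.0, 65.0])
pct_college = np.array([20.0, 25.0, 18.0, 30.0, 22.0, 28.0, 24.0, 26.0])
region = ["Midwest", "Midwest", "Midwest", "South", "South", "South", "West", "West"]
clinton_share = np.array([42.0, 45.0, 40.0, 25.0, 22.0, 28.0, 55.0, 52.0])

demographics = np.column_stack([pct_white, pct_college])
with_region = np.column_stack([demographics, region_dummies(region)])

r2_demo = fit_r_squared(demographics, clinton_share)
r2_region = fit_r_squared(with_region, clinton_share)
print(f"demographics only:     R^2={r2_demo:.3f}")
print(f"demographics + region: R^2={r2_region:.3f}")
```

A higher R-Squared here would not tell us why regions differ. It only relabels the unexplained gap as "region," which is part of why I call this kind of fix ad hoc.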

The rest, it seems, is unpredictable — and utterly indeterminable without other sources of data.

Read more analysis like this one on the official website of @TheCrosstab.