Forecasting Who Will Win the Presidency in 2020

Vinod Bakthavachalam
Oct 29, 2020

(See here for the Senate forecast)

The 2020 election is rightly seen as a defining moment in American politics. Biden and Trump each have vastly different visions and plans for America. Who wins the election will reveal which vision voters want.

To understand who is likely to win the presidency, we built two forecasting models: a fundamentals-only model and a hybrid model that combines fundamentals with polls. We use both because the hybrid model is more accurate on a state-by-state basis, while the fundamentals-only model has called every election since 2000 correctly. The hybrid model would have missed 2016, like other forecasting models, although ours gave Republicans greater odds than others did, as the table below shows.

Furthermore, the fundamentals model yields insight into the “fundamental” reasons a state leans one way, such as demographics, while the hybrid model adds in current voter preferences.

Building a Presidential Forecasting Model — The Fundamentals

The fundamentals model regresses each state’s (including DC) Democratic vote share in presidential elections from 1980 to 2016, denoted Y, on a set of state-level features, denoted X, in a mixed effects regression. We add random effects for the election year and for the state’s region, as defined by the US Census Bureau, to capture national error and regional error respectively.

The model ends up taking the following form:

Y = B * X + National Error + Regional Error + State Error

It predicts the expected Democratic vote share in a state (B * X) based on the fundamental factors in the model:

  • The partisan voter index for the president, defined as the state’s Democratic-minus-Republican vote margin minus the national vote margin, blended across the last two elections (weighted 75% and 25%)
  • The state unemployment rate in the month prior to the election
  • The state house price growth rate in the year prior to the election
  • The national GDP growth rate in the year prior to the election
  • The fraction of white residents in a state as of the most recent census
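As a rough illustration of the first feature, the partisan voter index can be sketched as below. This is a minimal sketch, not the model's actual code: the function name, the example margins, and the choice to give the most recent election the 75% weight are all assumptions, since the text above does not say which election receives which weight.

```python
def partisan_voter_index(state_margins, national_margins, weights=(0.75, 0.25)):
    """Weighted average of (state margin - national margin) over two elections.

    Margins are Democratic-minus-Republican vote margins as fractions,
    ordered most recent election first (an assumption for this sketch).
    """
    return sum(
        w * (s - n)
        for w, s, n in zip(weights, state_margins, national_margins)
    )


# Hypothetical state: D+4 then D+1, against national margins of D+2 then D+3.
pvi = partisan_voter_index([0.04, 0.01], [0.02, 0.03])
print(round(pvi, 4))
```

A positive value means the state leaned more Democratic than the country as a whole across the two elections.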

The most important variable by far in this model is the partisan voter index since states tend to vote similarly to how they did in recent elections, especially compared to the national margin.

To incorporate uncertainty, we decompose the estimation error into three components at the national, regional, and state levels. The estimation error tries to account for the historical accuracy of the model and random changes in how people might vote.

The national error captures things like national swings towards one party or another such as in the 1980 election when almost every state turned Republican because of the unpopularity of incumbent Jimmy Carter. National error is modeled to be the same for all states.

The regional error, which is the same for all states in a region, and the state error capture the correlation between states. We know, for example, that states in the same region tend to vote similarly (such as states in the South), and that specific states in different regions, like Washington and Massachusetts, also vote similarly.

To predict the election with this model, we first compute the expected Democratic vote share in each state based on the selected fundamental factors. Then we add in the national, regional, and state errors based on the historical accuracy of the model.

Each of the three errors is drawn from a separate probability distribution (we use a t distribution, which has fatter tails than the normal) centered at zero with a variance set to the historical error of the model. We first draw the national error, which is the same for every state, and add this to the expected Democratic vote share. Then we draw the four regional errors and add the appropriate one to each state. Finally, we draw the state errors, accounting for the historical correlation between states, and add these to the expected vote shares as well.
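The layered draws described above might look roughly like the following. This is a simplified sketch under stated assumptions: the function names, error scales, and degrees of freedom are hypothetical, and the state errors are drawn independently here, whereas the model described above accounts for historical correlation between states.

```python
import math
import random


def student_t(rng, df):
    """Draw from a Student t distribution as normal / sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def simulate_shares(expected, region_of, scales, df=5, rng=None):
    """Return one simulated Democratic vote share per state.

    expected:  state -> expected Democratic share (the B * X term)
    region_of: state -> region name
    scales:    error scales for "national", "regional", "state" (hypothetical)
    """
    rng = rng or random.Random()
    national = scales["national"] * student_t(rng, df)  # shared by every state
    regional = {
        region: scales["regional"] * student_t(rng, df)
        for region in sorted(set(region_of.values()))
    }
    shares = {}
    for state, mu in expected.items():
        # Simplification: independent state errors, no cross-state correlation.
        state_err = scales["state"] * student_t(rng, df)
        share = mu + national + regional[region_of[state]] + state_err
        shares[state] = min(1.0, max(0.0, share))  # keep the share in [0, 1]
    return shares


# Hypothetical two-state example.
expected = {"A": 0.52, "B": 0.47}
region_of = {"A": "West", "B": "South"}
scales = {"national": 0.02, "regional": 0.015, "state": 0.02}
draw = simulate_shares(expected, region_of, scales, rng=random.Random(42))
```

Because the national draw is added to every state, a single large draw moves the whole map together, which is what produces correlated outcomes across states.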

This gives us the simulated Democratic vote share in each state for the election; one minus this is the Republican vote share. We can then determine which states each party wins, and therefore how many electoral votes each party gets, to find the winner (ties, or cases where neither party reaches 270 electoral votes, are broken by randomly choosing one party as the winner).

Repeating this process a large number of times provides an estimate of how likely the model thinks each party is to win the election. We therefore make probabilistic forecasts to account for historical uncertainty and estimation error.
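Turning simulated vote shares into a winner can be sketched as follows. The function name and the toy states are hypothetical, every state is treated as winner-take-all, and a state at exactly 50% goes to the Republican candidate purely as a convention of this sketch.

```python
import random


def electoral_outcome(dem_shares, electoral_votes, rng=None):
    """Return (winner, dem_ev, rep_ev) for one simulated election.

    A party needs a majority of all electoral votes in play; if neither
    party has one, the winner is chosen by a coin flip.
    """
    rng = rng or random.Random()
    total = sum(electoral_votes.values())
    needed = total // 2 + 1  # 270 when all 538 electoral votes are included
    dem_ev = sum(ev for s, ev in electoral_votes.items() if dem_shares[s] > 0.5)
    rep_ev = total - dem_ev
    if dem_ev >= needed:
        winner = "D"
    elif rep_ev >= needed:
        winner = "R"
    else:
        winner = rng.choice(["D", "R"])  # tie broken at random
    return winner, dem_ev, rep_ev


# Hypothetical three-state map with 21 electoral votes in total.
toy_votes = {"A": 10, "B": 6, "C": 5}
toy_shares = {"A": 0.55, "B": 0.45, "C": 0.48}
result = electoral_outcome(toy_shares, toy_votes)
```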
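The repetition step is a standard Monte Carlo estimate. Below is a minimal sketch, assuming a hypothetical `simulate_once` callable that returns the winning party for one simulated election; the one-state toy race stands in for the full state-by-state simulation.

```python
import random


def win_probability(simulate_once, n_sims=10_000, seed=0):
    """Share of simulated elections won by Democrats."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_sims) if simulate_once(rng) == "D")
    return wins / n_sims


def toy_election(rng):
    """Hypothetical one-state race: expected share 0.52, normal error, sd 0.03."""
    share = 0.52 + rng.gauss(0.0, 0.03)
    return "D" if share > 0.5 else "R"


dem_win_prob = win_probability(toy_election)
```

With more simulations the estimate becomes more stable, at the cost of run time.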

Building a Presidential Forecasting Model — Hybridization

The fundamental and polls hybrid model combines the fundamental model described above with a forecast based on national and state polling.

For the polls portion of the hybrid model, we start with national and state polls in each presidential election since 2000. We set up the polls to take the vote shares for the Democratic and Republican candidates only, ignoring third parties. We also ignore complications for states that don’t award all their electoral votes to the overall state winner (specifically Maine and Nebraska).

To get a prediction for each race we create a model to calculate a weighted average of all polls in a given state based on several factors:

  • Historical accuracy of the pollster
  • Historical bias towards a particular party (house effects)
  • Sample size in the poll (higher sample sizes get more weight)
  • Time until election day from when the poll was conducted (more recent polls get a higher weight)
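One way such a weighted average could be put together is sketched below. The functional forms are assumptions for illustration only (square-root sample size, exponential decay with a 14-day half-life, inverse squared pollster error, and a subtractive house-effect correction); the text above names the factors but not the formula, and the `Poll` fields are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Poll:
    dem_share: float       # two-party Democratic share, between 0 and 1
    sample_size: int
    days_to_election: int
    pollster_error: float  # historical average error, must be positive
    house_effect: float    # historical lean toward Democrats


def poll_weight(poll, half_life_days=14.0):
    """Larger, more recent, and historically more accurate polls weigh more."""
    size_w = math.sqrt(poll.sample_size)
    recency_w = 0.5 ** (poll.days_to_election / half_life_days)
    accuracy_w = 1.0 / (poll.pollster_error ** 2)
    return size_w * recency_w * accuracy_w


def weighted_poll_average(polls):
    """Weighted mean of poll shares after removing each pollster's house effect."""
    weights = [poll_weight(p) for p in polls]
    adjusted = [p.dem_share - p.house_effect for p in polls]
    return sum(w * x for w, x in zip(weights, adjusted)) / sum(weights)


# Hypothetical polls for one state.
polls = [
    Poll(0.52, 1000, 7, 0.03, 0.0),
    Poll(0.48, 1000, 7, 0.03, 0.0),
]
state_average = weighted_poll_average(polls)
```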

We also adjust each poll using a linear time trend based on how polls nationally or in the state have trended over time. This allows us to treat each poll as if it were conducted on election day and ensures it reflects expected trends in voter preferences based on more recent polling.

National polls and state polls are adjusted separately. For national polls, we convert the national weighted average to the state level using the historical relationship between a state’s Democratic vote share and the national Democratic vote share.
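The trend adjustment can be illustrated with an ordinary least-squares line through the polls. This is a sketch under assumptions: the function names are hypothetical, the fit uses only days until the election as the predictor, and it needs polls from at least two different days.

```python
def linear_trend(days_to_election, shares):
    """Least-squares slope of poll share against days until the election."""
    n = len(shares)
    mean_x = sum(days_to_election) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in days_to_election)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(days_to_election, shares)
    )
    return sxy / sxx


def project_to_election_day(days_to_election, shares):
    """Shift each poll along the fitted trend to zero days before the election."""
    slope = linear_trend(days_to_election, shares)
    # Moving from x days out to 0 days out changes the share by -slope * x.
    return [y - slope * x for x, y in zip(days_to_election, shares)]


# Hypothetical polls gaining 0.1 points per day as the election approaches.
days = [30, 20, 10]
raw = [0.50, 0.51, 0.52]
projected = project_to_election_day(days, raw)
```

In this toy case every poll projects to the same election-day share because the polls lie exactly on a line; real polls would scatter around the trend.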

For each race we then calculate a final prediction for the Democratic and Republican vote shares by blending the fundamentals-based prediction, the implied state vote share based on national polls, and the final weighted average from all state polls. In cases where a state has no polls, we use the fundamentals-based prediction blended with the state estimate implied by national polls. Where available, state polls get a higher weight because they have been more accurate historically.

As a last step, we turn this single prediction into a probabilistic forecast by adding random errors to each state’s forecast. We estimate these random errors based on the historical accuracy of the model’s forecasts and allow for correlation across states as well as for national swings in voter sentiment. Repeating the addition of these random errors to our main prediction a large number of times allows us to calculate the probability that each party will win the election and other statistics such as how likely each party is to win each state.
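The blending step might be sketched as a simple weighted average. The weights below are placeholders chosen only so that state polls weigh the most; the actual weights are not given in the text, and the function name is hypothetical.

```python
def blend_forecast(fundamentals, national_implied, state_polls=None,
                   weights=(0.2, 0.3, 0.5)):
    """Blend three estimates of a state's Democratic vote share.

    weights: (fundamentals, national-implied, state polls); hypothetical values.
    If the state has no polls, the first two weights are renormalized.
    """
    w_fund, w_nat, w_state = weights
    if state_polls is None:
        total = w_fund + w_nat
        return (w_fund * fundamentals + w_nat * national_implied) / total
    return (
        w_fund * fundamentals
        + w_nat * national_implied
        + w_state * state_polls
    )


with_polls = blend_forecast(0.50, 0.52, 0.54)
without_polls = blend_forecast(0.50, 0.52)
```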

The Current Forecast — Democrats are Ahead

The fundamentals only model currently gives Democrats in 2020 a 71% chance of winning the election vs. 29% for Republicans. By contrast the fundamental and polls hybrid model suggests Democrats have slightly better odds at 80% vs. 20% for Republicans.

Contrast this with 2016 where Republicans were slightly favored to win in a fundamentals only forecast at 53% while a fundamental and polls hybrid suggested Democrats had better odds to win at 63%.

The fact that the fundamentals-only and hybrid models line up suggests that both economic conditions and voter sentiment, as indicated by polls, have shifted in Democrats’ favor.

Across simulations Biden is expected to get 329 electoral votes with an 80% confidence interval of 239 to 401 electoral votes.

In terms of the popular vote, Democrats stand a 91% chance of winning it. Across simulations the median popular vote margin is 6.5%, showing a clear swing towards Democrats relative to 2016, when Clinton led by between 2% and 3% in polls on average. An 80% confidence interval on the popular vote margin is 0.01% to 13%, again showing that Democrats will almost certainly win the majority of votes.

Digging deeper into the chance of a split between the Electoral College and the popular vote showcases both the asymmetric nature of the Electoral College today and Trump’s only real path to victory.

Across simulations, a split is expected to happen around 11% of the time, and in essentially all of these cases it is Trump who wins the Electoral College while losing the popular vote. In fact, among the simulations where Trump wins, 55% of the time he wins the Electoral College despite losing the popular vote, making Trump’s real hope a repeat of 2016.
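Summary figures like these are typically read off the simulated draws as percentiles. The sketch below uses made-up draws from a normal distribution purely for illustration; the distribution and its parameters are assumptions, not the model's output.

```python
import random
import statistics


def summarize(draws):
    """Median and central 80% interval (10th to 90th percentile) of the draws."""
    deciles = statistics.quantiles(draws, n=10)  # nine cut points
    return {
        "median": statistics.median(draws),
        "low": deciles[0],    # 10th percentile
        "high": deciles[-1],  # 90th percentile
    }


# Hypothetical simulated popular vote margins, in percentage points.
rng = random.Random(0)
margins = [rng.gauss(6.5, 5.0) for _ in range(10_000)]
summary = summarize(margins)

# Share of simulations in which the Democratic margin is positive.
dem_popular_vote_prob = sum(m > 0 for m in margins) / len(margins)
```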
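The two figures are consistent with each other if Trump wins about 20% of simulations, as in the hybrid model: 55% of 20% is 11%. The sketch below shows how such rates could be counted from simulation results. The toy list is constructed to mirror the reported figures and is not the model's actual output; the function name is hypothetical.

```python
def split_stats(sims):
    """sims: list of (electoral_college_winner, popular_vote_winner) pairs."""
    n = len(sims)
    splits = [s for s in sims if s[0] != s[1]]
    trump_wins = [s for s in sims if s[0] == "R"]
    trump_splits = [s for s in trump_wins if s[1] == "D"]
    return {
        "split_rate": len(splits) / n,
        "trump_win_rate": len(trump_wins) / n,
        "split_given_trump_win": len(trump_splits) / len(trump_wins),
    }


# Toy data: 100 simulations, Trump wins 20, 11 of those without the popular vote.
sims = [("D", "D")] * 80 + [("R", "D")] * 11 + [("R", "R")] * 9
stats = split_stats(sims)
```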

This singular fact explains Trump’s strategy of trying to polarize the electorate on identity issues, ignoring those outside his base, and focusing on getting just enough key states to cobble together a winning Electoral College cohort (it also raises the point that reform is needed to switch the presidential election system to a national popular vote and remove this structural defect in American politics).

Here are the predictions per state in both models along with the margins from polls (based on Democrat minus Republican vote shares) in 2020 vs. 2016:

This table contains a lot of information, but the important column is polls margin diff. This column represents the poll margin in 2020 minus the poll margin in 2016. Most states show a positive value, colored blue, illustrating a general, national swing towards Biden in 2020.

In almost all cases Biden is polling better than Clinton was at this time. Even in the three states that our hybrid model missed in 2016 (Wisconsin, Michigan, and Pennsylvania, which ostensibly cost Clinton the election), polls in 2020 show a larger swing towards Biden, between 2% and 4%.

All this illustrates why Biden is in a stronger position in our model and has a higher chance of winning the election than Clinton did in 2016. Biden is currently favored in states like North Carolina, Iowa, Arizona, and Florida, which Trump won in 2016. Ohio is also in play with a 42% win probability for Biden. These are the states to watch in 2020.

The tipping point state, which is the state that gives Biden 270 or more electoral votes when states are sorted by vote margin, is expected to be Pennsylvania. Again, this Rust Belt state will play an essential role in the presidential election.
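The tipping point definition translates directly into a short routine: sort states from the largest Democratic margin to the smallest and accumulate electoral votes until the threshold is reached. The function name and toy numbers below are hypothetical.

```python
def tipping_point(dem_margins, electoral_votes, needed=270):
    """Return the state whose electoral votes first reach the threshold.

    States are visited from the largest Democratic margin to the smallest.
    Returns None if the threshold is never reached.
    """
    running = 0
    for state in sorted(dem_margins, key=dem_margins.get, reverse=True):
        running += electoral_votes[state]
        if running >= needed:
            return state
    return None


# Hypothetical three-state map where 8 electoral votes are needed to win.
toy_margins = {"A": 0.10, "B": 0.03, "C": -0.02}
toy_votes = {"A": 5, "B": 4, "C": 6}
tip = tipping_point(toy_margins, toy_votes, needed=8)
```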

Trump’s Path to Victory

Biden’s path to winning the presidency and his key states are clear. But what is Trump’s path to victory? Again, we should note that Trump is not in a strong electoral position overall. He trails in many of the key states he won in 2016, and his best chance of winning is to turn out enough of his base to win the Electoral College while losing the popular vote. In the simulations where he does win the election, Trump wins these states most often:

The map above is color coded from red to blue in each state based on Trump’s win probability for the state across simulations in which he wins the election. Solid red states are states he wins often while solid blue states are those he typically loses. Purple states are those on the margin and therefore are key to Trump’s chances of victory.

These states appear to be Florida, Arizona, Ohio, Iowa, North Carolina, and Pennsylvania, which is not surprising as they are also expected to be the closest states based on current polling. We can get a sense of how important each of these states is by assuming Trump loses the state and then looking at his win probability in those simulations only.

We find that Trump essentially cannot win the election if he loses either Ohio or Florida. He stands a low chance of winning if he loses one of Pennsylvania, Arizona, North Carolina, or Iowa: losing any one of those states reduces his chances of winning to just 2% to 3%. Overall, his odds are not great unless he wins all of these states.

The reason that Ohio and Florida are so crucial is that Trump currently has a narrow lead in those states. Losing them would suggest a swing toward Biden that would weaken Trump’s already precarious electoral position even more.
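Conditioning on losing a state amounts to filtering the simulations. Below is a minimal sketch, assuming each simulation is stored as a hypothetical pair of (whether Trump won the election, the set of states he won); the state labels are placeholders.

```python
def win_prob_given_loss(sims, state):
    """Trump's win rate among simulations in which he loses the given state.

    sims: list of (trump_won_election, states_trump_won) pairs.
    Returns None if he never loses the state in any simulation.
    """
    outcomes = [won for won, states in sims if state not in states]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Hypothetical simulations over two placeholder states.
toy_sims = [
    (True, {"X", "Y"}),
    (False, {"X"}),
    (False, {"Y"}),
    (False, set()),
    (True, {"X"}),
]
prob_without_y = win_prob_given_loss(toy_sims, "Y")
```

Here Trump loses state Y in three simulations and wins the election in one of them, so his conditional win rate is one in three.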

Basically, for Trump to stand a real chance in the election, he has to hold all six of those states. The electoral math is not kind if he loses even a single one.

All in all, Biden has to like his chances, but no one should take anything for granted. At the end of the day voter turnout determines elections, so everyone should make sure to vote and that their voices are heard. Only then will the will of the voters be accurately reflected in the election results.

Vinod Bakthavachalam

I am interested in politics, economics, & policy. I work as a data scientist and am passionate about using technology to solve structural economic problems.