2024 Models Methodology

David @ acctuallydavid.com
8 min read · Jul 11, 2024

--

[Image: House forecast model at forecast launch (July 10th)]

It’s that time of year again, the time for me to write a boring methodology statement. In all seriousness, the 2024 election cycle is coming up and I have produced three models: a President model, a Senate model, and a House model. All three models share the same core methodology, with a few differences in the presidential model. For now, I will walk through the individual models in each specific state and district; the national models are discussed at the end of the article.

Step 1: The Fundamentals

The very first thing that is done is the creation of a fundamentals-based model. I have always been a strong believer that models that focus exclusively on the polls are foolish, and that a fundamentals-based approach combined with a polling average is the best approach. Personal beliefs aside, the fundamentals model is itself built in steps.

  • A variable representing the national environment is created.
  • This variable is based off of seven economic factors (jobs, personal income, consumption, production, inflation, GDP, and stocks). These factors are plugged into a linear regression, with the results of the last 16 presidential elections as the Y axis and the yearly growth of each factor as the X axis, to determine the economic index.
  • Additionally, this index is averaged with the polling average of the generic ballot (75%) and the presidential election (25%) to create the national environment. The weight of the economic index in this average slowly decreases and will be very low by election day; a minimal sketch of the blend appears after the note below.

For the presidential model, the polls of the generic ballot are not used, and only the polls of the presidential election are used.
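As a rough illustration of this step (not the model’s actual code), here is a minimal Python sketch of the economic-index regression and the national-environment blend; the placeholder data, the 0.4 economic weight, and all variable names are assumptions of mine.

```python
import numpy as np

# Placeholder data: yearly growth of the seven economic factors in the last
# 16 presidential election years, and the national two-party margin in each.
# (Real inputs would replace these; only the shapes matter here.)
growth_history = np.random.randn(16, 7)
margin_history = np.random.randn(16) * 5

# Linear regression: election results (Y) on factor growth (X) -> economic index.
design = np.column_stack([np.ones(len(growth_history)), growth_history])
coefs, *_ = np.linalg.lstsq(design, margin_history, rcond=None)

growth_2024 = np.random.randn(7)                 # placeholder current-year growth
economic_index = coefs[0] + growth_2024 @ coefs[1:]

# Blend the generic-ballot (75%) and presidential (25%) polling averages, then
# average that with the economic index; the economic weight decays toward
# election day (0.4 is an illustrative value, not the model's actual schedule).
polling_blend = 0.75 * 2.0 + 0.25 * (-1.0)       # hypothetical averages (D+ margins)
econ_weight = 0.4
national_environment = econ_weight * economic_index + (1 - econ_weight) * polling_blend
print(round(national_environment, 2))
```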

  • Next, each district is assigned a partisan lean. This partisan lean is based off of the results of the 2020 (80%) and 2016 (20%) presidential elections relative to the national popular vote in those elections, as well as the trend (defined by the 2016–2020 trend (80%) and the 2012–2020 trend divided by 2 (20%)) in the district.

For the House and Senate models, the presidential forecast in each district is also considered in the partisan lean of the district. In House districts, the presidential forecast is calculated by shifting the presidential forecast in the district’s state by the district’s 2020 result relative to its state.

Additionally, the results of the California and Washington primaries are considered in California and Washington House districts. The results of those primaries are adjusted for historical swings.
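To make the partisan-lean arithmetic concrete, here is a small sketch; the district and national margins are made up, and combining the results component and the trend component by simple addition is my assumption.

```python
# Hypothetical Democratic two-party margins (district vs. national popular vote).
district = {2012: 1.0, 2016: 2.0, 2020: 4.0}
national = {2012: 3.9, 2016: 2.1, 2020: 4.5}

# Relative results: district margin minus national margin, weighted 80% (2020) / 20% (2016).
rel = {year: district[year] - national[year] for year in district}
results_component = 0.8 * rel[2020] + 0.2 * rel[2016]

# Trend: 80% of the 2016-2020 trend plus 20% of the 2012-2020 trend divided by two.
trend_component = 0.8 * (rel[2020] - rel[2016]) + 0.2 * (rel[2020] - rel[2012]) / 2

# Combining the two pieces additively is an assumption for illustration.
partisan_lean = results_component + trend_component
print(round(partisan_lean, 2))
```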

  • Incumbency values are also calculated for districts with incumbents.
  • Incumbency values are calculated using data from the 2022 elections for the House and the 2018 elections for the Senate. They come from a linear regression with election results as the Y axis and a collection of factors (election results, the national environment, fundraising data, racial data, and controls for incumbency) as the X axis. This is similar to (and inspired by) Split Ticket’s WAR models, but it is not the same and does not use the same datasets. Once the incumbency values are calculated, a bonus of 2 points is added to adjust for the aforementioned controls for incumbency.

For the presidential-election inputs, House incumbency scores use the 2020 and 2016 elections, while Senate incumbency scores use the 2016 and 2012 elections.
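One plausible reading of this step, sketched below, is WAR-style: fit the regression on a past cycle, then treat an incumbent’s over- or under-performance relative to the fitted prediction as their incumbency value, plus the 2-point bonus. The residual interpretation and all of the placeholder data are assumptions on my part, not confirmed details of the model.

```python
import numpy as np

# Placeholder training data from a past cycle: each row is a race, the columns
# stand in for the factors named above (election results, national environment,
# fundraising, racial data, incumbency controls); y is the actual margin.
X = np.random.randn(200, 5)
y = X @ np.array([0.6, 0.8, 0.3, 0.2, 1.5]) + np.random.randn(200)

design = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)

def incumbency_value(race_features, actual_margin):
    """Incumbency value read as actual result minus regression prediction
    (a WAR-style residual -- an assumption), plus the 2-point bonus the
    article describes for the incumbency controls."""
    predicted = coefs[0] + race_features @ coefs[1:]
    return (actual_margin - predicted) + 2.0

print(round(incumbency_value(np.zeros(5), 3.0), 2))
```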

  • Next, fundraising data is collected from OpenSecrets and converted into a two-party margin. This is the least complicated part of the model: the Democratic and Republican fundraising totals are simply plugged into a logarithm.

Outside spending is also considered (candidate spending + outside spending). Additionally, for the presidential model, the fundraising data is based on the amount of money the two candidates have raised in each state, and outside spending is allocated to each state based off of the percentage of money raised there. For the congressional districts with electoral votes, the state data is simply divided by two in Maine and by three in Nebraska to approximate district-level data.
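Since the article only says the totals are “plugged into a logarithm,” the exact transform below (a scaled log ratio) is purely an assumed stand-in to show the general shape of such a conversion:

```python
import math

def fundraising_margin(dem_dollars, rep_dollars, scale=5.0):
    """Convert D and R totals (candidate + outside spending) into a two-party
    margin via a log ratio. The log-ratio form and the scale factor are
    assumptions; the article only specifies that a logarithm is involved."""
    return scale * math.log(dem_dollars / rep_dollars)

# Hypothetical totals: $4.2M for the Democrat, $3.1M for the Republican.
print(round(fundraising_margin(4_200_000, 3_100_000), 2))
```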

  • After all of those things are calculated, the national environment, the partisan lean, the incumbency scores, and the fundraising margins are simply added up. Once they are added up, relatively non-influential demographic adjustments are applied, using a linear regression of racial data, urbanization data, and education data, to produce the fundamentals-based model. It’s not a probabilistic model and is only a margin, but I am unburdened by what has been.
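Putting the fundamentals together is then just a sum plus the small demographic adjustment; the numbers below are placeholders:

```python
# Hypothetical component values (Democratic two-party margins, in points).
national_environment = 1.1
partisan_lean = -3.4
incumbency_score = 2.2
fundraising_margin = 0.8
demographic_adjustment = -0.3   # output of the race/urbanization/education regression

fundamentals_margin = (national_environment + partisan_lean
                       + incumbency_score + fundraising_margin
                       + demographic_adjustment)
print(round(fundamentals_margin, 2))
```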

Step 2: The Polls

All polling data is gathered from FiveThirtyEight.

Despite my prior (true) statement that models that focus exclusively on the polls are foolish, this model is still by and large a polling model, and polls have more weight than fundamentals. Polling data is combined using a weighted average. The weights in the polling average are determined by several factors (a sketch follows the list):

  • If a pollster has a higher rating, their polls will have a higher weight and vice-versa.
  • If the poll is closer to election day, it will have a higher weight and vice-versa.
  • If the poll has a higher sample-size, it will have a higher weight and vice-versa.
  • If the poll surveys a higher-ranked population (ranked from highest to lowest: likely voters, registered voters, all voters, and adults), it will have a higher weight and vice-versa.
  • If the poll is an internal poll, it will have a lower weight, and 2% will be subtracted from the candidate whose party released the poll (and 2% will be added to their opponent).
  • If the poll is a tracking poll, it will have a lower weight.
  • Additionally, if a pollster has released multiple polls, their older polls will have significantly less weight (up to being practically dropped from the average) compared to their most recent poll.
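Here is the promised sketch of a weighted polling average using the factors above; the specific weight formulas (how rating, recency, sample size, and population map to numbers) are my own assumptions, and the rule that down-weights a pollster’s older polls is omitted for brevity.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Poll:
    margin: float              # Democratic two-party margin, in points
    rating: float              # pollster rating, scaled 0-1 (assumed scale)
    days_to_election: int
    sample_size: int
    population: str            # "lv", "rv", "v", or "a"
    internal_for: Optional[str] = None   # "D" or "R" if a partisan internal
    tracking: bool = False

POP_WEIGHT = {"lv": 1.0, "rv": 0.85, "v": 0.7, "a": 0.5}   # illustrative values

def poll_weight(p: Poll) -> float:
    w = p.rating
    w *= math.exp(-p.days_to_election / 30)     # recency decay (assumed form)
    w *= math.sqrt(p.sample_size / 600)         # sample-size bonus (assumed form)
    w *= POP_WEIGHT[p.population]
    if p.internal_for:
        w *= 0.5                                # internals are down-weighted
    if p.tracking:
        w *= 0.75                               # tracking polls are down-weighted
    return w

def adjusted_margin(p: Poll) -> float:
    # Internal polls: 2 points off the sponsoring party's candidate and 2 points
    # onto the opponent, i.e. a 4-point swing in the two-party margin.
    if p.internal_for == "D":
        return p.margin - 4.0
    if p.internal_for == "R":
        return p.margin + 4.0
    return p.margin

def polling_average(polls) -> float:
    weights = [poll_weight(p) for p in polls]
    return sum(w * adjusted_margin(p) for w, p in zip(weights, polls)) / sum(weights)

polls = [
    Poll(margin=3.0, rating=0.9, days_to_election=20, sample_size=800, population="lv"),
    Poll(margin=6.0, rating=0.6, days_to_election=45, sample_size=500,
         population="rv", internal_for="D"),
]
print(round(polling_average(polls), 2))
```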

Step 3: Averaging Everything Together

This step is the final step for determining the district-level models, and is self-explanatory: Step 1 and Step 2 are averaged together. If there are no polls in a district, the fundamentals simply become the final margin in the district. However, if there are polls:

  • The polling average in a district is given a weight equal to the sum of the weights of all polls in the average.
  • The fundamentals in a district are given a flat weight equivalent to five high-quality polls.

For districts in California and Washington, the weight of the fundamentals is increased to the equivalent of six high-quality polls, since those states have primary results with predictive value.

  • Once the weights of the polls and the fundamentals are determined, they are averaged together to create the final margin in a district!
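A small sketch of that blend, with the weight of one “high-quality poll” represented by an assumed constant:

```python
def final_margin(poll_avg, poll_weight_sum, fundamentals, in_ca_or_wa=False):
    """Weighted average of the polling average and the fundamentals margin.
    HIGH_QUALITY_POLL_WEIGHT is an assumed stand-in for whatever weight a
    single high-quality poll receives in the polling average."""
    HIGH_QUALITY_POLL_WEIGHT = 1.0
    fundamentals_weight = (6 if in_ca_or_wa else 5) * HIGH_QUALITY_POLL_WEIGHT
    if poll_weight_sum == 0:            # no polls: fundamentals are the final margin
        return fundamentals
    total = poll_weight_sum + fundamentals_weight
    return (poll_avg * poll_weight_sum + fundamentals * fundamentals_weight) / total

print(round(final_margin(3.0, 7.5, 1.0), 2))
```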

However, that is not all. While the margin in each district has been calculated, probabilities must still be derived from those margins. To calculate them, a normal distribution is used.

  • A standard deviation is calculated by subtracting the number of polls from a value that progressively shrinks as the election approaches. In short, the more polls there are and the closer it gets to the election, the more certain the model is of itself.
  • This standard deviation is plugged into a normal distribution (the mean of the distribution is 0).
  • Once the normal distribution is created, the Democratic probability in each district is calculated by evaluating that distribution’s cumulative distribution function at the district’s two-party margin. The Republican probability in each district is 1 minus the Democratic probability.
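Concretely, the win probability can be read off a normal CDF centered at zero; the standard-deviation formula below is only an assumed stand-in for the shrinking-value-minus-polls rule described above.

```python
from statistics import NormalDist

def win_probability(two_party_margin, n_polls, days_to_election):
    # Stand-in for the real uncertainty rule: start wide, shrink with more
    # polls and as election day nears (the exact constants are assumptions).
    sd = max(3.0, 12.0 * (days_to_election / 120) - 0.5 * n_polls)
    # Democratic probability: CDF of N(0, sd) evaluated at the margin;
    # Republican probability is its complement.
    dem_prob = NormalDist(mu=0.0, sigma=sd).cdf(two_party_margin)
    return dem_prob, 1 - dem_prob

print(win_probability(4.0, n_polls=8, days_to_election=60))
```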

Step 4?: National Simulations

That’s right! You thought there would only be three steps for this model, but there has to be a fourth. While I’ve touched on the model runs in the individual districts, there still have to be national simulations for seat counts, popular votes, and control of each chamber. Everything is relatively straightforward:

  • 10,000 unique variables are generated, each one a random national environment, with possibilities ranging from D+20 to R+20.
  • Next, each variable is applied to the district-level model runs, and the results of each simulation are tallied up.
  • For example, to calculate the probability of the Democrats winning the Senate, all that needs to be done is dividing the number of simulations in which the Democrats win the Senate by the total number of simulations (10,000). It’s that simple! The same general principle applies to the number of Democratic seats in the Senate, which is calculated as the average number of Democratic seats across all simulations.
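A minimal Monte Carlo sketch of this national step; the district margins, the seat counts, and the uniform draw for the national environment are all assumptions for illustration, not the model’s actual inputs.

```python
import random

# Hypothetical final Democratic margins in a handful of Senate races.
district_margins = {"AZ": 2.0, "MT": -6.0, "OH": 1.5, "PA": 3.0, "NV": 0.5}
DEM_SEATS_NOT_UP = 46          # placeholder count of seats not on the ballot
N_SIMS = 10_000

dem_wins = 0
for _ in range(N_SIMS):
    # Random national environment between D+20 and R+20 (a uniform draw here
    # is an assumption; the article only gives the range).
    shift = random.uniform(-20, 20)
    seats = DEM_SEATS_NOT_UP + sum(1 for m in district_margins.values() if m + shift > 0)
    if seats >= 50:            # ties ignored for simplicity
        dem_wins += 1

print(f"P(Dem Senate): {dem_wins / N_SIMS:.2%}")
```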

Epilogue

Thanks for reading about the methodology behind my models! I really appreciate anybody who takes the time to review and read about my work. To clarify some nomenclature before I start crediting people: this is not a Bayesian model; while it might seem like one, this model (and all of my previous models) is much closer to a numerical model like those from Race to the White House and Split Ticket.

If you ever find an issue with the models, whether that’s a bug on the web front-end or a methodological issue, don’t be afraid to contact me. I can always be found on Twitter (@davidsacc12345) or any of the other social media pages linked on my website.

Anyway, I don’t know an exact number, but I’ve spent over 100 hours on the models for 2024 alone, and hundreds more on the models for 2023 and 2022. My 2024 models are my most comprehensive yet; while my 2023 and 2022 models were each well under 1,000 lines of code, my 2024 models are almost 1,500 lines, not even counting the wealth of additional data they use. Additionally, the web side of my 2024 models is far more refined; while my 2022 models had fewer than 10 web files, my 2024 models have more than 2,000 (!!!).

I would feel bad if I didn’t give credit where credit’s due. All polling data is collected from FiveThirtyEight; all fundraising data is collected from OpenSecrets; election and demographic data are collected from Redistricter and Dave’s Redistricting; website maps were originally downloaded from Wikimedia Commons and Daily Kos; and additional election data is provided by Wikipedia, as well as the California and Washington Secretaries of State, Shirley Weber and Steve Hobbs.

I wouldn’t have been able to keep creating things without the support of the people who back me, and if you are linked here, thank you (in no particular order). It really means a lot to me.

Thanks again for reading this long article!
