How It’s Made: A PlanScore Predictive Model for Partisan Elections

With the mid-term elections in full swing and campaigns focused on the districts we have, PlanScore has had a quiet few months. Earlier in the year Pennsylvania redrew its U.S. House districts with a rush of competing plans submitted by participants from all over the state’s political landscape. PlanScore’s Nicholas Stephanopoulos reviewed them all using our models and described the process on the Election Law Blog. Last month, the Virginia General Assembly had until October 30 to pass a remedial map after a district court in Virginia struck down eleven House of Delegates districts on racial gerrymandering grounds. We scored those too, and Nick wrote about the process again.

After the 2018 mid-terms, we’ll be headed into a three-year Census and redistricting cycle where all 50 states get new maps, just like Pennsylvania and Virginia did in their special processes. In this post, I’ll describe how we make PlanScore’s prediction models to support that cycle and help ensure fair maps for every state.

It’s a simple process once you see all the steps

PlanScore models convert past election data into future predictions. We can build models based on past vote totals and demographics to predict which party will have an advantage under completely new district boundaries, because today’s voters consistently vote under a stable party identity. Supreme Court Justice Elena Kagan pointed out the equivalency of redistricting with data and evaluating boundaries in the Gill v. Whitford oral arguments earlier this year:

And it seems to me that, just as legislatures do that, in order to entrench majorities — or minorities, as the case may be — in order to entrench a party in power, so, too, those same techniques, which have become extremely sophisticated, can be used to evaluate what they’re doing.

We score new maps with metrics like the efficiency gap, mean-median difference, and partisan bias to see if they’re fair under a variety of potential electoral outcomes. Gerrymandering is often understood to mean funny-looking shapes on maps but with today’s big data and fast computers even a superficially normal-looking map can hide durable lopsided advantages for one political party. Map-drawers have always looked at past elections to predict future outcomes, such as the Republican Party’s 2010 REDMAP project to increase Republican control of Congressional seats.
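To make two of these metrics concrete, here is a minimal sketch of the efficiency gap and the mean-median difference for a single set of district results. The four-district plan is made up for illustration, and the sign convention (positive values mean the plan favors Republicans) is an assumption of this sketch rather than a statement about PlanScore’s own code.

```python
from statistics import mean, median


def efficiency_gap(districts):
    """Efficiency gap for a list of (dem_votes, rep_votes) pairs.

    A vote is "wasted" if it is cast for the loser, or cast for the
    winner beyond the half of the district vote needed to win.
    Positive result: more Democratic votes wasted (assumed convention).
    Exact ties are treated as Republican wins for simplicity.
    """
    wasted_dem = wasted_rep = total = 0.0
    for dem, rep in districts:
        votes = dem + rep
        needed = votes / 2
        if dem > rep:
            wasted_dem += dem - needed
            wasted_rep += rep
        else:
            wasted_dem += dem
            wasted_rep += rep - needed
        total += votes
    return (wasted_dem - wasted_rep) / total


def mean_median_difference(districts):
    """Mean minus median of the Democratic two-party vote share."""
    shares = [dem / (dem + rep) for dem, rep in districts]
    return mean(shares) - median(shares)


# Hypothetical plan: one packed Democratic district, three narrow Republican wins.
plan = [(70, 30), (45, 55), (45, 55), (45, 55)]
print(round(efficiency_gap(plan), 4))
print(round(mean_median_difference(plan), 4))
```

In this toy plan Democrats win 51% of the statewide vote but only one of four seats, and both metrics come out positive. A superficially tidy map can produce exactly this kind of lopsided result.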

For the next three years, the country will be preparing for a new U.S. Census and nationwide redistricting based on the results. Almost every district map will be redrawn by 2021 by a combination of commissions and political parties. Here’s how we build models to help policymakers, litigators, and members of the public transparently score the fairness of new plans:

  1. Collect precinct-level election vote counts and geographic boundaries
  2. Use past vote outcomes to simulate plausible future elections
  3. Store results as vector tiles to score new districts in under a minute

Collecting Precinct Data

Every model starts with raw election results at the precinct level. Precincts are compact geographic areas comparable in population size to zip codes or census tracts, and each county can change their boundaries at any time for any reason. It’s a challenge to match the names of precincts in Open Elections results with identifiers in geographic files. Sometimes they match exactly. Often they need to be individually inspected and matched to those used in the specific election. For our Virginia 2017 House of Delegates results, we were able to find mostly-matching precinct shapes for the 2016 General election and then scoured the web for a handful of counties with changes from the intervening year. I’ve written before about the special difficulty and opportunity for collecting precinct data; this part of the PlanScore process is still enormously time-consuming and often artisanally opportunistic. The Voting Rights Data Institute at Tufts and MIT spent six intense weeks building up Ohio precincts from PDFs and printed maps, and that’s just one difficult state. When we’ve matched results to shapes for an entire state, we assign unique numeric IDs (Brooklyn Integers) to each precinct to make them easier to follow through the modeling process. Knowing the geographies also allows us to attach selected Census demographics to each precinct, since these are often useful in making predictions.
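The name-matching step might look something like the sketch below: normalize both sets of names, pair up the exact matches, and set aside the rest for manual inspection. The normalization rules and the precinct names are hypothetical, and real data usually needs county-specific handling on top of this.

```python
import re


def normalize(name):
    """Reduce a precinct name to a comparable key (assumed rules)."""
    name = name.upper()
    name = re.sub(r"[^A-Z0-9 ]", " ", name)  # punctuation becomes spaces
    name = re.sub(r"\b0+(\d)", r"\1", name)  # "001" becomes "1"
    return " ".join(name.split())            # collapse repeated spaces


def match_precincts(result_names, shape_names):
    """Return (matched, unmatched): names paired by normalized key."""
    lookup = {normalize(shape): shape for shape in shape_names}
    matched, unmatched = {}, []
    for result in result_names:
        key = normalize(result)
        if key in lookup:
            matched[result] = lookup[key]
        else:
            unmatched.append(result)  # needs individual inspection
    return matched, unmatched


# Hypothetical names as they might appear in results and in shapefiles.
results = ["001 - Chase City", "Ward 2/Pct. 3", "Old Mill"]
shapes = ["1 CHASE CITY", "WARD 2 PCT 3", "Mill Creek"]
matched, unmatched = match_precincts(results, shapes)
print(matched)
print(unmatched)
```

The unmatched list is where the time goes. Each leftover name has to be checked against the boundaries that were actually in effect for that election.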

With geographies in hand, we move on to vote counts from past elections. For example, our Virginia House of Delegates model is based on the November 2017 general election with a race for Governor that let us compare voting patterns across the state. The most granular vote totals are available at the level of local precincts or wards, reported to county boards of elections and ultimately secretaries of state. For an individual precinct we can’t know exactly where the voters are located, but we can know a precise number of votes for each candidate on the ballot. Precinct-level election results are often available from state-level sources. To save work on adapting state data, we can usually use results collected by Open Elections: the first free, comprehensive, standardized, linked set of election data for the United States.

Virginia precincts in effect for the November 2017 general election.

1,000 Simulated Elections

At this point, political scientist and statistician Eric McGhee takes over. Eric generates predicted votes using demographic and political variables entered into an ordinary least squares regression model:

To predict turnout we regress total major-party vote for the race in question on total major-party presidential vote. To predict vote share we regress the Democratic share of the major-party vote on the Democratic share of the major-party presidential vote and the white share of the voting-age population. Using the coefficients and standard errors from these models, we then generate 1,000 simulated total votes and Democratic vote shares for each precinct. These numbers are the inputs for calculating 1,000 sets of efficiency gaps, partisan biases, and mean-median differences, which produce the means and margins of error reported on the site.
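As a rough illustration of the turnout half of that description, the sketch below fits a one-predictor ordinary least squares regression in plain Python and draws 1,000 simulated vote totals from the coefficient estimates and their standard errors. The vote counts are invented, the function names are hypothetical, and this is not Eric’s actual code. The vote-share model adds a second predictor but follows the same pattern.

```python
import random
from statistics import mean


def fit_ols(x, y):
    """Fit y = level + slope * (x - x_bar); return estimates and standard errors.

    Centering x makes the two estimates uncorrelated, so they can be
    drawn independently when simulating.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    resid_var = sum(
        (yi - (y_bar + slope * (xi - x_bar))) ** 2 for xi, yi in zip(x, y)
    ) / (n - 2)
    return {
        "x_bar": x_bar,
        "level": y_bar,
        "slope": slope,
        "se_level": (resid_var / n) ** 0.5,
        "se_slope": (resid_var / sxx) ** 0.5,
    }


def simulate(x_new, fit, n_sims=1000, seed=0):
    """Draw n_sims sets of coefficients and predict a total for each precinct."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        level = rng.gauss(fit["level"], fit["se_level"])
        slope = rng.gauss(fit["slope"], fit["se_slope"])
        sims.append([max(0.0, level + slope * (xi - fit["x_bar"])) for xi in x_new])
    return sims


# Made-up precinct totals: presidential votes and votes in the race being modeled.
pres_votes = [1200, 800, 1500, 600, 1000]
race_votes = [900, 610, 1100, 470, 760]

fit = fit_ols(pres_votes, race_votes)
sims = simulate([1020, 700], fit)
first_precinct = sorted(row[0] for row in sims)
print(round(fit["slope"], 3))
print(round(mean(first_precinct)), first_precinct[25], first_precinct[974])
```

Each of the 1,000 rows is one plausible election. Scoring a plan against every row and summarizing the spread is what would produce a mean and a margin of error for each metric.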

Our predictions so far have been “open seat”, meaning that they have a wide margin of error to account for a candidate of either party with no established track record. In a real election familiar candidates run for re-election, so we’re upgrading our model to account for incumbency effects. Predictions will vary depending on each seat’s incumbent party affiliation to mirror the slight preference that voters often show for a familiar name.

After generating predictions, the time-consuming work of PlanScore model-building is over. We’ve collected precinct-level vote counts, researched and mapped precinct boundaries active at the time the votes were cast, and prepared 1,000 simulated election results. Simulation results and demographics such as Citizen Voting-Age Population are turned into large geospatial layers and cut into map tiles.

Tile outlines in PlanScore’s Virginia model. Smaller tiles are centered on denser, urban areas.

Scoring a District Plan in a Minute

We use tiles to parallelize the work of scoring a new district plan. A large state like Virginia or Pennsylvania can be broken up into many hundreds of square map tiles, each with a portion of the model’s precinct geographies. The tiles we use are variably-sized to account for the changes in density between rural and urban areas of each state. Urban cores are stored in many dense tiny tiles, and each tile is pushed to Amazon S3 as a compressed GeoJSON feature collection. Models are immutable: when we add data or re-run simulations the new data goes into a new subdirectory.

Individual tiles contain full-resolution excerpts of precinct geometry such as the two examples in the illustration below. One example zoom-10 tile from the sparse western part of the state covering George Washington National Forest is over 30km wide, while the other zoom-13 tile from dense, urban central Richmond is just 4km wide.

10/285/394 and 13/2333/3172: two sample tiles showing similar complexity at different scales.
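The tile widths quoted above can be checked with a little arithmetic, assuming the names follow the common zoom/column/row Web Mercator tiling scheme (an assumption of this sketch, though the coordinates do land in western Virginia and central Richmond). A tile’s ground width is the Earth’s circumference, scaled by the cosine of the latitude, divided by the number of tiles across at that zoom.

```python
import math

EARTH_CIRCUMFERENCE_KM = 40075.017  # equatorial circumference


def tile_edge_latitude(row, zoom):
    """Latitude in degrees of the northern edge of a Web Mercator tile row."""
    n = math.pi * (1 - 2 * row / 2 ** zoom)
    return math.degrees(math.atan(math.sinh(n)))


def tile_width_km(zoom, row):
    """Approximate east-west ground width of a tile at its middle latitude."""
    north = tile_edge_latitude(row, zoom)
    south = tile_edge_latitude(row + 1, zoom)
    mid_lat = math.radians((north + south) / 2)
    return EARTH_CIRCUMFERENCE_KM * math.cos(mid_lat) / 2 ** zoom


print(round(tile_width_km(10, 394), 1))   # tile 10/285/394
print(round(tile_width_km(13, 3172), 1))  # tile 13/2333/3172
```

Each extra zoom level halves the width, so the zoom-13 tile is one eighth as wide as the zoom-10 tile at a similar latitude.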

When a visitor to PlanScore uploads a geographic boundary file representing districts, the tiled model allows us to calculate partisan asymmetry scores in under a minute through the magic of AWS Lambda parallelism. Lambda is Amazon’s functions-as-a-service offering, and we invoke it separately for each tile in the model. We can chew through a large state rapidly by running each tile in parallel and combining the results at the other end. PlanScore sums up vote totals for each district from each tile and writes the results into a static summary file on AWS S3, the simple storage service.
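The fan-out and combine pattern can be sketched locally with a thread pool standing in for Lambda. Everything here is hypothetical: the function names, the tile contents, and the assumption that each tile’s overlap with the uploaded districts has already been worked out. In the real service each tile function would intersect precinct geometry with the district shapes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-tile data: (district, Democratic votes, Republican votes)
# for the pieces of precincts that fall inside each tile.
TILES = {
    "10/285/394": [("District 1", 120.0, 310.0)],
    "13/2333/3172": [("District 1", 40.0, 15.0), ("District 2", 900.0, 350.0)],
    "13/2334/3172": [("District 2", 610.0, 420.0)],
}


def score_tile(tile_id):
    """Stand-in for one Lambda invocation: partial vote totals for one tile."""
    totals = {}
    for district, dem, rep in TILES[tile_id]:
        votes = totals.setdefault(district, Counter())
        votes["dem"] += dem
        votes["rep"] += rep
    return totals


def score_plan(tile_ids):
    """Run every tile in parallel, then sum the partial totals per district."""
    combined = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(score_tile, tile_ids):
            for district, votes in partial.items():
                combined.setdefault(district, Counter()).update(votes)
    return combined


totals = score_plan(list(TILES))
for district in sorted(totals):
    print(district, dict(totals[district]))
```

Because each tile’s work is independent and the combine step is plain addition, the order in which tiles finish does not matter, which is what makes the job easy to spread across many workers.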

Highly symmetric Democratic proposal for new Virginia House of Delegates districts

Preparing for 2020

With models in place we can rapidly respond to the rush of plans that typically accompanies a redistricting effort. Pennsylvania’s process earlier in the year featured seven major competing plans. Virginia’s Division of Legislative Services lists dozens of potential redistricting plans for Congressional, House, and Senate boundaries. PlanScore has experimented with complete or partial models for key states with ongoing redistricting debates: North Carolina, Pennsylvania, Wisconsin, Virginia, and Maryland. Starting next year, all 50 states will begin the process of redistricting for 2020. Some states like Utah and Missouri are asking voters to endorse the creation of redistricting commissions to make the outcomes more equitable. We hope to be ready with precinct-level data and predictive models to make this process fair and representative for everyone.


Thanks to Nelson, Hannah, William, Nicholas, and Eric for your feedback!