Open Precinct Data

Several weeks ago, I spent an extended weekend at the fifth (of five) Geometry in Redistricting conference. In addition to speaking and participating in a panel on law, tech, and gerrymandering, I helped organize the conference hackathon at organizer Moon Duchin’s request. One theme I heard repeated throughout the event was the difficulty of finding reliable precinct geography and election results.

Precinct shapes in Wisconsin, covering Madison (center) and Milwaukee (right)

There’s an opportunity here for a new data project focused on connecting existing academic and independent efforts with durable, unique, permanent identifiers for nationwide voting precincts. Imagine if you could easily correlate detailed voting results from OpenElections.net (OE) or state boards of elections with mapped polygons and census geography over time. We already know how effective a GEOID-based approach can be thanks to data published by the U.S. Census, but precincts are a special challenge without a current champion.

What are precincts good for?

  • At PlanScore.org, we’re creating models for new district plans and we need vote totals at high spatial granularity to predict how fair a user’s proposed district plan would be in future elections.
  • Academics, researchers, and journalists need to relate voting patterns to census data for a better understanding of the political preferences of various demographics.
  • When I volunteered at a political campaign in 2012 we used precinct data to organize canvassing efforts for field operations. A precinct tells you the candidates on a voter’s ballot when you knock on their door so you can deliver a consistent message.
  • The Princeton Gerrymandering Project and Duke Quantifying Gerrymandering Project are analyzing partisan characteristics of various redistricting plans. Assessing the range of electoral results associated with a map requires specifying voting patterns down to the precinct level, and you can only do this if you have precinct geography.

These are just a few uses for precinct-level data. The data is a hot mess, and in need of an organizing effort. It needs a home. Right now, geographic data can be found piecemeal from a variety of sources, but rarely from the state-level authorities who should be collecting and publishing it. Instead, users must know about resources like Harvard’s Election Data Archive (up to 2011) or the ongoing Election Geodata repository that Nathaniel V. Kelso and I maintain. For key newsworthy states like Pennsylvania, it’s a bad sign that both the Washington Post and New York Times cite our volunteer GitHub repository instead of an official government source.

Existing Precinct Data

What about VTDs? U.S. Census conducted a nationwide collection of Vote Tabulation Districts after 2010. These are easy to confuse with precincts, but a group of Geometry in Redistricting hackathon participants from Duke University, Pennsylvania, and elsewhere showed that VTD and precinct data in North Carolina are not the same. Some precincts cover multiple VTDs, while others don’t match at all.

What about the OpenElections project? OE collects precinct-level vote totals for elections nationwide. Data from returns is collected from counties and states but it doesn’t include geographic boundaries. OE is an ambitious project led by journalists that nicely handles results suitable for election night reporting. Precinct geography data should connect with OE wherever possible, but currently this is a messy and manual process.

What about the Voting Information Project (VIP)? In 2012, the Google-supported VIP published precinct descriptions for many states. These came in XML format as lists and ranges of addresses. I spent a week of quality time connecting them to U.S. Census TIGER data, with some success. VIP no longer appears to publish data in this form.

What about data from state election officials, like secretaries of state? A few states proactively publish correct precinct geography linked to specific elections, but most don’t. Precinct areas are often a county-level concern, created and maintained to support local election operations without rolling up to a statewide dataset. Pennsylvania and Maryland don’t offer consistent statewide precinct geography, and datasets for these states must be collected via telephone and 1:1 inquiries. The delivered datasets don’t always match election results, and must be carefully inspected. Counties change precincts continuously (we’re not sure why), so data collected long after an election may or may not match the precincts in effect during voting.
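One way to start inspecting a delivered dataset is to compare the precinct names in the geography file with those in the election results. What follows is only a minimal sketch: the field names (`county`, `precinct`), the normalization rule, and the sample rows are all hypothetical, and real data will likely need state-specific cleanup before names line up.

```python
def normalize(county, precinct):
    """Build a comparable key. Upper-casing and collapsing whitespace
    is an assumed rule, not a standard."""
    return (" ".join(county.upper().split()),
            " ".join(precinct.upper().split()))


def compare_precincts(geography_rows, result_rows):
    """Return keys found in both datasets and keys found in only one."""
    geo_keys = {normalize(r["county"], r["precinct"]) for r in geography_rows}
    res_keys = {normalize(r["county"], r["precinct"]) for r in result_rows}
    return {
        "matched": sorted(geo_keys & res_keys),
        "geography_only": sorted(geo_keys - res_keys),
        "results_only": sorted(res_keys - geo_keys),
    }


# Hypothetical rows, for illustration only.
geography = [
    {"county": "Example County", "precinct": "Ward 1"},
    {"county": "Example County", "precinct": "Ward 2"},
]
results = [
    {"county": "EXAMPLE county", "precinct": "ward  1"},
    {"county": "Example County", "precinct": "Ward 3"},
]

report = compare_precincts(geography, results)
for side in ("geography_only", "results_only"):
    for county, precinct in report[side]:
        print(f"{side}: {county} / {precinct}")
```

Anything listed under `geography_only` or `results_only` is a candidate for manual review: it may be a renamed precinct, a split or merge, or a simple spelling difference.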

An Opportunity For A Project

I have some experience with this type of large-scale spatial data project. At OpenAddresses.io we’ve been collecting and organizing worldwide address data for four years, a similar scale of effort to collecting nationwide precinct polygons. At Mapzen, Aaron Cope’s Who’s On First place gazetteer took inspiration from Yahoo!’s Where On Earth IDs to center on the provision of unique and immutable numeric identifiers. A precinct project should address these needs:

  1. Support unambiguous references to specific precincts in time using stable, internally-assigned, opaque, unique identifiers for every individual geometry
  2. Provide missing geography to other data projects, like OpenElections
  3. Make community additions and conversation possible via a git-style mechanism, like OpenAddresses
  4. Make precincts visible on a map so users can see what’s available without GIS
  5. Support current and future elections by collecting forward and backward in time
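To make the first need concrete, here is a rough sketch of what an identifier registry could look like. It is an illustration under assumptions, not a proposed schema: the class names, fields, and the half-open date range are hypothetical, and the geometry itself is left out. The idea is that every version of a precinct gets its own opaque number, so a locally reused name like “Ward 1” can still be resolved to the shape in effect on a given election day.

```python
import itertools
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PrecinctVersion:
    precinct_id: int           # opaque, internally assigned, never reused
    state: str
    county: str
    name: str                  # local label; may repeat across versions
    valid_from: date           # first day this version is in effect
    valid_to: Optional[date]   # first day it is no longer in effect; None = current


class Registry:
    """Hypothetical in-memory registry; geometry storage is omitted."""

    def __init__(self, first_id=1):
        self._next_id = itertools.count(first_id)
        self._versions = []

    def add(self, state, county, name, valid_from, valid_to=None):
        """Register one geometry version and assign it a new identifier."""
        version = PrecinctVersion(next(self._next_id), state, county,
                                  name, valid_from, valid_to)
        self._versions.append(version)
        return version

    def in_effect(self, state, county, name, on):
        """Find the version of a named precinct in effect on a given date."""
        for v in self._versions:
            if (v.state, v.county, v.name) != (state, county, name):
                continue
            if v.valid_from <= on and (v.valid_to is None or on < v.valid_to):
                return v
        return None


# Hypothetical example: the same local name, redrawn once.
registry = Registry()
old = registry.add("XX", "Example County", "Ward 1",
                   date(2011, 1, 1), date(2015, 1, 1))
new = registry.add("XX", "Example County", "Ward 1", date(2015, 1, 1))

found_2012 = registry.in_effect("XX", "Example County", "Ward 1", date(2012, 11, 6))
found_2016 = registry.in_effect("XX", "Example County", "Ward 1", date(2016, 11, 8))
print(found_2012.precinct_id, found_2016.precinct_id)
```

Election results could then cite the identifier instead of the name, which keeps a reference valid even after a county redraws or renames the precinct.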

This won’t solve the missing “.gov” problem, but with some coordination between universities and independent projects like PlanScore or OpenElections, we should be able to arrive at a mutually-beneficial hub for precinct data that addresses 80% of everyone’s needs and provides a critical backbone for election data research leading up to the 2020 redistricting cycle.

What next? Get in touch if this sounds interesting to you. We’re already building pieces of this puzzle at PlanScore.org to meet our own needs. It’s unproductive to work on a potentially-shared effort in isolation. Let’s share the load!

Thanks to Anne, Deborah, Derek, John, Michael, Nathaniel, Tom, and William for their feedback and encouragement on early drafts of this post.