Open Precinct Data: Schema 0.1 Proposal

Michal Migurski
Published in PlanScore
Dec 27, 2018

Here’s a preliminary exploration of a new data format for storing U.S. electoral precincts, a follow-up to Open Precinct Data (April 2018).

Imagine if you could easily correlate detailed voting results from OpenElections.net or state boards of elections with mapped polygons and census geography over time.

Open precinct data supports a variety of needs. It provides missing geography to other data projects and makes community additions and conversation possible via a git-style mechanism. It makes precincts visible on a map and supports current and future elections by collecting data both forward and backward in time. Throughout, specific precincts at specific times can be referenced unambiguously using stable, internally assigned, opaque, unique identifiers for every individual geometry.

The schema here is based on the ecosystem around GTFS, the data standard for transit schedule data successfully pioneered by Portland TriMet and Google Maps over ten years ago. Some design goals include:

  • Ease of use by data scientists and political scientists, whose tools include Python or R notebooks, spreadsheet applications, or GIS software.
  • Accessibility to parsing software similar to Partridge (by Danny Whalen and Remix).
  • Compatibility with parallel electoral data projects such as Open Elections.

Instead of simply documenting a tabular schema, this post shows the data in use to answer basic questions about electoral geography. We can get to feature documentation later, but first we need to validate that the format addresses basic needs. The remainder of this post is excerpted from a Python notebook.

The sample data here covers a small part of North Carolina. shapes.shp includes geometries for Congressional districts with 2012, 2014, and 2016 borders, Johnston and Alamance counties, and all their voting precincts. U.S. House candidates for each included district over three general elections are included. The data is also available in Google Docs for easier browsing.

Geometries for Congressional districts with 2012, 2014, and 2016 borders, Johnston and Alamance North Carolina counties, and all their voting precincts.

Loading Data

Open Precinct Data is stored in a zip file with six contained files:

  • elections.csv – one or more elections
  • districts.csv – electoral districts for each election
  • candidates.csv – candidate details such as political party and incumbency for each district
  • precincts.csv – voting precincts where candidate votes are tallied for each election
  • shapes.shp – geographic areas for precincts and districts
  • sources.csv – names and links for official sources of data

We start by loading data from each of these files into Pandas DataFrames, using GeoPandas for shapes:
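A minimal sketch of that loading step follows. The table names come from the file list above; the helper names `load_tables` and `load_shapes` are hypothetical, and reading every column as a string is an assumption made here to keep identifiers intact. The `zip://` path form for GeoPandas is a commonly used pattern, but whether it suits the real archive layout is also an assumption.

```python
import zipfile

import pandas as pd

# Table names taken from the file list above.
CSV_TABLES = ["elections", "districts", "candidates", "precincts", "sources"]


def load_tables(archive):
    """Read each CSV in the zip archive into a DataFrame, keyed by table name.

    `archive` may be a path or a file-like object. (Hypothetical helper.)
    """
    tables = {}
    with zipfile.ZipFile(archive) as zf:
        for name in CSV_TABLES:
            with zf.open(f"{name}.csv") as f:
                # dtype=str keeps identifiers such as "OPID:..." and any
                # zero-padded codes from being parsed as numbers.
                tables[name] = pd.read_csv(f, dtype=str)
    return tables


def load_shapes(archive_path):
    """Read shapes.shp from the same archive with GeoPandas.

    Assumes the shapefile and its sidecar files sit at the top level
    of the zip. (Hypothetical helper.)
    """
    import geopandas  # imported lazily so the CSV loader works without it

    return geopandas.read_file(f"zip://{archive_path}!shapes.shp")
```

With a local copy of the sample archive (the file name is whatever you saved it as), `load_tables(path)["precincts"]` would give the precincts table and `load_shapes(path)` the geometries.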

One Precinct

Let’s look at a single precinct. The term precinct is used loosely in this project. A precinct is any geographic area where votes are counted. For example, absentee ballots for entire counties may be included in this list. Here, we select a single precinct covering Haw River in Alamance County.

  • PSID:1158849879 is a unique, opaque identifier for geographic shapes in shapes.shp
  • OPID:1360711279, OPID:1360711281, and OPID:1360711283 are unique, opaque identifiers for three elections in elections.csv
  • OPID:1360711289, OPID:1360711291, and OPID:1360711297 are unique, opaque identifiers for three districts in districts.csv
  • The precinct changes name over time, but it’s always the same geographic area
  • For the first two elections the precinct covers two districts, OPID:1360711289 and OPID:1360711291
  • In the final election held after North Carolina’s 2016 redistricting, the precinct covers just one district, OPID:1360711297
  • Both OPID:1360711291 and OPID:1360711297 can be called “District 6” but each belongs to a different plan
  • Identifiers like OPID:nnn+ and PSID:nnn+ are completely opaque and internal to Open Precinct Data
  • Identifiers like FIPS:nnn+ are defined by the U.S. government and used by the Census
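
The selection described in this section can be sketched on a few hand-built rows. The identifiers are the ones listed above, with the third listed district, OPID:1360711297, used for the final election. Everything else is an assumption for illustration: the column names other than `shape_id`, the one-row-per-district layout, the placeholder precinct names, and the extra row with a made-up shape ID.

```python
import pandas as pd

# Illustrative rows only. Column names election_id, district_id, and
# precinct_name are assumptions; names are placeholders, not real data.
precincts = pd.DataFrame(
    [
        ("OPID:1360711279", "OPID:1360711289", "PSID:1158849879", "placeholder name A"),
        ("OPID:1360711279", "OPID:1360711291", "PSID:1158849879", "placeholder name A"),
        ("OPID:1360711281", "OPID:1360711289", "PSID:1158849879", "placeholder name B"),
        ("OPID:1360711281", "OPID:1360711291", "PSID:1158849879", "placeholder name B"),
        ("OPID:1360711283", "OPID:1360711297", "PSID:1158849879", "placeholder name C"),
        # A row for some other geometry, with a made-up shape ID.
        ("OPID:1360711283", "OPID:1360711297", "PSID:0000000000", "another precinct"),
    ],
    columns=["election_id", "district_id", "shape_id", "precinct_name"],
)

# Select every row that shares the Haw River geometry, whatever its name.
haw_river = precincts[precincts["shape_id"] == "PSID:1158849879"]

# Count how many districts the precinct touches in each election.
districts_per_election = haw_river.groupby("election_id")["district_id"].nunique()
print(districts_per_election)
```

Filtering on the shape identifier rather than the name is what lets the same area be followed across elections even as its label changes.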

Connecting Precincts to Elections

Our sample data includes two counties and three elections. Let’s look at candidate incumbency for an election in one county: 2014 in Alamance County. We start by matching on elections.election_date and precincts.county_name, and using pandas.merge() to select a subset of precincts.
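
A sketch of that first merge, on toy rows, might look like the following. Only `election_date` and `county_name` are column names from the text; `election_id`, `precinct_name`, the ISO date format, and all row values are assumptions for illustration.

```python
import pandas as pd

# Toy tables. election_id and precinct_name are assumed column names.
elections = pd.DataFrame({
    "election_id": ["E1", "E2", "E3"],
    "election_date": ["2012-11-06", "2014-11-04", "2016-11-08"],
})
precincts = pd.DataFrame({
    "election_id": ["E1", "E2", "E2", "E2", "E3"],
    "county_name": ["Alamance", "Alamance", "Alamance", "Johnston", "Alamance"],
    "precinct_name": ["p1", "p1", "p2", "p3", "p1"],
})

# Narrow each table first, then join on the shared election key.
elections_2014 = elections[elections["election_date"] == "2014-11-04"]
alamance = precincts[precincts["county_name"] == "Alamance"]
alamance_2014 = pd.merge(elections_2014, alamance, on="election_id", how="inner")
print(alamance_2014)
```

An inner join is used here because only precincts that belong to the chosen election should survive.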

Party incumbency is useful when predicting election outcomes: candidates running for re-election often have a track record and name recognition which can help them in the polls. We perform another pandas.merge() to connect our Alamance 2014 precincts to incumbent candidates, this time with a left join.
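
The left join can be sketched as below. The incumbents and their districts are the ones named later in this post; the column names (`district_id`, `candidate_name`, `party`, `incumbent`), the short district keys, and the placeholder rows are assumptions for illustration.

```python
import pandas as pd

# Toy precinct rows for one election; a precinct split between two
# districts appears once per district (assumed layout).
precincts_2014 = pd.DataFrame({
    "precinct_name": ["precinct A", "precinct B", "precinct B", "precinct C"],
    "district_id": ["D4", "D4", "D6", "D2"],
})

# Toy candidates table. District 6 is an open seat, so it has no incumbent.
candidates = pd.DataFrame({
    "district_id": ["D4", "D2", "D6"],
    "candidate_name": ["David Price", "Renee Ellmers", "placeholder candidate"],
    "party": ["DEM", "REP", "placeholder party"],
    "incumbent": [True, True, False],
})

incumbents = candidates[candidates["incumbent"]]

# how="left" keeps every precinct row, even where no incumbent matches.
merged = pd.merge(precincts_2014, incumbents, on="district_id", how="left")
print(merged[["precinct_name", "district_id", "candidate_name", "party"]])
```

The left join matters for the open seat: an inner join would silently drop precincts in a district with no incumbent, while here they remain with empty candidate columns.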

The complete list of 38 precincts above has a mix of Republican and Democratic incumbents along with an empty seat. In 2014, three U.S. House districts overlapped with Alamance County.

Output to GIS

Now we can link the table above to geographic areas to see how this county election looks on a map. Both precincts and districts include a shape_id foreign key that we can find in shapes.shp. After merging, we use geopandas.GeoSeries.intersection() to split multi-district precincts like Haw River (precinct 13) among their districts.
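
A sketch of the split, using rectangles in place of real shapes, is shown below. Only `shape_id` and `GeoSeries.intersection()` come from the text; the other column names, the IDs, and the geometries are invented for illustration, and no coordinate reference system is set.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Toy shapes: one precinct straddling two districts (made-up geometry).
shapes = gpd.GeoDataFrame(
    {"shape_id": ["S-precinct", "S-north", "S-south"]},
    geometry=[box(0, 0, 2, 2), box(0, 1, 4, 4), box(0, -2, 4, 1)],
)
# One precinct row per district it touches (assumed layout).
precinct_rows = pd.DataFrame({
    "precinct_name": ["precinct 13", "precinct 13"],
    "district_id": ["D-north", "D-south"],
    "shape_id": ["S-precinct", "S-precinct"],
})
districts = pd.DataFrame({
    "district_id": ["D-north", "D-south"],
    "shape_id": ["S-north", "S-south"],
})

# Geometry lookup keyed by shape_id.
geoms = shapes.set_index("shape_id")["geometry"]

# Pair each precinct row with its district, keeping both shape_id columns.
joined = precinct_rows.merge(
    districts, on="district_id", suffixes=("_precinct", "_district")
)
precinct_geom = gpd.GeoSeries(
    geoms.loc[joined["shape_id_precinct"].tolist()].values, index=joined.index
)
district_geom = gpd.GeoSeries(
    geoms.loc[joined["shape_id_district"].tolist()].values, index=joined.index
)

# Row-wise intersection: the part of the precinct inside each district.
pieces = gpd.GeoDataFrame(joined, geometry=precinct_geom.intersection(district_geom))
print(pieces[["precinct_name", "district_id", "geometry"]])
```

Each output row is one precinct-and-district piece, which is the unit you would color by incumbent party on a map.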

Finally, we can see precinct 13 in the map below, with its northern portion in David Price’s Democratic District 4 and its southern portion in the open-seat District 6. A small corner of the county falls in Renee Ellmers’s Republican District 2.

Conclusions

The example above is small and contrived.

Next steps might include putting the sample zip file through more scrutiny. Open Elections precinct-level results might be connected to the unique IDs used here. A larger sample file covering several elections in North Carolina would support a more consequential exercise. The data could be extended to cover more chambers than the U.S. House of Representatives. Linking tables like candidates.csv might belong in another data project like Open Elections. Spatial data in shapes.shp could be stored in a different format such as GeoJSON or GeoPackage.

Get in touch if this sounds interesting to you.

Thanks to Brian, Danny, Derek, Katie, Nelson, and Stephen for their feedback on early drafts of this post.
