Putting the “Data” in MEDSL

Because you can’t spell Election Data & Science Lab without it

Cameron Wimpy
MIT Election Lab
Apr 18, 2018

Data is central to our work at the MIT Election Data and Science Lab; it’s in our name. Our goal is to be a reliable source for all types of data pertaining to elections in the U.S. This includes things you may expect, like election returns data, but over time we will also be publishing the 2016 version of the Elections Performance Index and other types of administrative data relating to elections.

Over the past year we have worked diligently to gather and publish high-quality election returns data. So far, we’ve published House, Senate, and Presidential results at the constituency level from 1976–2016, precinct results for 2016, and recent election results from 2017 (including New Jersey, Virginia, and the special elections in Alabama). Eventually, we plan to go back to 1789 with the constituency returns in federal elections. From there we will work forward, always getting county and precinct results when we can. For example, we are already laying foundational tools for integrating shapefiles and precinct identifiers after the 2020 census.

We want our data to play well with future releases and with data produced by other sources. We are building each data release in a consistent format with corresponding codebooks. We are also working to include identifiers from the Census Bureau, FEC, ICPSR, and others when we can. This allows users to cross-reference and merge our data with other extant sources for any type of analysis.
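To make the idea of merging on a shared identifier concrete, here is a minimal sketch in Python using only the standard library. The column names (`state_fips`, `totalvotes`, `state_name`) and the vote totals are placeholders made up for illustration, not our actual file layout or real results; the state FIPS codes themselves are the standard Census Bureau codes.

```python
import csv
import io

# Hypothetical returns table keyed by state FIPS code.
# The vote totals are placeholders, not real election results.
RETURNS_CSV = """state_fips,totalvotes
01,100
34,200
51,300
"""

# Hypothetical second source that carries the same identifier.
STATES_CSV = """state_fips,state_name
01,Alabama
34,New Jersey
51,Virginia
"""


def read_rows(text):
    """Parse CSV text into a list of dicts; every value stays a string."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on(left, right, key):
    """Left join: keep every row in `left`, adding columns from `right` where `key` matches."""
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        # Columns from `left` win if both tables share a column name.
        merged.append({**match, **row})
    return merged


merged = merge_on(read_rows(RETURNS_CSV), read_rows(STATES_CSV), "state_fips")
for row in merged:
    print(row["state_fips"], row.get("state_name"), row["totalvotes"])
```

One design note: the identifier is kept as a string throughout. Reading FIPS codes as integers would drop the leading zero in codes like Alabama's `01`, and the two sources would no longer line up.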

Data quality is important to us. We only use official sources like state-certified results or those archived by the Clerk of the House of Representatives. This means that gathering the data takes time, but we hope that extra effort is worth it when it comes to having confidence in the data we produce.

All of our data undergoes an ever-improving quality control process. We cannot promise to fix every mistake on the first go-round, but we are committed to listening to our users and making updates as errors are found.

There are many ways to host data. We’ve chosen to host our data on Harvard’s Dataverse as it provides a permanent hosting location that will always be there when you need it. It also allows us to seamlessly integrate with the Data page on our website. Always check there for the most up-to-date versions of our published data.

We are not alone. We would be remiss not to mention others working on elections data. For our part, we are working with teams at the University of Florida and OpenElections on our precinct data projects. There are other large-scale data projects happening as well, including Dave Leip’s Atlas of U.S. Presidential Elections, The American Presidency Project, and Harvard’s Election Data Archive. We are not a replacement for any of these great projects or others that have already done so much work in this field. Rather, we aim to be a comprehensive resource for all types of election-related data moving forward.

Stay tuned for much more. We continue to push out updates for the data we have, and our team is hard at work preparing for 2018. We will also be publishing a series of posts on our new software tools and resources that make working with our data (and data from others) that much easier.

Cameron Wimpy is the Research Director of the MIT Election Data & Science Lab.

MEDSL is dedicated to applying scientific principles to how elections are studied and administered, with the aim of improving the democratic experience for all U.S. voters. Follow updates on Twitter and sign up for our quarterly newsletter.
