Election Transparency, Part #2: Data Collection and Analysis

Elise Thomas
Published in Data for Democracy
Mar 28, 2017

In the wake of the 2016 Presidential election, one question dominated the headlines: how could so many predictions have gotten it wrong?

Explanations that have emerged since the election include sampling bias (pollsters were simply not talking to the right people); misjudged turnout (pollsters misjudged who was likely to vote and who was not); and the ‘shy Trumper’ theory, which holds that people who planned to vote for Trump were less likely to admit it openly when asked.

Over the past several months, as part of the wider Data for Democracy project, volunteer data scientists have been digging deep into the data to better understand the dynamics of the 2016 election. Since the project began, the team’s focus has broadened to include wider political and historical questions about the electoral system. This has not been an easy task: simply acquiring the relevant data and putting it into a usable format has been a time-consuming and complex team effort.

“The process of data collection was eye-opening,” says Chris, one of the Election Transparency team leaders. “The fact that each state provides their data in a different format, and some of these formats change over time, makes data collection an adventure. For example, California provides their data in Excel spreadsheets, while North Carolina’s data are in HTML tables. To normalize the data we had to read in each state, and sometimes each year, separately using slightly different processes.”
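The per-state, per-year approach Chris describes can be sketched as a set of small readers that each translate one source layout into a shared schema. This is a minimal illustration, not the team’s actual code: the column names, the reader functions, the (state, year) registry and the sample rows are all hypothetical. Reading real `.xlsx` files needs a third-party package, so the California reader here assumes a CSV export of the spreadsheet; the North Carolina reader parses an HTML table with the standard library’s `html.parser`.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def read_ca_2016(text):
    """Hypothetical CA layout: a CSV export of the spreadsheet."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"state": "CA", "county": row["County"].strip(),
               "candidate": row["Candidate"].strip(),
               "votes": int(row["Votes"].replace(",", ""))}


def read_nc_2016(html):
    """Hypothetical NC layout: an HTML table with upper-case county names."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    col = {name: i for i, name in enumerate(header)}
    for cells in body:
        yield {"state": "NC", "county": cells[col["County Name"]].title(),
               "candidate": cells[col["Choice"]],
               "votes": int(cells[col["Total Votes"]])}


# One reader per (state, year), since layouts differ and can change over time.
READERS = {("CA", 2016): read_ca_2016, ("NC", 2016): read_nc_2016}


def normalize(state, year, raw):
    """Dispatch to the matching reader and return rows in the shared schema."""
    return list(READERS[(state, year)](raw))


# Made-up sample inputs, for illustration only.
SAMPLE_CA = 'County,Candidate,Votes\nAlameda,Candidate A,"1,000"\n'
SAMPLE_NC = ("<table><tr><th>County Name</th><th>Choice</th><th>Total Votes</th></tr>"
             "<tr><td>WAKE</td><td>Candidate A</td><td>2500</td></tr></table>")

rows = normalize("CA", 2016, SAMPLE_CA) + normalize("NC", 2016, SAMPLE_NC)
for r in rows:
    print(r)
```

Keying the registry on (state, year) means a format change in a later year only requires adding one new reader, while everything downstream keeps consuming the same four fields.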

D4D volunteer Robert has been using the raw data collected by the team to create choropleths (maps divided into regions, such as counties, with each region shaded according to the value of a data variable).
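The shading step of a choropleth usually means sorting regions into a handful of classes and giving each class a color. The post does not say which classification Robert used; the sketch below assumes quantile classes (roughly equal numbers of regions per color), which is one common choice. The palette, the county labels and the values are hypothetical, and drawing the actual map geometry is left out.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical sequential palette, lightest (lowest values) to darkest.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def classify(values_by_region, palette=PALETTE):
    """Map each region to a color using quantile classes (roughly equal counts per class)."""
    # quantiles(..., n=k) returns k - 1 cut points splitting the data into k groups.
    cuts = quantiles(values_by_region.values(), n=len(palette))
    # bisect_right gives the index of the class a value falls into (0 .. k - 1).
    return {region: palette[bisect_right(cuts, value)]
            for region, value in values_by_region.items()}


# Made-up values for eight hypothetical counties.
demo = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60, "G": 70, "H": 80}
colors = classify(demo)
print(colors)
```

Quantile classes keep every color in use even when the data are skewed; equal-interval classes would be the alternative if the absolute value ranges matter more than relative rank.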

“Often, the hardest part of working with elections data is that different state level agencies across the country use different naming conventions and different structures for presenting results,” says Robert. “D4D has done a lot of work around making sure that the information for all the counties across the country use a standard format. I was able to refactor code I had used in a previous project and from there it was very simple — I was able to ‘plug in’ their data instead of the precinct level data I used previously, and out came the projections we were looking for.”

The team’s efforts so far have produced state-, county- and district-level datasets on topics such as census and demographic data, voter registration, economic statistics, and election results stretching back over multiple election cycles. All of this data has been made freely available to the public.

As the project moves into the analysis phase, the team plans to focus on four key questions:

  • Which factors explain county-level Presidential election results in 2016?
  • How was the 2016 election different from recent (or maybe even not-so-recent) elections, taking socio-economic and demographic characteristics of the electorate into account?
  • What are the impacts of the way in which congressional and legislative districts are drawn?
  • What are the causes and implications of barriers to voter accessibility?

Although the process of analyzing the data has only just begun, the team has already made some interesting discoveries about the 2016 election. Whilst not ready to publicly release their findings yet, team member Scott observes that “the data we have assembled, in some ways, challenge the conventional wisdom about what caused the surprising result in the election.”

In addition to elections themselves, voter accessibility will be another important focus for the project. “I’m very passionate about ensuring the right to vote remains free, fair, and without obstruction,” says Rachel, another of the team leads. “The United States lags far behind other nations when comparing turnout of all eligible voters, but shoots to the top of the list when looking at turnout amongst registered voters. Being part of Data for Democracy gives our team an excellent opportunity to figure out why that gap exists, and what steps might help to close it.”
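The gap Rachel describes follows from a simple identity. Writing $V$ for votes cast, $R$ for registered voters and $E$ for eligible voters (and assuming everyone who votes is registered), turnout among eligible voters is turnout among registered voters scaled by the registration rate:

```latex
\frac{V}{E} \;=\; \frac{V}{R}\cdot\frac{R}{E}
```

So a country can rank near the top on $V/R$ and still lag on $V/E$ if $R/E$ is low. With purely illustrative figures, if 85% of registered voters cast a ballot but only 70% of eligible citizens are registered, turnout among eligible voters is $0.85 \times 0.70 \approx 0.60$.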

To keep up with the Election Transparency team’s progress, and with the latest news on all of D4D’s work, follow us on Medium. If you’d like to get involved as a volunteer, please contact team@datafordemocracy.org.
