What is the voter file?

Understanding the foundation of political technology

Gautham Arumilli
DNC Tech Team
5 min read · Aug 31, 2020


Man holding a sign that says “Register early to vote”
Photo by annie bolin on Unsplash

Voter file data is the foundation of political technology. It is used by every data-driven campaign. Campaigns use the voter file to map out high-level strategies, build detailed campaign tactics, execute day-to-day programs, and report on the effectiveness of their efforts. As a result, the recency, accuracy, and comprehensiveness of the voter file are critical to the success of campaigns.

As the DNC Tech Team, we are the stewards responsible for maintaining a national voter file database for use by Democratic campaigns. This database is the mainstay of our data warehouse, and maintaining the voter file database is one of our most important responsibilities.

This post explores some of the work that our team does to ensure that this critical data stays up-to-date, accurate, and comprehensive as we move through the crunch of the campaign cycle.

Obtaining the voter file

Each state (and the District of Columbia) is required to maintain a list of all registered voters and to make it available, though how they do so varies widely by state. In some states, only political parties or campaigns can access the file, and only for election-related purposes. In others, the voter file data is public and may be accessed by any person for all relevant uses.

The exact data provided for each voter on these lists also varies by state. However, voter files usually have at least a few core pieces of information for each voter:

  • Name and address
  • Registration status (e.g., active, inactive) and party affiliation
  • Precinct(s) and congressional, legislative, and other (e.g., school) districts
  • Vote history (i.e., which past elections a voter has participated in)

An important note is that while states provide information about the elections a voter has participated in, they do not say whom a voter voted for. A voter’s ballot is always private!

We work with the Democratic parties in each state to frequently and regularly update the voter file with the information available from the state.

Challenges of building a “national” voter file

The DNC Tech Team’s responsibility is to compile each of the state-level voter files into a comprehensive national voter file database that we then make available to campaigns.

In doing so, we face several challenges with the raw data:

  • Raw data is inconsistent from state to state: Voter files vary dramatically in format, data availability, and presentation from state to state.
  • Raw data is often incomplete for campaign needs: The data obtained from each state is raw data about registered voters, which may be insufficient for a campaign’s needs. For example, registered voters may move out of state without canceling their registrations — attempts to reach these voters may waste valuable campaign resources.
  • Raw data may have quirks and inconsistencies: These can include simple typos or input errors for individual voters, or bigger issues that cause large amounts of data to be missing or erroneous.

We tackle each of these potential issues in a methodical way during our voter file ingestion process.

Voter file ingestion process

Standardization

After we receive a new, “raw” state voter file, the first step in our ingestion process is to “standardize” the data. While each state provides voter registration data in a unique way, our national voter file database presents data across states using a consistent data model with the same fields for each voter. During the standardization step, we transform each state’s raw source data into a common data format that can be ingested into our database.
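To make the idea concrete, here is a minimal sketch of what a standardization step could look like. It is an illustration only, not our actual pipeline: the state codes ("AA", "BB"), the raw column names, the status codes, and the StandardVoter fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardVoter:
    """Hypothetical common data model shared by every state."""
    state: str
    voter_id: str
    first_name: str
    last_name: str
    status: str
    party: str


# Hypothetical per-state mappings: standard field name -> raw column name.
FIELD_MAPS = {
    "AA": {"voter_id": "VOTER_NUM", "first_name": "FNAME", "last_name": "LNAME",
           "status": "STATUS_CD", "party": "PARTY"},
    "BB": {"voter_id": "id", "first_name": "first", "last_name": "last",
           "status": "reg_status", "party": "party_affiliation"},
}

# Hypothetical raw status codes mapped onto one shared vocabulary.
STATUS_CODES = {"A": "active", "ACTIVE": "active",
                "I": "inactive", "INACTIVE": "inactive"}


def standardize(state, raw):
    """Convert one raw state record (a dict) into the common model."""
    mapping = FIELD_MAPS[state]
    # Missing or None values become empty strings; whitespace is trimmed.
    values = {field: (raw.get(column) or "").strip()
              for field, column in mapping.items()}
    return StandardVoter(
        state=state,
        voter_id=values["voter_id"],
        first_name=values["first_name"].title(),
        last_name=values["last_name"].title(),
        status=STATUS_CODES.get(values["status"].upper(), "unknown"),
        party=values["party"].upper(),
    )
```

Keeping the state-specific knowledge in mapping tables, rather than in code, means a change in one state's file layout only touches that state's entry.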

Appending data

After standardization, we “append” data to the state voter file to make it more useful for campaign needs. For example, we use the USPS National Change of Address (NCOA) database to help determine whether a voter may have moved since their most recent registration.

In addition, we use the National Record Linkage algorithm to de-duplicate records for voters who may have moved across states, and to build more continuous histories of individual voters.
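As a rough illustration of the de-duplication idea, the sketch below groups records that share a normalized name and date of birth. This is a deliberately simplified stand-in and not the National Record Linkage algorithm itself. The field names, including date_of_birth, are assumptions, and exact-key matching like this would miss typos and name changes that a real linkage process has to handle.

```python
from collections import defaultdict


def match_key(record):
    """Build a crude match key: normalized name plus date of birth."""
    return (
        record["first_name"].strip().lower(),
        record["last_name"].strip().lower(),
        record["date_of_birth"],
    )


def link_records(records):
    """Group records sharing a match key; keep groups with 2+ records."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical records: the same person registered in two states.
records = [
    {"state": "AA", "first_name": "Jane ", "last_name": "Doe",
     "date_of_birth": "1980-01-15"},
    {"state": "BB", "first_name": "jane", "last_name": "DOE",
     "date_of_birth": "1980-01-15"},
    {"state": "BB", "first_name": "John", "last_name": "Smith",
     "date_of_birth": "1975-06-30"},
]
linked = link_records(records)
```

Here linked contains a single group holding the two Jane Doe records, which could then be reviewed or merged into one continuous history.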

Finally, we invest in purchasing and modeling additional data to increase campaign efficiency and improve voter outreach.

Quality control

Before “publishing” a state voter file to the national database, we undertake a rigorous quality control process, with hundreds of detailed checks for consistency and accuracy. These checks help surface potential issues with the raw data, as well as our own standardization and append processes. Running these checks for every file helps ensure that campaigns are only using the most comprehensive and accurate data possible.
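One kind of check can be sketched as a comparison between the previously published file and the new one. The example below flags any registration status whose share of the file shifts by more than a threshold. It is a sketch under stated assumptions: the record layout and the 2-point threshold are hypothetical, and a real check suite would cover far more than status counts.

```python
from collections import Counter


def status_shares(records):
    """Return each registration status as a fraction of all records."""
    counts = Counter(record["status"] for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: n / total for status, n in counts.items()}


def check_status_shift(previous, current, max_shift=0.02):
    """Flag statuses whose share moved by more than max_shift.

    max_shift is a hypothetical threshold (0.02 = 2 percentage points).
    """
    before = status_shares(previous)
    after = status_shares(current)
    warnings = []
    for status in sorted(set(before) | set(after)):
        shift = after.get(status, 0.0) - before.get(status, 0.0)
        if abs(shift) > max_shift:
            warnings.append(f"{status}: share changed by {shift:+.1%}")
    return warnings


# Hypothetical snapshots: the inactive share jumps between files.
previous = [{"status": "active"}] * 95 + [{"status": "inactive"}] * 5
current = [{"status": "active"}] * 85 + [{"status": "inactive"}] * 15
warnings = check_status_shift(previous, current)
```

A warning from a check like this does not say what went wrong. It only tells a person to look at the file before it is published.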

This set of checks also helps surface cases where a state may take actions that compromise voter rights and the integrity of the election process.

For example, last year, the Kentucky Board of Elections attempted to place nearly 175,000 voters on a separate “inactive” list ahead of a critical statewide gubernatorial election. Through the quality control checks that we ran on the Kentucky voter file, we were able to spot this problematic process, and the Kentucky Democratic Party was able to win a court decision that restored these voters to the main registration list ahead of the election!

During that election, Democrat Andy Beshear defeated incumbent Republican Matt Bevin by just 5,136 votes (out of 1.4 million votes cast) to become Kentucky’s governor, which helped protect the Medicaid coverage of hundreds of thousands of Kentuckians.

News article with the headline, “Judge sides with Kentucky Democratic Party in lawsuit against ‘inactive’ voter list”
Through quality checks, we were able to identify abnormal voter purge activity and help get voters restored ahead of the election.

We also make this information available to voter protection teams so they can monitor for potential voter purges.

Connecting with voters

All this data is available to sister committees, state parties, and the presidential campaign, and is leveraged by thousands of campaigns during a typical election cycle to power voter outreach. In partnership with sister committees, state parties, and campaigns, we are building valuable tools, models, and other products, including our Blueprint dataset, in the Phoenix data warehouse.

Through these efforts, we are enabling victories by Democrats up and down the ballot this fall, and laying the groundwork for a durable infrastructure that will serve Democrats for years to come. In the end, this data powers the central goal of campaigns: speaking directly to Americans about policies, candidates, and the power of their vote.

Interested in joining DNC Tech? Check out open roles here.
