Good for What? Humans vs. Computers

Applying lessons from intro programming to political tech

Jackie Cohen
DNC Tech Team
6 min read · May 28, 2020


Before I was a software engineer on the DNC Tech team, I was a faculty member at the University of Michigan, where, among other things, I taught introductory computer programming courses. In those courses, we spent a lot of time asking and answering:

What are computers good at? And what aren’t they good at?

While this line of questioning may sound silly, it’s key to learning about computer programming. It is also key to making decisions in political technology.

An example from political technology

One of my favorite examples of this comes from our work at DNC Tech building engineering processes to manage and store data in the national voter file (read more about the national voter file in this article).

Campaigns need to reach voters efficiently and effectively. That means that the data they use to answer questions and to reach voters needs to be as parsimonious and accurate as possible.

Our team is focused on delivering the best data quality we can to organizers and campaigns, in order to elect Democrats everywhere.

It turns out that managing voter data over time across all states is a challenging problem — one that requires a lot of specificity, a lot of subject-matter expertise, and human beings.

States have wildly different city distributions, voter registration laws, voter statuses, recent special or municipal elections, and more, each of which contributes to the challenge of ensuring that changes to state data over time represent the reality of voter registration there.

We need to know if the data we’ve processed reflects reality well enough to save it in the database that will drive campaign analysis, reporting, and modeling. And we need to make sure that the data is in the correct standardized format to be useful in an enormous database that is behind tools like Blueprint.

Ensuring data quality

The DNC Tech team receives voter file data from states in many forms — just like every state is different, every state’s data is different. Before storing new data in the voter file database, we run a series of processes to format and standardize the data, as well as a series of quality control checks.
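As one illustration of what "standardize" can mean here, consider a minimal sketch like the one below. The raw codes and field names are hypothetical stand-ins, not any state's real layout; every state's data looks different, which is exactly the point.

    # Hypothetical raw status codes from one state's file, mapped into a
    # standardized vocabulary shared across the national voter file.
    STATE_STATUS_MAP = {
        "A": "active",
        "I": "inactive",
        "CAN": "cancelled",
    }

    def standardize_record(raw_record):
        """Reshape one state-specific row into the shared format."""
        return {
            "voter_id": raw_record["REGISTRANT_ID"].strip(),
            "county": raw_record["COUNTY"].strip().title(),
            # An unmapped code raises a KeyError on purpose: deciding what
            # a brand-new status means is a human judgment, not a default.
            "status": STATE_STATUS_MAP[raw_record["STATUS"].strip()],
        }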

Concerns sometimes arise while processing the data. Perhaps a suspicious number of voters are missing from the voter rolls in certain counties, or perhaps there is a huge number of district changes. Sometimes those concerns can be reconciled with reality (e.g. a major redistricting case was just decided in that state, so yes, it is true that there are many new districts).

Sometimes they can't be (e.g. when voter file data quality checks in our process found almost 175,000 voters purged from the rolls in Kentucky). Before we can go ahead and store the data in our database, an oddity needs to be raised and discussed in the right place to determine whether it is a genuine concern.
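A quality control check of that kind can be sketched in a few lines. The 5% threshold and the field names here are illustrative assumptions, not the real pipeline's values:

    from collections import Counter

    # Hypothetical threshold: flag any county whose registered-voter count
    # dropped by more than 5% between the previous snapshot and a new file.
    MAX_COUNTY_DROP = 0.05

    def flag_suspicious_drops(previous_rows, new_rows):
        """Compare per-county voter counts across two snapshots and return
        the counties whose drop exceeds the threshold, for human review."""
        prev_counts = Counter(row["county"] for row in previous_rows)
        new_counts = Counter(row["county"] for row in new_rows)
        flagged = {}
        for county, prev in prev_counts.items():
            drop = (prev - new_counts.get(county, 0)) / prev
            if drop > MAX_COUNTY_DROP:
                flagged[county] = drop
        return flagged

The software can surface the drop; only a person can say whether it reflects a purge, a legitimate list maintenance cycle, or a bad file.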

For some of our data standardization and storage process, computers alone are perfectly adequate. For other parts, they’re not enough.

[Image caption: Voter file quality checks helped detect purged voters in KY in advance of the 2019 election]

Deciding which parts of our process should be software alone, and in which cases software should simply make things easier for humans, is just as important a part of the engineering pipeline as the scripts themselves.

Developing our pipeline

When our team built an engineering pipeline to handle the ingestion of voter file data, we focused on the same line of questioning I used in my programming courses, asking questions like:

  • What are computers good at?
  • What are computers bad at (especially compared to human beings)?
  • What can my software do?
  • What should I leave to a human user of the software I build?

These questions are just as important in the development of mature engineering systems as they are when beginning to learn about computer programming. But nowhere have they been as central for me as while working in political tech.

There's been a manual way of ingesting voter file data for a long time; the voter file is not new. In building a new pipeline, we relied on the questions described above, and that kind of questioning was what it took to do it right.

Ultimately, standardization steps were chained together, with the automatic process halting only when it hits a problem that requires more specific engineering work.
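Reduced to a sketch, the chaining might look something like this; the exception, the step list, and the print-based notification are illustrative stand-ins for whatever the real system does:

    class NeedsHumanReview(Exception):
        """Raised when a step finds something software shouldn't decide alone."""

    def notify_reviewers(step_name, concern):
        # Stand-in for a real alerting hook (Slack, a ticket, an email, ...).
        print(f"Pipeline halted at {step_name}: {concern}")

    def run_pipeline(data, steps):
        """Run standardization steps in order, halting the automatic
        process the moment one of them needs a human judgment."""
        for step in steps:
            try:
                data = step(data)
            except NeedsHumanReview as concern:
                notify_reviewers(step.__name__, concern)
                return None  # resume only after a person signs off
        return data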

We built a system for the data quality review that allows more humans, with more varied expertise, to safely make judgments about whether or not the data accurately represents reality.

Now, humans get pinged to make judgments that humans need to make, and computers provide tools to help us safely make those judgments. This improved system is part of what led to the discovery that voters in Kentucky had illegally been placed on an "inactive" voter list just before a big gubernatorial election.

What’s “good enough” is complicated

Data quality is where the major difference between a computer's judgment and a human's judgment of whether data is "good enough" to store shows up. We can do great engineering work, but we shouldn't assume that unit tests written by talented people are always right if what they're testing is something a computer can't test accurately.

Because we ground our technology work at the DNC in the major differences between human systems and technology systems, our work is more reliable, more robust, and more accessible to new members of our team.

Why should a human decide whether the data is "good enough" to store permanently? After all, computers are good at comparing data points and at going step by step; they're great at surfacing data and matching patterns.

But there are other equally important (and, thanks to our engineering work, much shorter!) parts of the voter file data loading process that software isn’t the right tool for.

Building on the basics

As writers of software, we have to decide what to rely on software for, much like students in an introductory programming course have to get used to relying on variables and thinking about their associated values. Why reason about x = 1, y = 2, z = x + y ? Why not just get to the point with z = 3 ?
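Written out, the classroom example makes the point itself:

    # With variables, the program encodes the relationship itself:
    x = 1
    y = 2
    z = x + y  # always the sum of x and y, whatever their values

    # Hard-coding z = 3 is only right for these particular inputs; change
    # x or y and the hard-coded "answer" is silently, confidently wrong.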

Success with software relies upon the key idea that a computer will never be wrong about the sum of the values of x and y, no matter what they are, and that matters.

The same lesson is integral to our work on the DNC Tech team every day: computers do exactly what you tell them to do, but humans have to decide what to tell them.

As we look toward the future, we may expect changes in voter registration patterns, patterns of voting methods, or precinct boundaries as a result of COVID-19. Finding a pattern in the data that suggests whether the changes it reflects are real is a job perfectly suited to the software we've already built. But deciding what that pattern means, and whether we need to build or interpret an automated test differently under certain circumstances, is a job for a person.

Building robust tech infrastructure for processes like this supports Democrats everywhere, and building great tech infrastructure is really a human-focused endeavor.

Humans know things about reality that computers cannot, and we can’t ignore that.
