How Data Journalism Failed the American Electorate in 2016

Data journalism was supposed to be the answer to the anecdotal coverage that has dominated political reporting in the past. We’re all familiar with that type of coverage. It begins by describing the rally, small town, or factory the reporter visited, often alongside a political candidate. Next it describes an interview with someone in that setting. The person is frustrated, or hopeful, or whatever emotion best fits the tenor of the story. Finally, it uses a few of these interviews to characterize the state of the election overall, generalizing broadly and often inaccurately.

Data journalism was supposed to fix that by giving a broader view of the election and the electorate. We don’t need to hear from one Jill and one Joe; instead, we can report on an aggregate of all the polls to capture all the Jills, Joes, and everyone else too.
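To make the idea concrete, here is a minimal sketch of what a poll aggregate might look like. The polls and numbers are entirely invented for illustration; real aggregators like FiveThirtyEight weight by pollster quality, recency, and other factors beyond sample size.

```python
# Minimal illustration of a poll average: each poll's margin is weighted
# by its sample size, so no single Jill-or-Joe anecdote dominates.
# All numbers below are invented for illustration.

def weighted_poll_average(polls):
    """polls: list of (margin_in_points, sample_size) tuples.
    Returns the sample-size-weighted average margin."""
    total_weight = sum(n for _, n in polls)
    return sum(margin * n for margin, n in polls) / total_weight

# Three hypothetical polls: +3, -1, and +2 points for one candidate.
polls = [(3.0, 1200), (-1.0, 800), (2.0, 1000)]
avg = weighted_poll_average(polls)
print(f"Weighted average margin: {avg:+.1f} points")  # +1.6 points
```

The point is not the arithmetic but the shift in method: the unit of reporting becomes the aggregate, not the anecdote.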

This was Nate Silver’s stated goal when he moved his 538 project over to ESPN to launch an entire site dedicated to data journalism. From his founding essay: “I would never have launched FiveThirtyEight in 2008, and I would not have chosen to broaden its coverage so extensively now, unless I thought there were some need for [data journalism] in the marketplace. Conventional news organizations on the whole are lacking in data journalism skills, in my view. […] Narrative accounts of individual news events can be informative and pleasurable to read, and they can have a lot of intrinsic value whether or not they reveal some larger truth. But it can be extraordinarily hard to make generalizations about news events unless you stop to classify their most essential details according to some numbering or ordering system, turning anecdote into data.”

FiveThirtyEight, run by Nate Silver, made a name for itself through political forecasting. Silver then sought to expand the site into a broader data journalism outlet. Now, after getting the 2016 election spectacularly wrong, it is unclear what’s next for the site and the brand.

This is the problem data journalism can solve: It can help us find the “real” story more quickly and in turn make reporting more detailed and informative for the general public.

That brings us to November 8, 2016. Organizations attempting to model the presidential election on the basis of poll results and other factors had the race anywhere from “leaning Democratic” to “>99% Democratic.” No reputable use of publicly available data predicted a Trump win. And all were wrong.

The more careful sites refused to offer a prediction as extreme as that of the Princeton Election Consortium, which put Clinton’s chances above 99%. 538 gave Trump a 29% chance of victory, making it easy for them to claim they had left a path open for him all along. As such, it is fruitless to argue that data journalism failed because it didn’t use the available data correctly. We have no reason to believe 538 would have gotten its prediction right had it tweaked its model in some way.
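It is worth pausing on what a 29% chance actually means, because it underpins 538’s defense. A quick simulation, using the 29% figure cited above (the simulation itself is just an illustration, not anything 538 published), shows that the “underdog” outcome of such a forecast is anything but rare:

```python
# A 29% chance is far from negligible. Simulating many hypothetical
# elections at that probability shows how often the "underdog" wins.
# The 0.29 figure is 538's published Trump probability; everything
# else here is illustrative.
import random

random.seed(42)  # fixed seed so the run is reproducible

trials = 100_000
upsets = sum(random.random() < 0.29 for _ in range(trials))
print(f"Underdog won {upsets / trials:.1%} of simulated elections")
```

Roughly three in ten simulated elections go to the underdog, which is why a 29% forecast cannot be called “wrong” simply because the underdog won.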

Instead, we must look at how data journalism shaped election coverage. Prior to 2016, it was easy to argue that data journalism was a better way to cover politics than the traditional “horserace” coverage. You know the tone: “Neck and neck down the final stretch! It’s Trump on the outside making a move to pass. Will he have enough to overtake Clinton?” It’s the breathless, don’t-look-away style of coverage.

Horserace coverage is reporting that covers an election by focusing on the candidates’ standing in the polls. Data journalism in 2016 mostly served up more advanced metrics of that standing, rather than offering voters new and important information.

What did adding data journalism do to this coverage? The result was sites like the New York Times putting their own forecast on the homepage and alongside every political article: “Today, Hillary Clinton has an 85% chance of winning.” It was as if all reporting was meant to be viewed through the lens of the aggregated polls.

In short, data journalism wasn’t using numbers and statistics to offer greater insight into the race. Instead, it was like calling the horserace but with advanced metrics on the horses. Instead of describing the horses in terms of heart and spirit, they were described in terms of maximum oxygen uptake and stride-length ratios. It was the same old horserace, but with a fancier vocabulary.

Does that make for better horserace coverage? Perhaps so. We don’t have to look far back to see horserace coverage that ignored data journalism. In 2012, as Nate Silver reminds us in his essay cited above, many pundits were throwing up their hands and saying the race was too close to call on the basis of shoddy, anecdotal evidence: “Peggy Noonan, the Wall Street Journal columnist, wrote a blog post on the eve of the 2012 election that critiqued those of us who were ‘too busy looking at data on paper instead of what’s in front of us.’ Instead, ‘all the vibrations’ were right for a Romney victory, she wrote. Among other things, Noonan cited the number of Romney yard signs, and the number of people at his rallies, as evidence that he was bound to win.”

Peggy Noonan, columnist for the Wall Street Journal, embodies a style of punditry that data journalism proponents sought to end. Rather than focusing on stories and gut feelings, data journalism was supposed to bring data and projections into news coverage.

It’s not hard to see, at the time or in retrospect, that those claims are likely unrelated to a candidate’s success, especially when the evidence is selectively gathered by one person.

So it isn’t that data journalism is bad for horserace coverage. Instead, it’s that data journalism is NOT good for political coverage. Data journalism provides new ways to write about the same subject (the horserace), which leads to a proliferation of horserace coverage. 538’s model was designed to be reactive, shifting (even if only slightly) each time new data came in. With that stream of updates, and all the other models available, reporters could cover the horserace in ever more nuanced ways.

Horserace coverage won, and political coverage lost because of the lost opportunities to do issue-based reporting. When an editor assigns a journalist to write horserace coverage, that journalist is not available to write about something else.

In this election, we found ourselves without coverage of actual issues. The Republican and Democratic parties enacted party platforms that were far to the right and left, respectively. Yet there was little reporting on these platforms. The contrast between the two candidates’ substance was dramatic during the debates, yet little was written about what they were actually proposing.

Imagine if editors had scrapped horserace coverage altogether. Imagine if they published few articles about poll averages. Instead, imagine they instructed reporters to cover issues and to educate the public. For example, a reporter could be assigned to cover questions of trade. What kinds of trade policies would each candidate prefer? How do they propose enacting those policies? What should change and what should stay the same? And what are the implications of these proposed changes? There was plenty of information about these questions. A reporter could likely file at least one story a week on the topic. The public would be educated, and little would be lost by having that reporter work on an issue rather than the horserace.

Issue coverage does not mean data journalism is out. Nate Silver, in his aforementioned essay, points this out: “[Data in journalism isn’t just about prediction. Instead, it] usually involve[s] more preliminary steps in the data journalism process: collecting data, organizing data, exploring data for meaningful relationships, and so forth. Data journalists have the potential to add value in each of these ways, just as other types of journalists can add value by gathering evidence and writing stories.”

But the evidence we have from this presidential election suggests that data journalism ISN’T being used in the ways Silver describes. Instead, it is used to provide advanced metrics for horserace coverage. Enhanced horserace coverage continues to ill-serve the public. And in an election where the issues were reported on infrequently and a candidate was able to win based on personality rather than policy, we need to scrutinize all aspects of journalism. It’s time for a return to principle. The news media must restore its vital role in American democracy by rejecting enhanced horserace coverage and focusing instead on the issues that affect our country and world, and the positions our candidates take on those issues. If data journalism can help with that, then I’m all for it. The only problem is that we lack the data to suggest it will be used for anything other than horserace coverage.

538’s final model for 2016. They predicted multiple states incorrectly, yet their data journalism approach provided endless fodder for more detailed horserace coverage. This may have resulted in a lack of attention to issues, policy, and candidate positions, a disservice to the public and a fundamental violation of the news media’s role in democracy.