The U.S. Election: What We Got Wrong, What We Learned
Last Friday, we argued that “the digital signals suggest that Hillary Clinton is increasingly likely to be the next president of the United States,” and we set out three charts to explain why Donald Trump couldn’t “break through” in time for Election Day. We were wrong. On Sunday, we went further and listed seven reasons why Clinton looked more and more likely to win the election. We were wrong. On Monday night, hours before the polls opened, we concluded that “if Predata’s reading of the polls, the early voter data, and — most importantly — the unyielding dominance of the Democratic candidate in the digital domain over the past three weeks is correct, Hillary Clinton will be the next president of the United States.” On that, too, we were wrong. Fault us for accuracy, but don’t question our consistency.
Predata is not alone, of course, in having called the election incorrectly. Virtually everyone else and every other major forecasting institution of repute did as well. Even the forecasters who got the result “right” were wrong: so-called “outlier” polls such as the LA Times correctly gave the flavor of an overall Trump victory, but their prediction of Trump winning at the national level is complicated by the fact that Clinton appears on track to win the popular vote; Tom McLellan of the McLellan Market Report said Trump would win but, in similar fashion, explicitly confined his prediction to the popular vote. Most importantly of all, the Trump campaign itself did not see the result coming, as numerous Trump surrogates have spent much of this morning conceding. Alone in the prediction business, a professor of history at American University, with a quirky forecasting system of 13 true/false statements, got the result right. Tuesday wasn’t just a triumph for the forgotten people of the Rust Belt and America’s rural hinterland; it was also a triumph for the more obscure fringes of the forecasting business, and a stinging rebuke to the data-driven models that Democrats had been so superbly certain would deliver them the White House.
In the spirit of transparency, here is a full rundown of the things we got wrong about the election, the thing (note the singular) we got right, and the things on which it’s too early to tell, one way or another.
Things we got wrong
- “Clinton will win the presidency.” From the beginning of the debate season on September 26 to Election Day, our headline/whole-of-internet monitoring of the digital conversation consistently showed Clinton in front. Yes, she may eventually win the popular vote, offering some consolation that this reading of the electoral mood was accurate — but even should that come to pass, her lead in the digital conversation was far larger than any advantage she might eke out in the popular vote.
- “There is no ‘silent majority’ for Trump.” Given that working-class white voters in rural areas and the suburbs of the Rust Belt apparently turned out in record numbers to vote for Trump, it turns out there is a silent majority after all. Whether these “shy voters,” already inaccessible to the pollsters, would have shown up more convincingly in Predata’s digital signals with better signal curation/construction is unclear. Rural and working-class/uneducated households in the U.S., regardless of ethnicity, have comparatively lower rates of internet usage.
- “Trump’s mistreatment of women is the election’s defining issue.” It wasn’t. More white women voted for Trump than for Clinton.
Thing we got right
- “Trump is edging the contest in the battleground states — and in Florida in particular.” We saw a swing back to Trump in our overall battleground state signals in the final days of the campaign. On Monday, the final battleground state score stood at Trump 89.3% vs. Clinton 67.8%, though it’s worth pointing out that data adjustments mean the size of Trump’s “lead” appeared smaller at the time. In Florida, which had always loomed as a must-win for Trump (count it now as a did-win), Trump enjoyed a similar advantage in our scores on Monday, albeit a razor-thin one: 84.7% to 83.7% for Clinton. (Here are the charts summarizing the state of play on Election Day morning.) We saw this evidence but judged it insufficiently strong to disturb our overall prediction, given the size of Clinton’s advantage at a headline level and the old chestnut about “paths to 270” and the “Electoral College math” working much more heavily in the Democrats’ favor. This was a mistake: in retrospect, we should have listened more closely to what Predata’s battleground state signals were telling us.
Things on which it’s too early to tell, one way or another
- “The markets will correct after selling off last week.” The early signs for the peso aren’t good, though other assets — U.S. equities in particular — appear to have taken the Trump bombshell in their stride. The signs remain mixed.
- “The emails and James Comey’s dramatic campaign intervention do not matter.” It’s impossible to know the actual impact the email scandal, and its late twist in particular, had on voters, but there’s little doubt Comey’s October 29 announcement gave the President-elect fresh messaging impetus (“Drain the swamp” and all that) in the final week.
Why did Predata’s reading of the “digital conversation about the election” lead us to forecast the result inaccurately? We see three main reasons: one an exogenous problem of electoral structure, one a problem of signal construction, one a problem of analysis.
- The Electoral College introduces a kink that greatly complicates effective monitoring of the digital political conversation. The U.S., of course, is not the only country whose electoral system can make winning the popular vote insufficient to win an election: Australia, with a wholly different preferential voting system, saw something similar happen in 1998. But the U.S. system is uniquely quirky and complex, and capturing the interaction between individual states and counties and the national collective as they manifest in the conversation online is no easy task. Predata’s monitoring of the digital campaign during the Brexit referendum showed that online conversations about major political events can offer valuable insight into the shape, fluctuations and likely outcome of those events. But Brexit was a far simpler animal, at least from the perspective of forecasting and data analysis, than the U.S. presidential race: the British people faced a straightforward Remain/Leave binary choice in which a simple national majority was enough to carry the day. The U.S. electoral system, with its state-based electors and Colleges, in-built federalist corrections against majority rule, and creaking machinery of voter registration and balloting procedures, is a tougher beast to tame. Analysis of the digital conversation does not map onto this electoral spaghetti with ease. But that’s not to say it can’t be done, and that’s not to say we won’t work harder next time to capture the totality of the online conversation and “fit” its constituent parts onto the geographic and demographic units that determine electoral outcomes in the U.S.
- Trump was a unique candidate and engagement with his online message/campaign was uniquely difficult to “capture” in a way that would still ensure a fair digital contest with Clinton. Trump presented two challenges as we set about the task of building the digital signals for this election: he is a walking spectacle and a publicity sponge, but not all the attention that comes his way online is positive; and he has, as we noted in early September, an army of online support vastly superior to anything Clinton can count on. Trump, quite simply, is a wellspring of digital material in a way that Clinton never has been: his supporters made sure his speeches, rallies, campaign events and interviews all made it to the internet, but there was no equivalent repository of campaign material for Clinton online. If we had included all these pro-Trump sources in our coverage, the Trump signal would have flooded the overall conversation and we would have been left with useless data. Our objective was to ensure the contest between the two headline signals for each candidate was fair, and that each signal included not just an even number of sources, but the same types of sources. In the process, we over-emphasized each campaign’s “official” messaging and under-estimated “unofficial” messaging. Trump did not win because he uploaded some ads to his official YouTube channel. Rather, his victory was a function of the support that thrived in the seamier, populist corners of the internet — the corners, that is, we made an early decision to exclude from our curation of the signal sets.
- We paid too much attention to the noise of other forecasters and “experts.” Other indicators — polls, betting markets, early turnout statistics — aligned with what we saw in the headline digital signals and gave us confidence in our prediction of a Clinton win. These indicators turned out to be similarly faulty. We have stressed all along that Predata’s campaign scores should not be viewed as a prediction in their own right, but as a single data point to be used, in conjunction with other data points, to form a reasoned view on the likely outcome of the race. That approach will continue to guide our coverage of future elections, but in complex and tight races especially, we will be sure to pay closer attention to what the digital signals are actually saying (see the discussion of the swing states above) and weigh the evidence from other forecasting indicators with greater skepticism. We remain, in this sense, long Predata (quelle surprise).
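To make the first of these reasons concrete, here is a minimal sketch of how a national lead can coexist with an electoral loss. Everything in it is hypothetical: the state names, elector counts and vote totals are invented for illustration, the `StateResult` type and both functions are our own names, and it assumes every state awards all of its electors to the state-level winner (a simplification of the real system). It is not a description of how Predata's signals are built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateResult:
    """One state's (hypothetical) result for two candidates, A and B."""
    name: str
    electors: int
    votes_a: int
    votes_b: int


def popular_vote(results):
    """Sum raw votes across all states: the 'national' reading."""
    return (sum(r.votes_a for r in results),
            sum(r.votes_b for r in results))


def electoral_vote(results):
    """Award each state's electors to its winner (winner-take-all assumed)."""
    a = sum(r.electors for r in results if r.votes_a > r.votes_b)
    b = sum(r.electors for r in results if r.votes_b > r.votes_a)
    return a, b


# Invented numbers: A runs up a huge margin in one state,
# B wins two closer states by narrow margins.
HYPOTHETICAL = [
    StateResult("Large state", 10, 900_000, 300_000),
    StateResult("Swing state 1", 8, 480_000, 520_000),
    StateResult("Swing state 2", 8, 490_000, 510_000),
]

if __name__ == "__main__":
    print("popular vote (A, B):", popular_vote(HYPOTHETICAL))
    print("electoral vote (A, B):", electoral_vote(HYPOTHETICAL))
```

In this toy case candidate A leads the national count comfortably yet loses on electors, which is why a whole-of-internet headline signal can point one way while state-level signals point the other.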
In sum, the outcome wasn’t what we expected, but valuable lessons have been learned for next time. And to be clear: there will be a next time. If anything, the rise of Trump — a cellphone and social media addict so far gone he might as well be a millennial — represents a triumph for unfiltered, unsmoothed, “incorrect” political messaging, and the power of the internet to mobilize hidden electoral masses. This is a long-run trend: the years ahead, as we argued on Monday, promise political campaigns more, not less, like this one; more candidates like Trump; more attempts to pitch the electorate in the places and on the devices that populate their everyday routines — on their phones, in their DMs, on their lock screens and in their mentions; more reduction in the friction between messenger and recipient; more overlap between the campaign on the ground and the campaign on the internet; more tweets and more trauma. Everything is digital now: for political campaigns, every day on the trail is an opportunity to weaponize anew online.
The growing unreliability of polling, as we’ve known for a while, is linked to the growing inability of pollsters to draw a random sample of the population. Pollsters can’t get to people where they live and communicate (especially in the U.S., which places legal restrictions on the ability of polling companies to cold-call mobile phones), and response rates are in free-fall. These are structural data-gathering obstacles that Predata’s monitoring of digital conversations does not face. People may have stopped talking to pollsters, but they are talking online — to their political leaders and to each other — with increasing frequency. In an era in which polling is fading as a useful electoral indicator, the popular mood is set — who knows for how long? — against the establishment in all its tentacular influence, and voters, especially in advanced economies, increasingly discuss and debate their electoral preferences online, away from the pollsters, Predata’s digital-first approach will have only more, not less, to offer our collective understanding of politics and public opinion — Tuesday’s miss notwithstanding.
In the meantime, there are plenty of fresh risks looming on the global political calendar, many of them in the digital-rich environment of the world’s most advanced economies. On December 4 Italy will vote in a key referendum on constitutional reform. France faces the first round of a critical primary on November 20 ahead of next year’s presidential election. Adverse outcomes in these races, which we have already begun covering (France, Italy) in earnest, could plunge Europe into crisis, again. Tuesday was a setback for our handling and interpretation of the digital signals we build around political events, but a spur to improvement at the same time. Upcoming events on the European political calendar will offer further opportunities to refine and iterate our approach.
Aaron Timms is Predata’s Director of Content. To contact him, email email@example.com.