Has Trump’s victory truly exposed the limitations of Big Data?

Sachin Akhuri
Published in The Coffeelicious
6 min read · Nov 18, 2016

It would seem that Hillary Clinton’s supporters aren’t the only ones reacting in shock to Donald Trump’s unexpected victory. Several Data Scientists and analysts across the globe are currently tearing out their hair in frustration as they struggle to understand where exactly their predictions went wrong.

Until a few days ago, almost every article on the US Presidential Election was confidently predicting a Clinton victory. Data projections, such as the one in this Salon article, predicted a high voter turnout in the black and latino populations to propel the former First Lady to what should have been an easy victory. Indeed, many went so far as to claim that the African-American turnout in Florida, Michigan and Ohio was going to eclipse that in the 2012 election.

The actual result turned out to be rather. . . unexpected, to say the least.

Which raises the question: where did everyone go wrong?

Did people seriously underestimate The Donald to such an extent? Did the experts greatly underestimate the effect of his aggressive speeches and bold statements? Was it simply the effect of a hypercharged overemotional atmosphere that threw everyone’s expectations for a loop?

Was it Trump’s emotional intelligence that enabled him to knock the ball out of the park when the odds were so clearly not in his favor? Is this perhaps a much-needed wake-up call to data analysts who tirelessly champion the supremacy of Big Data, erroneously convinced that predictive analytics has reached a level of accuracy where Big Data can easily run the world?

Or was it simply that everyone was looking in the wrong direction all this time?

To be fair, predicting the results of something as dynamic as an election is not easy. Human behavior is notoriously difficult to anticipate, and when you throw factors such as last-minute campaigning into the mix, it all gets complicated really fast. Getting a properly representative sample is by itself a huge challenge, partly because some assumptions simply have to be made when data is unavailable, and partly because pollsters tend to unconsciously inject their own biases into the process, something rather unavoidable in such an emotionally charged atmosphere. The Electoral College system only adds to the complexity of the process.

There’s also the fact that Donald Trump has consistently shown a rather remarkable ability to prove analysts wrong. When Trump announced his campaign for the presidency in June 2015, pundits dismissed him outright and pollsters said the numbers proved he’d never win the nomination. And we all remember how that turned out.

Harry Enten, senior analyst at FiveThirtyEight, had spoken on this very topic on the This American Life podcast. He, like most analysts, had used favorability as a metric to determine the chances of success of each presidential candidate. Trump’s style of campaigning during the primaries drew severe flak from the public and made him unlikable to many voters, and as a consequence, the odds of him getting the GOP nomination seemed pretty much non-existent, to say nothing of the presidency.

But then something unexpected happened. After his (in)famous speech announcing his promise to build a “Great Wall” along the Mexican border, Trump’s likability numbers skyrocketed. The sudden jump, from a reported -37% to a +17%, took everyone by surprise. Had anyone taken a closer look at the data then, they would have found that the issue of immigration had suddenly proved to be a major selling point for Trump. And the longer that issue remained at the forefront of the public’s mind (mostly thanks to social media), the more it helped Trump improve his standing when compared to the other candidates.

But still pundits refused to acknowledge this. They believed that this was a fluke, and that better sense would eventually prevail among the voters, and eventually things would be back to the status quo. But this was not to be, for hidden among the numbers was a tiny detail that most folks had missed. It was that while Trump’s favorability ratings were lower than other candidates, his “Strongly Favorable” ratings were much higher, about 30%. The voters who made up this percentage were the fanatics, the die-hards: the people who were committed to their choice in voting for Trump and would not be persuaded otherwise. These were the people who were guaranteed to make up Trump’s share of the ballot. And while 30% may not seem like a huge number, it was still a formidable fraction when the number of candidates was so large.
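To see why a committed 30% can be formidable in a crowded field, here is a rough back-of-the-envelope sketch in Python. The even split of the remaining vote among rivals and the field sizes are illustrative assumptions, not figures from any poll, and the function names are made up for this example.

```python
def rival_share(committed_share, n_candidates):
    """Share each rival gets if the rest of the vote splits evenly.

    The even split is an assumption made purely for illustration.
    """
    return (1.0 - committed_share) / (n_candidates - 1)


def committed_bloc_wins(committed_share, n_candidates):
    """True if the committed bloc alone beats every rival's even share."""
    return committed_share > rival_share(committed_share, n_candidates)


# Hypothetical field sizes: a two-way race versus more crowded fields.
for n in (2, 4, 10):
    print(n, round(rival_share(0.30, n), 3), committed_bloc_wins(0.30, n))
```

Under these assumptions, 30% loses badly in a two-way race but is already a plurality once there are four candidates splitting the rest.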

What really made this 30% all the more formidable was the fact that many voters were still undecided all the way until election day. Most analysts were aware of this, since the majority of the opinion polls had Clinton winning by a small margin of around 180,000–200,000 votes. However, analysts did grossly underestimate the number of these undecided voters. As the former First Lady found out to her dismay on the 9th of November, the number of people who remained undecided till the very end, and thus chose not to vote, cost her dearly.

As the above graph shows, Democratic voter turnout for the elections was the lowest it had been in years. This huge miscalculation owed a great deal to the narrative on social media in the days before the election, which was largely anti-Trump. Everyone was so distracted by the vocal anger being directed towards the Republican nominee that they forgot that there were people who felt the same way about Clinton, and worse: that there was a deceptively large group which didn’t know what to think about her, but wasn’t exactly talking about it. This silent group remained beneath everyone’s notice until Election Day, when their absence came to define them more than their presence would have.

Another interesting point about the polling, highlighted by the USC/L.A. Times poll (one of the very few that had predicted success for Trump), was that the gathering of polling data was greatly affected by the controversial nature of the Republican candidate. Many Trump supporters seemed to be much more reluctant to share their choice of candidate with online pollsters than with their family and friends. This was perhaps heavily influenced by the largely negative rhetoric surrounding Donald Trump and his open supporters towards the end of the election. It’s a hypothesis that does indeed make quite a bit of sense, since people will naturally be wary of being labelled as “sexists” and “bigots”, and more than a bit wary of openly supporting a candidate who has frequently been labelled as such by the media.
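The effect of that reluctance on a poll can be sketched with a toy calculation. This is only an illustration: the disclosure rates and the 50% true support figure below are hypothetical numbers chosen for the example, not estimates from the USC/L.A. Times poll or any other survey.

```python
def observed_share(true_share, disclose_a, disclose_b):
    """Poll share for candidate A among respondents who name a choice.

    true_share  -- A's real share of the two-candidate electorate
    disclose_a  -- fraction of A's supporters willing to say so (hypothetical)
    disclose_b  -- fraction of B's supporters willing to say so (hypothetical)
    """
    a = true_share * disclose_a
    b = (1.0 - true_share) * disclose_b
    return a / (a + b)


# Hypothetical case: an evenly split electorate in which one in ten
# of A's supporters will not tell a pollster who they support.
print(round(observed_share(0.50, 0.90, 1.00), 3))
```

In this made-up scenario, a dead-even race shows up in the poll as roughly 47% to 53%, a gap produced entirely by who is willing to answer.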

The signs were already showing by Nov 1st, but nobody was actually looking.

To summarize. . .

If the 2016 US Elections proved anything, it is that Big Data is by no means the foolproof, be-all and end-all method analysts imagined it to be. There are lessons to be learned: about the most fundamental of concepts such as statistical weighting (a large dataset like the population of a country must be put through the right modelling), being careful about data gathering, judiciously restricting the number of influencing factors (and the deceptive influence of social media), and the techniques used to test hypotheses (which data points to look at). As an analyst, I’d say this has been a very interesting and educational election year for the United States of America.
