How Facebook Saw Trump Coming When No One Else Did

Electoral map via The New York Times as of 11/09/2016 at 4:04am ET.

As the electoral map turned crimson this evening, everyone exclaimed that the data and polls had not seen this coming. They were only partly right. At least one overlooked data source had strongly suggested that Donald Trump enjoyed an unquantified current of popular support.

As of a few days before the election, Trump had collected 12 million total Facebook likes, 4.1 million more than Hillary’s 7.9 million, while national polls gave Hillary a greater than 65% chance of winning. This pattern recalled Brexit, the recent UK referendum surprise that polls failed to predict but that Facebook correctly foreshadowed.

How could pre-election polls and experts differ so drastically from the world’s largest social network, which effectively provides a minute-by-minute “poll” of the nation?

In the last days before the election, I began to research this incongruity and finally found suitable candidate interest data from Facebook with which to do it. Data in hand, I was able to relate traditional polling data to Facebook interest data just as the election was drawing to a close. Here is what I found.

In every state, Trump interested more people than Hillary, by factors ranging from 2x in very blue states to 12x in very red states. There are likely many causes of this high level of interest, but the effect is that Trump was more effective at social media canvassing and earned media. This and other media efforts closed the large gap in his fundraising and organizing.

Data from Facebook conversations captured a truer picture of nationwide sentiment and support than polling data could. Accounting for this overwhelming base, which the polls did not see, narrowed Hillary’s apparent margin of victory in blue states, overturned it in swing states (the Facebook data pointed to Trump wins in both Pennsylvania and Michigan), and shored up Trump’s margin in red states.

Now, a caveat. I’m neither a statistician nor a polling expert. I am a student, practitioner, and fan of media and technology who has watched every few years as both have shaped and reshaped how national campaigns are won and lost. I’m hopeful this critical insight into social media and Facebook as one of our most important contemporary data sources and bellwethers can motivate experts to update polling methodologies and prediction practices. This is the data that directly inspires media sentiment, organizing action, and ballot casting. This year it may have cost an election. Predictions must be more accurate.

To the numbers —

What candidate was Facebook interested in?

Facebook’s advertising tools provide a way to estimate the number of people on its service who fit given interest and demographic criteria. According to Facebook —

Interest may include things people share on their Timelines, apps they use, ads they click, Pages they like and other activities on and off of Facebook and Instagram. Interests may also factor in demographics such as age, gender and location.

In other words, Facebook treats interest as general engagement with a topic. (Facebook does not state whether that interest is positive or negative in sentiment.)

After pulling interest data for each candidate by state, it was impossible not to be astounded by just how much farther Donald Trump’s online reach extended than Hillary Clinton’s... in every single state. Take a look.

Facebook’s ad tool shows the number of active users who are interested in a candidate.

The chart of this data depicted Trump’s enormous interest levels more vividly by state:

Donald Trump’s Facebook reach was greater than Hillary Clinton’s…in every single state by astounding factors.

In New York, a tried-and-true blue state with no chance of swinging this election, Trump interested an estimated 960,000 people while Hillary interested an estimated 290,000. That’s 3.3x as many people engaging with Trump information on Facebook. In a blue state!

In traditionally red states like Mississippi, Trump interested an estimated 220,000 people while Hillary interested an estimated 19,000. That’s 11.6x as many people interested in Trump as in Hillary in a traditionally Republican area. Even for a very red state, that is an astounding differential.

On Facebook just a few days before the election, a total of 3.6 million active U.S. users were interested in Hillary while a massive 16.1 million active U.S. users were interested in Donald Trump. Naturally there are biases and errors and behavior modes to correct for in this data, but the general trend of increased energy spent thinking about one candidate is absolutely unmistakable.
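The multiples quoted above are plain divisions of one audience estimate by the other. As a rough illustration in Python, using only the figures quoted in this post (the function name `interest_ratio` is made up for this sketch and is not part of any Facebook tool):

```python
def interest_ratio(trump_interested: int, hillary_interested: int) -> float:
    """Return how many times larger the Trump-interested audience is."""
    return trump_interested / hillary_interested


# Audience estimates quoted in this post (Trump, Hillary).
estimates = {
    "New York": (960_000, 290_000),
    "Mississippi": (220_000, 19_000),
    "United States": (16_100_000, 3_600_000),
}

for place, (trump, hillary) in estimates.items():
    print(f"{place}: {interest_ratio(trump, hillary):.1f}x")
```

Because the inputs are rounded estimates from an ad tool, the resulting multiples are best read as rough orders of magnitude rather than precise measurements.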

What could Facebook data tell us that polling data could not?

While Trump garnered greater interest across the board, what did that mean numerically, and for his appeal in specific states? To test for a relationship, I ran a linear regression between poll-predicted candidate win margins and Trump interest levels for each state. I used polling data from FiveThirtyEight from the weekend before the election as the basis. I used additional Facebook data to create “Trump interest level percentages” by dividing the number of people interested in Trump by the de-duplicated number of people interested in either Trump or Hillary in each state.
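The two steps described above, computing an interest share and fitting a line, can be sketched in a few lines of Python. This is a minimal illustration, not the analysis actually run for this post: the function names are hypothetical, the numbers are invented placeholders rather than real state data, and it assumes the poll margin is the dependent variable with the Trump interest share as the predictor.

```python
def trump_interest_share(trump_interested, either_interested):
    """Share of candidate-interested users who are interested in Trump.

    either_interested is the de-duplicated count of users interested in
    Trump or Hillary (someone interested in both is counted once).
    """
    return trump_interested / either_interested


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# PLACEHOLDER values, invented for illustration only (not real state data).
audiences = [(700, 1000), (800, 1000), (900, 1000)]  # (Trump, Trump-or-Hillary)
poll_margins = [10.0, 0.0, -10.0]  # positive = Democrat, negative = Republican

shares = [trump_interest_share(t, e) for t, e in audiences]
slope, intercept = fit_line(shares, poll_margins)
print(f"margin = {slope:.1f} * share + {intercept:.1f}")
```

Dividing by the de-duplicated audience rather than the sum of the two audiences keeps the share between 0 and 1 even when many users are interested in both candidates.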

The overall correlation between states’ pre-election polling and Facebook interest levels was decent, as indicated by the R squared value of 0.7 (1 is a perfect fit, 0 means no linear relationship at all). In other words, the Facebook data accounted for roughly 70% of the state-to-state variation in pre-election poll margins, so each data set did a good enough job of explaining the other.
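For readers unfamiliar with R squared, here is a minimal Python sketch of how it is computed for a straight-line fit: one minus the ratio of the leftover (residual) variation to the total variation. The function name and sample points are placeholders for illustration, not the data behind the 0.7 figure.

```python
def r_squared(xs, ys):
    """R squared of the ordinary least squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# PLACEHOLDER points, invented for illustration only.
print(r_squared([1, 2, 3], [1, 2, 2]))
```

With a single predictor, R squared equals the squared correlation coefficient, so swapping which variable is treated as the predictor gives the same value.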

The same data in tabular form showed more specifically where contradictions (and controversies) were brewing by state. Using the regression formula, I calculated a predicted margin from the Facebook data (negative is Republican, positive is Democrat). It’s far from reliable, but it points to some directionally interesting issues. The mismatched red and blue rows highlight the state-level discord between polling and Facebook data:

State by state: Facebook people interested in Trump, pre-election poll margin predictions, and Facebook-influenced margin predictions.
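The logic behind the mismatched rows can be sketched as follows: plug a state's Trump interest share into the fitted line to get a Facebook-based margin, then compare its sign with the sign of the poll margin. This is a hedged illustration only. The slope, intercept, and state values below are invented placeholders, not the coefficients or data from this analysis, and the function names are hypothetical.

```python
def predicted_margin(trump_share, slope, intercept):
    """Margin implied by the regression line (positive = Democrat)."""
    return slope * trump_share + intercept


def leader(margin):
    """Return which party a margin favors under the sign convention above."""
    if margin > 0:
        return "D"
    if margin < 0:
        return "R"
    return "tie"


def polls_and_facebook_disagree(poll_margin, facebook_margin):
    """True when the two margins point to different winners."""
    return leader(poll_margin) != leader(facebook_margin)


# PLACEHOLDER coefficients and state values, invented for illustration only.
SLOPE, INTERCEPT = -100.0, 80.0
poll_margin = 3.0  # polls lean Democrat in this made-up state
facebook_margin = predicted_margin(0.85, SLOPE, INTERCEPT)

print(leader(poll_margin), leader(facebook_margin),
      polls_and_facebook_disagree(poll_margin, facebook_margin))
```

A sign comparison like this flags only the direction of disagreement; it says nothing about how confident either margin is, which is one reason the predicted margins should be treated as directional hints.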

Facebook data largely confirmed stronghold states

Traditionally red states like Louisiana, Mississippi, and Alabama that polled that way were confirmed by the Facebook data. Blue states like California, New York, and D.C. were upheld as well. In each of these states, actual support for the candidates fell in line proportionally with engagement with those candidates on Facebook.

Facebook data showed stronger Trump support in key states Pennsylvania, Michigan, and North Carolina

While pre-election polls predicted a win for Hillary in PA, MI, and NC, pre-election Facebook data suggested the opposite result. Interest in Trump was so outsized in those three states that the likelihood of a Democratic win faded in favor of an upset. And in fact, Trump won all three of these states in the election a few days later (as of the writing of this post).

Facebook data suggested further discord with traditional voting patterns

Very blue Hawaii showed one of the highest Trump interest levels in the entire nation. While this could have been just a small sample size, it was a curious signal worth exploring for evidence of a national Trump factor at play in both blue and red states. Iowa and Ohio showed relatively less Trump interest, yet their ‘flip-flop’ was evidence of further discord. Florida’s sentiment fell just shy of red, which suggested it wasn’t a sure bet either. And finally, overwhelming Trump interest overall (67% to 94% of Facebook users interested in any candidate) versus much less Hillary interest made clear that the election might well hold a surprise.

Publicly available social media data signaled the upset possibility

Although this data was teased very publicly in Facebook Like counts on candidate pages, and less obviously but still publicly in Facebook’s ad tool, analysts and media have written very little about it. While the possible statistical objections are likely many (again, not my area of expertise), the common sense objections to at least mentioning it seem few. This curious choice has led to an incredibly useful research sample of hundreds of millions of American people being mostly disregarded.

What does this mean going forward?

A candidate who dominated online (and offline) conversation earned more media and organized more supporters as a result. This particular aspect (there are many more to this upset) was social media making content creation, human connection, and word of mouth endorsement cheaper, faster, easier, and more effective than traditional campaigning. It was the somewhat unseen surge of a strong popular movement.

Technology is transforming voter decision-making and public expression of candidate support, just as it has changed so many behaviors and expectations in the last few decades. Our window into how our nation thinks about, acts for, and believes in its political candidates has never been clearer. Let’s pay attention to it.

Future of media through technology. Live from NYC.
