Facebook math: How 270,000 became 87 million

Anne L. Washington, PhD
Data & Society: Points
Apr 11, 2018
Image via Adam Sporka

How did the choices made by only 270,000 Facebook users affect millions of people? How is it possible that the estimate of those affected changed from 50 million to 87 million so quickly? As a professor of data policy, I am interested in how information flows within organizations. In the case of Facebook and Cambridge Analytica, I was curious why this number was so inexact.

Scale and impact

The mathematics behind the story of Facebook and Cambridge Analytica provides insight into both the value and the vulnerability of data generated through social network platforms. Let’s first analyze the reported numbers using some back-of-the-envelope calculations:

On March 17, 2018, journalists at the Observer and the New York Times began to publish reports that Cambridge Analytica had harvested 50 million Facebook profiles based on the responses of only 270,000 people. How many profiles did they gather for each research participant? We get to 50 million quickly if each research participant had, on average, 185 friends:

50,000,000 / 270,000 = 185.19 friends, according to March 2018 numbers

On April 4, 2018, Facebook’s CTO reported that profiles for 87 million people were harvested. For that figure to hold, each research participant had to have, on average, 322 friends:

87,000,000 / 270,000 = 322.22 friends, according to April 2018 numbers
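
The two divisions above can be reproduced with a short script. This is a minimal sketch rather than anything from the original reporting: the helper name `friends_per_participant` is hypothetical, and it assumes, as the back-of-the-envelope figures do, that no two participants share a friend.

```python
PARTICIPANTS = 270_000  # people who took the quiz, as reported


def friends_per_participant(profiles_harvested, participants=PARTICIPANTS):
    """Average harvested profiles per participant (hypothetical helper).

    Assumes friend lists do not overlap, which is a simplification.
    """
    return profiles_harvested / participants


march_estimate = friends_per_participant(50_000_000)  # figure reported March 17, 2018
april_estimate = friends_per_participant(87_000_000)  # figure reported April 4, 2018

print(f"March 2018: {march_estimate:.2f} friends per participant")  # 185.19
print(f"April 2018: {april_estimate:.2f} friends per participant")  # 322.22
```

Because real friend lists overlap, the true average number of friends per participant would have to be somewhat higher than these figures to reach the same totals.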

How realistic are these numbers?

A November 2011 research paper published by Facebook employees and academics reported that the average Facebook user had 190 friends. A February 2014 Pew Research Center report stated that the “average (mean) number of friends is 338, and the median (midpoint) number of friends is 200.” Both the 50 million and 87 million figures fall within those ballpark values. Both the 2011 and 2014 research could have been known to anyone promising to provide Facebook profiles in late 2014:

270,000 * 190 = 51,300,000 total impact, according to 2011 estimate of average number of friends

270,000 * 338 = 91,260,000 total impact, according to 2014 estimate of average number of friends
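
The same estimate can be run forward from the published averages. The sketch below is illustrative only: `projected_reach` and the dictionary labels are hypothetical names, and the multiplication treats every friend list as distinct, so the results are upper bounds rather than counts of unique profiles.

```python
PARTICIPANTS = 270_000  # people who took the quiz, as reported

# Mean friend counts cited in the paragraph above
MEAN_FRIENDS = {
    "2011 Facebook study": 190,
    "2014 Pew report": 338,
}


def projected_reach(mean_friends, participants=PARTICIPANTS):
    """Upper-bound profile count, assuming no shared friends (hypothetical helper)."""
    return participants * mean_friends


for source, mean in MEAN_FRIENDS.items():
    print(f"{source}: {projected_reach(mean):,} profiles")
# 2011 Facebook study: 51,300,000 profiles
# 2014 Pew report: 91,260,000 profiles
```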

Image via David Sousa-Rodrigues

Researchers also estimate the reach of social networks using “Friends of a Friend” (FOAF) calculations. In classic research from the 1960s, Stanley Milgram used letters sent through the post to measure friends of friends, asking: How many steps will it take to jump between any two people in the network? Milgram established that in America, there were six degrees of separation.

Researchers in 2011 said that there were only four degrees of separation on Facebook. To understand this scale, consider the following:

  • The first degree is a 10-person group.
  • The second degree is all the friends of the first group. If everyone has ten friends, the size is 10 x 10 = 100
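  • The third degree is all friends of friends. If each of those 100 people has 100 friends, the size is 100 x 100 = 10,000
  • The fourth degree, if each of those 10,000 people has 10,000 friends, would be 10,000 x 10,000 = 100 million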
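
The list above squares the group size at each step. A minimal sketch of that arithmetic follows, alongside a variant in which every person has a fixed number of friends; the function names are hypothetical, and the 190 figure is the 2011 average cited earlier. Both versions ignore overlapping friendships, so they are upper bounds on reach, not measurements.

```python
def squared_growth(start, degrees):
    """Group sizes when each degree squares the previous one, as in the list above."""
    sizes = [start]
    for _ in range(degrees - 1):
        sizes.append(sizes[-1] ** 2)
    return sizes


def fixed_friend_growth(start, friends_each, degrees):
    """Group sizes when every person has the same number of friends (an assumption)."""
    sizes = [start]
    for _ in range(degrees - 1):
        sizes.append(sizes[-1] * friends_each)
    return sizes


print(squared_growth(10, 4))            # [10, 100, 10000, 100000000]
print(fixed_friend_growth(10, 190, 4))  # [10, 1900, 361000, 68590000]
```

Under either set of assumptions, ten people reach tens of millions of profiles by the fourth degree.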

In this example, just ten people can impact a far-reaching network of 100 million in four degrees. In an email replication of Milgram’s letter study published in 2003, sociologist Duncan Watts focused on the negative implications of a few people quickly impacting the general population. In many ways, the new harvests of Facebook data represent a specific case of a widespread influence on a densely connected network. Marwick and boyd have notably pointed out the interconnectedness of individuals’ control over their own information with the concept of “networked privacy.” These examples argue for social networking settings that enable us to be our brother’s keeper.

Dollars and data

Who were the research participants who made their networks vulnerable to this data harvest? The participants were paid $1-$2 to download the app, take a psychometric test, and provide access to their Facebook data, according to Zeynep Tufekci in a March 19 New York Times Op-Ed. Aleksandr Kogan, through his company Global Science Research (GSR), used Amazon Mechanical Turk, an online labor market that academics frequently use to find experimental research participants.

It is important to note that the Facebook Application Programming Interface (API) stopped giving third-party applications the ability to harvest someone’s network. Jonathan Albright does a fabulous job of explaining the timeline of the Facebook Graph API v1.0 in his post. Many Internet companies stopped making data available to others, and instead continued to package insights from the data or release only a portion of actual data. On the other hand, as James Temperton points out in his Wired UK piece, five years of the open API moved Facebook’s valuation from $23 billion to $245 billion. Participants were paid only a few dollars in exchange for information that, once processed, was worth millions to GSR.

Unless you were looking for micro-task work in 2014, it is unlikely you would have downloaded thisisyourdigitallife, the application that is at the center of this news story. But it is possible that it was downloaded by someone who you don’t know, who was connected to someone who you do know.

Facebook, in February 2016, estimated that the average distance between its 1.59 billion users was 3.46 degrees. It seems ironic that we can fidget with our individual privacy settings for hours, but the whimsy of a friend of a friend of a friend (.46 of a friend, to be exact) might have an impact on more than just what we see.

Anne L. Washington is an Assistant Professor of Data Policy at NYU, a Visiting Scholar at Data & Society, and a recent Data & Society Fellow (2016–2017).
