What does Facebook do with the data & why does it matter?

Adrienne Royer
13 min read · Mar 26, 2018


Is Cambridge Analytica a Big Story? Part IV

This is Part IV of a series. See Part I, Part II, and Part III for the earlier installments.

How did the Facebook API work?

Read this excellent post by Jonathan Albright, a journalism professor, about what the Facebook API pulled, how it worked and how it changed since it was first released in 2010. If you are a journalist, please commit it to memory.

Note that he mentions Instagram currently has an API that still pulls all of this information.

Between 2010 and 2015, apps could pull almost every piece of data from your profile, and you had no control over it. While people opted into adding apps, the extent of information available was not made clear to users. Apps could also pull insights from Facebook Pages. Consider what was available from the Graph API between 2010 and 2015:

Screen capture of API documentation from Facebook c. 2010–2015

How much data has been pulled from Facebook?

Facebook has promised to audit “suspicious activity,” but it isn’t releasing the number of individuals and organizations that used the Graph API to gather data between 2010 and 2015, when that version was deprecated.

We know that at the time of the 2015 announcement, Facebook had approved more than 40,000 apps that requested information beyond what was considered “basic account information” — name, email, gender, birthday, current city, and profile picture URL. According to ReadWriteWeb:

The company wants to prevent third-party apps from gathering unnecessary information about Facebook users from the get-go, so it also instituted a new Login Review process. Apps that ask for basic data — like a public profile, e-mail address and friend list — can bypass it, but those trying to dig in deeper will have to go through a manual review by Facebook staff. The team makes its decision based on how reasonable the data requests are, assessing whether they’re really necessary for the app to function.

The process can take roughly three to five days per app, Cross added, though the team aims for just a day or two. So far, he estimates that Facebook has reviewed more than 40,000 apps over the past year.

Facebook made it clear that developers could keep the data collected through the first version of the API. At the time, TechCrunch reported:

Apps don’t have to delete data they’ve already pulled. If someone gave your data to an app, it could go on using it. However, if you request that a developer delete your data, it has to. However, how you submit those requests could be through a form, via email, or in other ways that vary app to app. You can also always go to your App Privacy Settings and remove permissions for an app to pull more data about you in the future.

BrightPlanet noted:

This change means that web data collection companies are not able to automatically collect data from individual user pages, even if the user’s profile settings are set to public.

The only data that can be collected is from non-individual pages that are public and can be liked instead of accessed through a friend request. These pages include company pages, celebrities, and other entities you may follow.

Facebook data mining is common and not a secret.

Facebook data mining is a fairly common practice used by Farmville, Tinder and other popular apps that integrate with Facebook. In an article that now seems prescient, MIT’s Technology Review wrote about the potential for Facebook data mining in 2012:

Hammerbacher agrees that Facebook could sell its data science and points to its currently free Insights service for advertisers and website owners, which shows how their content is being shared on Facebook. That could become much more useful to businesses if Facebook added data obtained when its “Like” button tracks activity all over the Web, or demographic data or information about what people read on the site. There’s precedent for offering such analytics for a fee: at the end of 2011 Google started charging $150,000 annually for a premium version of a service that analyzes a business’s Web traffic.

Nor was this information a secret. In 2016, the Washington Post wrote an article on the “98 personal data points that Facebook uses to target ads to you”:

On top of that, Facebook offers marketers the option to target ads according to data compiled by firms like Experian, Acxiom and Epsilon, which have historically fueled mailing lists and other sorts of offline efforts. These firms build their profiles over a period of years, gathering data from government and public records, consumer contests, warranties and surveys, and private commercial sources — like loyalty card purchase histories or magazine subscription lists. Whatever they gather from those searches can also be fed into a model to draw further conclusions, like whether you’re likely to be an investor or buy organic for your kids.

When combined with the information you’ve already given Facebook, through your profile and your clicks, you end up with what is arguably the most complete consumer profile on earth: a snapshot not only of your Facebook activity, but your behaviors elsewhere in the online (and offline!) worlds.

Over the weekend, The Verge published an account from a marketing executive who watched the widespread use of these tactics over the past decade:

In 2012, the data analytics firm Microstrategy pitched me on their “Wisdom” tool for Facebook, which the company had touted as a data source based on “12 million anonymous, opted-in Facebook users.” But when I spoke with an analyst at Microstrategy that December, he told me that the company’s data set — by then, nearly 17.5 million strong — was based on just 52,600 actual installs, each of which provided access to an average of 332 friends.
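
The arithmetic behind that account is worth checking. The short Python sketch below multiplies the two figures quoted above; it assumes no overlap between friend lists, so the result is best read as an upper bound rather than an exact count of unique people.

```python
# Figures quoted in The Verge account above.
installs = 52_600      # users who actually installed the app
avg_friends = 332      # average number of friends exposed per install

# Assumption: no two installers share a friend, so this is an upper bound.
reach = installs * avg_friends

print(f"{reach:,} profiles reachable from {installs:,} installs")
# prints "17,463,200 profiles reachable from 52,600 installs"
```

In other words, each person who opted in exposed several hundred people who did not, which is how roughly 52,600 installs became a data set of nearly 17.5 million.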

Apps and quizzes were the primary source of this data:

“It was pretty common knowledge among people who understood the internet that if you were taking a quiz to find out what kind of cheese you are, somebody on the other end is very interested in getting that data,” says Susanne Yada, a Facebook ad strategist. “I wish I could say I was more surprised and more alarmed. I just assumed that if you take a quiz, someone would know who you are because you are signed into Facebook.”

We’re All Lab Rats and Guinea Pigs

Even though Cambridge Analytica, the Obama campaign, Farmville, and Tinder were busy collecting data that’s still floating around, they weren’t the worst offenders. In two must-read columns at Fortune, Kalev Leetaru details how Facebook and Twitter are a treasure trove of data for academics:

Academia in particular is filled to the brim with data collected from Facebook — some collected through formal informed consent, but much of it scraped at will without any notification to the users’ whose information has been archived. To put another way, an incomplete and unknowably large cross-sectional archive of Facebook exists scattered across the file servers, databases, cloud accounts and personal laptops of university researchers all across the world. Zuckerberg makes no mention of all this data in his statement. Indeed, just days before the Cambridge Analytica announcement, researchers on one prominent academic mailing list were lamenting Facebook’s improved privacy settings that limited the amount of data they could harvest at once and discussing workarounds.

Facebook was also proactive in reaching out to academics. Here is an academic brief, co-authored by a Facebook employee, on the technical instructions for accessing data from Facebook for research.

Mark Zuckerberg has a terrible track record of not only allowing open access to user data, but also using the platform itself to run experiments.

In 2012, Facebook partnered with Cornell University. For one week, the feeds of nearly 700,000 people were manipulated. Some saw a majority of happy or positive news. Others saw a majority of negative news. At the end of the week, these users posted words that reflected the type of news that had been displayed. No one was asked if they wanted to participate. Academic studies are allowed under the terms of service you accept when you first sign up.

In 2017, it was leaked that Facebook had profiled Australian teenagers for a psychological study that predicted negative moods:

The internal report produced by Facebook executives, and obtained by the Australian, states that the company can monitor posts and photos in real time to determine when young people feel “stressed”, “defeated”, “overwhelmed”, “anxious”, “nervous”, “stupid”, “silly”, “useless” and a “failure”.

The report also included:

According to the Australian, the data available to advertisers includes a young user’s relationship status, location, number of friends on the platform and how often they access the site on mobile or desktop. The newspaper reported that Facebook also has information on users who are discussing “looking good and body confidence” and “working out & losing weight”.

What makes Facebook data valuable?

The power of the data mining isn’t the data pulled from Facebook. By itself, that data is worth very little. It is only when you layer that data onto voter records or a target consumer that it becomes valuable. This is called data appending.

For decades, marketers have studied how consumer data can be leveraged. They take information from store loyalty cards, magazine subscriptions, major financial purchases, such as a car or house, and catalog lists to find groups of people that fit their ideal target audience.

Political consultants took this concept and then layered polling data and voter records on top of this consumer data. That’s how we got the term “soccer moms.”

All you need is one personal identifier that remains constant. Because there are multiple Jennifer Smiths, data companies look for a phone number, birthday, address or email to match records from all the sources available.

When social media entered the picture, it became possible to add this data in addition to everything else. This provided new means of communication, ways to monitor the sentiment and reactions of targeted audiences, and easy ways to test messages before they were released on a more expensive channel such as TV or direct mail. Accuracy varies, but it is generally possible to match between 10–30% of your list.
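
To make data appending concrete, here is a minimal Python sketch of the matching step. Everything in it is hypothetical: the field names, the sample records, and the choice of email as the constant identifier are assumptions for illustration, not how any particular vendor does it.

```python
def normalize_email(email):
    """Lowercase and trim an email so trivially different spellings match."""
    return (email or "").strip().lower()


def append_data(voter_file, social_records):
    """Attach social data to voter records that share an email address.

    Returns the merged records and the share of the voter file that matched.
    """
    # Index the social records by their normalized identifier.
    by_email = {
        normalize_email(r.get("email")): r
        for r in social_records
        if normalize_email(r.get("email"))
    }

    merged, matched = [], 0
    for voter in voter_file:
        record = dict(voter)  # copy so the original list is left untouched
        key = normalize_email(voter.get("email"))
        extra = by_email.get(key) if key else None
        if extra is not None:
            record["likes"] = extra["likes"]
            matched += 1
        merged.append(record)

    match_rate = matched / len(voter_file) if voter_file else 0.0
    return merged, match_rate


# Hypothetical sample data: two different people named Jennifer Smith.
voters = [
    {"name": "Jennifer Smith", "email": "jsmith@example.com"},
    {"name": "Jennifer Smith", "email": "jen.smith@example.org"},
    {"name": "Carlos Diaz", "email": None},
    {"name": "Ann Lee", "email": "ann@example.com"},
]
social = [
    {"email": " JSmith@Example.com ", "likes": ["gardening"]},
    {"email": "someone@else.example", "likes": ["cat videos"]},
]

appended, rate = append_data(voters, social)
print(f"Matched {rate:.0%} of the list")  # prints "Matched 25% of the list"
```

Matching on the name alone would have merged the two Jennifer Smiths; matching on email keeps them apart, and only the first picks up the social data. Real matching pipelines typically try several identifiers (phone, birthday, address) in turn, which this sketch leaves out.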

What are Psychographics?

Psychographics are not a new concept. They can be traced back to the 1960s. AdAge explains:

In the decades since those initial studies, psychographic research has progressed to collecting opinions concerning social and other issues of the day, future perspectives, self-perceptions, personality traits, politics, business and economic climates, confidence in the economy, personal outlooks on relevant conditions, products, culture and other factors.

AdAge also provides examples of how they are used:

  • Designing product packaging
  • Retail store image & ambiance
  • Creative advertising strategy & media planning
  • Tracking shifts in consumer psyche
  • Targeting ads to fit a brand within a particular lifestyle

BuzzFeed quizzes have provided enough data for the media company to appeal to advertisers with its psychographic information:

Earlier this year, Wong told Adweek that BuzzFeed has started to operate in terms of psychographics, or what interests and beliefs people have, as opposed to traditional demographic concepts.

“People are more complicated and way cooler than just a ‘snowflake,’” he said at the time. “Traditional marketing wants to see demographic groups, but there’s a way to understand the ethos of a generation.”

Ironically, BuzzFeed launched this in 2017 while publishing stories about Cambridge Analytica.

Psychographics Aren’t Silver Bullets

While Harvard Business Review may praise the use of psychographics in marketing and advertising, do they work in politics? More importantly, were they the silver bullet that caused Trump to win?

The answer to both questions is no.

We know from accounts during the campaign, and many afterwards, that Cambridge Analytica’s work didn’t live up to the hype. Ben Domenech at The Federalist describes it as the “Juicero of Politics” and has a number of influential political consultants on the record about the quality of Cambridge Analytica’s work.

Patrick Ruffini also disagrees with the idea that Cambridge Analytica is the reason Trump won:

Aspects of Cambridge’s use of the Facebook data — not to mention the growing revelations about the rest of its business — are troubling. It’s unclear exactly how the data was used, but we know two things: the Trump campaign was not among its users, and the end product Cambridge was using the dataset to build, personality-based targeting, has been universally and spectacularly panned by a range of ex-Cambridge clients.

This could mean that while Facebook’s data might be able to tell us what car you’ll buy or which candidate you’ll vote for, it still can’t divine your personality or tell your secrets.

While the volume of data might be staggering, it doesn’t determine effectiveness or guarantee votes. Tests using Cambridge Analytica’s data showed that it wasn’t effective. Tim Miller, a Republican consultant who worked on the Jeb Bush campaign, confirmed this with The Federalist:

The people I respect in campaign circles didn’t respect Cambridge, didn’t view them as real. The Cruz campaign had the best data-focused campaign that we’ve seen, and they barely used Cambridge for that. The standard Republican operating procedure turned out to work much better. I’ve said every negative thing in the world about Brad Parscale and Jared Kushner, but they had a very successful and innovative approach to Facebook and they pushed Cambridge aside.

Kurt Bardella, a former spokesman for Breitbart News, echoed this sentiment:

“It would be a fallacy to believe that whatever happened with Cambridge and Trump is what led to Trump’s victory,” he said. “Most reputable data firms are using proven predictive modeling techniques on an individual level, whereas Cambridge was guilty of using fancy fake science terms on unwitting politicians who do not understand how data analytics work.”

The general consensus among Republican consultants, campaign staffers and vendors who were familiar with both the Trump and Cruz campaigns is that Cambridge Analytica’s data was oversold. The Cruz campaign found it ineffective, and the Trump campaign ended up not using it. While it might sound like the GOP is trying to distance itself from the firm after this “scandal,” the record was fairly well established long before the election.

Data doesn’t equal persuasion…or votes.

Cambridge Analytica had a lot of data; so did Hillary. Obama pioneered this type of data mining operation in 2012. However, having access to data doesn’t mean anything by itself. It depends on how that data is used. There are hundreds of academic studies analyzing the effectiveness of political advertising that go back decades. One recent academic study made headlines by claiming persuasion ads were useless.

These two cases, Obama in 2012 and Cambridge Analytica in 2016, present a stark contrast. The Obama campaign, which gathered data from Facebook users who added its app AND information about their friends, mostly used it for data matching, location and voter file information. Cambridge Analytica tried to use it to apply psychographic research. Obama was successful. Cambridge Analytica ended up with two high-profile clients that were displeased.

A glowing review of Obama’s technology in a Technology Review article from 2012 reveals:

Obama’s campaign began the election year confident it knew the name of every one of the 69,456,897 Americans whose votes had put him in the White House. They may have cast those votes by secret ballot, but Obama’s analysts could look at the Democrats’ vote totals in each precinct and identify the people most likely to have backed him. Pundits talked in the abstract about reassembling Obama’s 2008 coalition. But within the campaign, the goal was literal. They would reassemble the coalition, one by one, through personal contacts.

When reading articles about the volume of data collected by Cambridge Analytica, remember that data is data. By itself, access to data doesn’t make political or marketing campaigns successful. It must be used in successful strategies, such as the Obama 2012 campaign, to be of any use.

While the availability of data means very little without application, the larger question is the ethics of data mining.

Is Cambridge Analytica the “aha” moment for privacy concerns?

We’ve all had that moment when ads matched what was going on in our lives a little too closely:

In December, the Washington Post asked, “Can Data Mining Make for Cute Ads?” All but one person scoffed at the idea that Big Data Brother was on the creepy side:

“This gives the public a kind of view into the ways that the major content companies are gathering and using our data,” said Jeffrey Chester, head of the nonprofit Center for Digital Democracy, which advocates for consumer protection and privacy. “Behind the ease of being able to access video and audio content are very sophisticated customer surveillance and analytics applications, and there’s nothing funny about that.”

Now, we have a scandal that reveals how private-sector companies are gathering seemingly harmless data about our everyday, mundane lives. In the past week, many on Facebook have laughed that their love of cat videos is being collected. However, as noted above, when data is aggregated, it delivers an incredibly intimate picture of our lives to anyone willing to purchase that information.

Is it surprising that sentiment towards Silicon Valley is declining? An Axios poll in March found that 55% of respondents don’t believe the government will do enough to regulate Silicon Valley. A Reuters/Ipsos poll released on March 25 found that only 41% of Americans trust Facebook to follow U.S. privacy laws compared to 66% who trust Amazon, 62% who trust Google, and 60% who trust Microsoft.

Will this situation prompt a debate over the need for privacy laws? Do we own the data collected about ourselves, or does the company doing the collecting? Will the US move towards a European model, or is that even possible with the First Amendment? Is it possible to create a “right to be forgotten” and completely opt out of data collection?

With Facebook bungling their response to the Cambridge Analytica “scandal,” active investigations in both the U.S. and U.K., and Congress calling on Mark Zuckerberg to testify, this story will be around for a while. This might be the final shove the public needed to demand a debate over privacy, data ownership and security. If it is, what is the right answer?
