How Facebook Is Using Big Data — The Good, The Bad, And The Ugly

Last updated on December 03, 2015


A report from McKinsey & Co. stated that by 2009, companies with more than 1,000 employees already stored more than 200 terabytes of data about their customers’ lives. Now add to this startling amount of stored data the rapid growth of social media data over the last four years. There are trillions of tweets, billions of Facebook likes, and an even higher number of check-ins on Foursquare, and Instagram and Pinterest only add to this social media data deluge. Picture the buckets of data that social media sites alone have gathered. Social media promises to accelerate innovation, drive cost savings, and strengthen brands through mass collaboration. Across every industry, companies use these platforms to market and promote their products and services, and to monitor what audiences are saying about their brands. The convergence of social media and big data gives birth to a whole new level of technology.

The Facebook context

Popular for the last five years and with over 1.2 billion users worldwide, Facebook stores a gigantic amount of user data, making it a massive data wonderland. According to the Social Media Marketing Industry Report, 2015, Facebook is the #1 social platform for marketers.

Every day, we feed Facebook’s data beast with mounds of information: 10 billion Facebook messages, 4.5 billion clicks on the ‘like’ button, and 350 million new picture uploads. On its own, this information may not seem to mean much to most people. But with data like this, Facebook knows who our friends are, what we look like, where we are, what we are doing, what we like and dislike, and much more. Some researchers even say Facebook has enough data to know us better than our therapists!
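To put these daily figures in perspective, here is a back-of-the-envelope conversion to average events per second. It is only a sketch: it assumes activity is spread evenly over 24 hours, which real traffic is not, so peak rates would be noticeably higher than these averages.

```python
# Daily volumes quoted in the article (circa 2015).
DAILY_VOLUMES = {
    "messages": 10_000_000_000,
    "likes": 4_500_000_000,
    "photo_uploads": 350_000_000,
}

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily: dict[str, int]) -> dict[str, float]:
    """Convert daily event counts into average events per second."""
    return {name: count / SECONDS_PER_DAY for name, count in daily.items()}


if __name__ == "__main__":
    for name, rate in per_second(DAILY_VOLUMES).items():
        # Prints roughly 115,741 messages, 52,083 likes, 4,051 uploads per second.
        print(f"{name}: ~{rate:,.0f} per second")
```

Even as a flat average, that is over 50,000 likes every second, which is why this data has to be processed on large clusters rather than on a single machine.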

Apart from Google, Facebook is probably the only company that possesses such a high level of detailed customer information. The more people use Facebook, the more information it collects. Facebook invests heavily in its ability to collect, store, and analyze data, and it does not stop at the data users give it directly. It has other ways of determining user behavior:

1. Tracking cookies: Facebook tracks its users across the Web with tracking cookies. If a user is logged into Facebook while browsing other sites, Facebook can track the sites they visit.
2. Facial recognition: Facebook’s latest investment is in facial recognition and image processing. Using the images its users share, Facebook can track them across the Web and on other Facebook profiles.
3. Tag suggestions: Image processing and facial recognition let Facebook suggest who to tag in user photos.
4. Analyzing ‘Likes’: A recent study showed that it is possible to accurately predict a range of highly sensitive personal attributes just by analyzing the ‘Likes’ a user has clicked on Facebook. “The work conducted by researchers at Cambridge University and Microsoft Research shows how the patterns of Facebook ‘Likes’ can very accurately predict your sexual orientation, satisfaction with life, intelligence, emotional stability, religion, alcohol use and drug use, relationship status, age, gender, race and political views among many others.”

Facebook Inc. analytics chief Ken Rudin says, “Big Data is crucial to the company’s very being.” He goes on to say that “Facebook relies on a massive installation of Hadoop, a highly scalable open-source framework that uses clusters of low-cost servers to solve problems. Facebook even designs its own hardware for this purpose. Hadoop is just one of many Big Data technologies employed at Facebook.”
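Hadoop is built around the MapReduce programming model: a map step turns each input record into key/value pairs, the framework groups pairs by key, and a reduce step combines each group. The sketch below imitates those three phases in plain Python to count likes per page. It does not use Hadoop’s actual API, and the tab-separated "user_id, page_name" record format is a hypothetical example, not Facebook’s real schema.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input: one "user_id<TAB>page_name" record per like event.
SAMPLE_LIKES = [
    "u1\tCoffee",
    "u2\tCoffee",
    "u1\tHiking",
    "u3\tCoffee",
    "u2\tHiking",
]


def map_phase(records: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (page, 1) for every like record, as a MapReduce mapper would."""
    for record in records:
        _user, page = record.split("\t")
        yield page, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key; in Hadoop the framework does this between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the values for each page, as a MapReduce reducer would."""
    return {page: sum(values) for page, values in groups.items()}


def count_likes(records: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a single machine."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(count_likes(SAMPLE_LIKES))  # {'Coffee': 3, 'Hiking': 2}
```

The point of the model is that the map and reduce functions are independent per record and per key, so a framework like Hadoop can spread them across many low-cost servers; here everything simply runs in one process.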


Here are a few examples that show how Facebook uses its Big Data.

Example 1: The Flashback

To honor its 10th anniversary, Facebook offered its users the option to view and share a movie tracing their social network activity from the date they registered to the present. Called the ‘Flashback’, this video is a collection of the photos and posts that received the most comments and likes, set to nostalgic background music.

Example 2: “I Voted”

Facebook tied political activity to user engagement with a social experiment: a sticker that let users declare “I Voted” on their profiles. The experiment ran during the 2010 U.S. midterm elections and appears to have been effective. Users who noticed the button were more likely to vote, and to be vocal about voting, once they saw that their friends were participating. The experiment reached 61 million users in total, and 20% of the users who saw that their friends had voted also clicked the sticker. Facebook’s data science unit claimed that the stickers motivated close to 60,000 voters directly, and that social contagion motivated another 280,000 connected users to vote, for a total of 340,000 additional voters in the midterm elections.
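The claimed “I Voted” effect combines as a simple sum. The second line is not stated in the original report; it is only computed from the article’s own figures, and suggests that roughly four in five of the additional voters are attributed to social contagion rather than to the sticker itself.

```latex
\begin{aligned}
N_{\text{total}} &= N_{\text{direct}} + N_{\text{contagion}} \approx 60{,}000 + 280{,}000 = 340{,}000 \\
\frac{N_{\text{contagion}}}{N_{\text{total}}} &\approx \frac{280{,}000}{340{,}000} \approx 0.82
\end{aligned}
```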

Example 3: ‘Celebrate Pride’

Following the Supreme Court’s ruling that same-sex marriage is a constitutional right, Facebook turned into a ‘rainbow-drenched spectacle’ called ‘Celebrate Pride’. Celebrations like this had not been seen since 2013, when 3 million people changed their profile pictures to the red equals sign, the logo of the Human Rights Campaign, as a way of showing support for marriage equality. Facebook provided a simple way to turn profile pictures into rainbow-colored ones, and within the first few hours more than a million users had changed their profile pictures, according to Facebook spokesperson William Nevius. All this excitement also raised questions about what kind of research Facebook was conducting, given its earlier research into tracking user moods and voting behavior. In a paper titled ‘The Diffusion of Support in an Online Social Movement’, two Facebook data scientists had already analyzed the factors that predicted support for marriage equality on Facebook, looking at what contributed to users changing their profile pictures to the red equals sign.

The Downsides

Privacy Issues:

With this massive gold mine of data, advertisers wait like hungry vultures. This leads to high levels of privacy concern among users. Facebook, however, has always assured its users that information is shared only with their permission and is anonymized when sold on to marketers. Issues still crop up, however. For example, many users complain that the privacy settings are too complex or not clearly explained, which makes it easy to share things unintentionally. Facebook’s attempts to fix these issues have in turn confused its users even more. Another privacy issue was facial recognition, which prompted an investigation by EU privacy regulators in 2011. Facebook’s Graph Search led to another outcry, since it gives strangers greater access than ever to our private data. So the question users ask is: is privacy dead?

The 2 Mistakes Companies Make with Big Data:

Ken Rudin states that companies that rely on Big Data often owe their frustration to two mistakes:

#1. They rely too much on one technology, like Hadoop. Facebook relies on a massive installation of the Hadoop software, a highly scalable open-source framework that uses clusters of low-cost servers to solve problems, and the company even designs its own hardware for this purpose. Yet Mr. Rudin stresses that Hadoop is just one of Big Data’s many technologies; on its own, Hadoop isn’t enough. He explains that the analytic process at Facebook begins with a 300-petabyte data analysis warehouse. To answer a specific query, data is often pulled out of the warehouse and placed into a table so that it can be studied. The team also built a search engine that indexes data in the warehouse. These are just some of the many technologies Facebook uses to manage and analyze information.

#2. Companies use big data to answer meaningless questions. Mr. Rudin says, “At Facebook, a meaningful question is defined as one that leads to an answer that provides a basis for changing behavior. If you can’t imagine how the answer to a question would lead you to change your business practices, the question isn’t worth asking.”
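The article does not say how Facebook’s warehouse search engine works. A common building block for this kind of search is an inverted index, which maps each word to the items that contain it. The sketch below indexes a small, entirely hypothetical catalog of warehouse table names and descriptions so that an analyst could find which tables to pull data from; it illustrates the general technique, not Facebook’s actual system.

```python
import re
from collections import defaultdict

# Hypothetical catalog: warehouse table name -> short description.
TABLE_CATALOG = {
    "daily_likes": "count of like events per page per day",
    "photo_uploads": "photo upload events with upload day and device",
    "message_volume": "number of messages sent per day by country",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(catalog: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of tables whose name or description contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for table, description in catalog.items():
        for word in tokenize(table) + tokenize(description):
            index[word].add(table)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the tables that contain every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not mutated
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    index = build_index(TABLE_CATALOG)
    print(sorted(search(index, "per day")))  # ['daily_likes', 'message_volume']
```

Building the index once and answering each query with a few set lookups is what lets this approach scale to catalogs far larger than a full scan of every description would allow.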

Recent advancements in Facebook

The Introduction of Topic Data

Facebook has recently introduced Topic Data to a few of its partners. What is it? Topic Data is a technology that shows marketers how audiences respond to brands, events, activities, and subjects, in a way that keeps users’ personal information private. Marketers can then use these insights to selectively change the way they market on Facebook as well as on other channels.

This kind of data was previously offered by third parties, but it was less useful because the sample sizes were too small to be significant and determining demographics was almost impossible. With Topic Data, Facebook groups the data and strips personal information from user activity, offering marketers insights into all the activity around a given topic. As a result, marketers get an actionable and comprehensive view of their audience for the first time. On the issue of privacy, Facebook has assured that all personal information will stay private: all the information used for Topic Data will be aggregated and anonymized.
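Facebook has not published how Topic Data aggregates and anonymizes activity. As a rough illustration of the idea, the sketch below counts distinct users per demographic bucket for one topic, drops user IDs from the output, and suppresses buckets smaller than a minimum size. The `Event` fields and the threshold of 100 are hypothetical assumptions, and size thresholds alone are not a complete anonymization scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One hypothetical user action that mentions a topic."""
    user_id: str  # personal identifier: used for counting, never returned
    topic: str
    age_band: str
    gender: str


def aggregate_topic(
    events: list[Event], topic: str, min_group_size: int = 100
) -> dict[tuple[str, str], int]:
    """Count distinct users per (age_band, gender) bucket for one topic.

    Buckets with fewer than min_group_size distinct users are suppressed,
    so small groups cannot be singled out. Only counts leave the function.
    """
    users_per_bucket: dict[tuple[str, str], set[str]] = {}
    for event in events:
        if event.topic != topic:
            continue
        bucket = (event.age_band, event.gender)
        users_per_bucket.setdefault(bucket, set()).add(event.user_id)
    return {
        bucket: len(users)
        for bucket, users in users_per_bucket.items()
        if len(users) >= min_group_size
    }


if __name__ == "__main__":
    events = [Event(f"u{i}", "coffee", "18-24", "female") for i in range(150)]
    events += [Event(f"v{i}", "coffee", "35-44", "male") for i in range(5)]
    # The 5-person bucket is suppressed; only the large bucket is reported.
    print(aggregate_topic(events, "coffee"))  # {('18-24', 'female'): 150}
```

Counting distinct users rather than raw events keeps one very active person from inflating a bucket, and the size threshold addresses the small-sample problem the article mentions.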

This article showcases the ways in which Facebook uses big data.

Originally published at