Misinformation on Facebook

R. Michael Alvarez
Trustworthy Social Media
4 min read · Dec 29, 2020

As researchers, we do not know enough about misinformation on Facebook.

Why? Because Facebook does not give researchers easy access to content from their platform.

Everyone hears stories about the rampant misinformation and negativity on Facebook. Recently, Charlie Warzel wrote an interesting article about this trend in the New York Times titled “What Facebook Fed the Baby Boomers.” In the article, he discusses how he worked with two baby boomers to access their Facebook feeds, and describes just how much toxic and misleading content the subjects were exposed to. Warzel summarizes the experience of one of the two baby boomers as “an information hellscape.”

It’s an interesting read, but a question remains: why did Warzel take the approach of following just two users? He writes in the article:

“Despite Facebook’s reputation as a leading source for conspiracy theories and misinformation, what goes on in most average Americans’ news feeds is nearly impossible for outsiders to observe. Tools like CrowdTangle, which track “engagements” with social media posts, are the best available means to understand what is popular on the platform, though Facebook (which owns the CrowdTangle) argues that CrowdTangle is not a reliable indicator for how many people saw a post.”

Unfortunately, the approach used by Warzel is one of the few ways that a researcher can get access to the sort of information that flows through most Facebook accounts. But while Warzel’s approach is interesting (and makes for a great read), it does not provide the timely and systematic data that researchers need to understand how much misinformation, harassment, trolling, and hate speech exists on Facebook. Nor does it give us the data to build tools that can detect and perhaps prevent misinformation on Facebook.

Of course, you might be thinking, can’t a researcher build a scraper that could quickly collect a great deal of Facebook information? The answer is yes, but that could violate the current Facebook Terms of Service. The TOS states: “You may not access or collect data from our Products using automated means (without our prior permission) or attempt to access data you do not have permission to access.” So an effort by an independent researcher to scrape data at scale seems out of the question.

Many of us know the story that led to these restrictions on data sharing with researchers. In the wake of the Cambridge Analytica scandal (the details of which have been recounted numerous times; for a good overview I’d suggest reading Christopher Wylie’s book Mindf*ck), Facebook essentially cut off all direct research access to content from their platform. In the two and a half years since then, the research community has had very limited access to large, timely, and researcher-oriented datasets from Facebook.

Today, Facebook makes available a limited selection of curated datasets for academic research on their Facebook Data for Independent Research site, but that’s a far cry from allowing researchers to collect their own datasets. There’s also the Social Science One initiative, but it involves an opaque process to request access to curated datasets (right now it says that one can apply for access by sending a request through the proposals section of a website that doesn’t seem to exist at this time; I’m happy to be corrected if I’m wrong).

But this is not enough. Facebook has billions of users worldwide, and stories like Warzel’s suggest that Facebook may have become one of the primary mechanisms for misinformation and vitriol to spread online. Facebook needs to start restoring direct research access to their platform, following the lead of their competitor, Twitter. By this I mean opening an API that allows research queries for current Facebook content, at the sort of scale that would allow researchers to study the flow of information (and misinformation) on Facebook. Not only has Twitter given researchers direct access to content from their platform, but they have also recently announced a series of steps they will take in 2021 to provide qualified researchers even greater access to that content.

What are we missing by not having direct and unimpeded access to current content on Facebook? As one example, if our group wanted to analyze the spread of misinformation in the 2020 presidential election on Facebook, in the same way that we tracked and monitored these issues using Twitter data, we simply could not replicate this methodology using the Facebook platform. The word cloud below comes from our Twitter election monitor, and it shows the type of data that we collect from Twitter (which isn’t readily available from Facebook). Nor can we dig deeply into datasets on the 2020 presidential election to detect what may be small but influential networks that spread misinformation on Facebook. Thus we really cannot understand the full ecosystem of misinformation online, because we can’t easily and independently get a complete dataset on what is happening on Facebook.

From the Twitter election monitor at monitoringtheelection.us

There’s even the chance that those who deliberately spread falsehoods online, knowing that the research community has little ability to track their misinformation campaigns on Facebook, might be more inclined to use Facebook for their activities.

In order for research groups like ours to help companies like Facebook build a truly Trustworthy Social Media, we need timely and detailed access to data from their platform. We hope that in 2021, Facebook provides better opportunities for researchers to collect the data they need for their studies, to gather data at the scale necessary for in-depth analysis of the content and spread of misinformation, and to do so without Facebook restricting the types of research that use content from their platform.
