What it’s like to actually use Facebook’s ad transparency tools

A developer’s tale of wrestling with the Facebook Ad Library

Paul Duke
Online Political Transparency Project
11 min read · Oct 30, 2020

--

Facebook has made A Big Deal of its transparency efforts since the 2016 election, and claims that as a company it is “committed to creating a new standard of transparency and authenticity for advertising.”

However, intentions are nothing until someone does the work to try to make them reality. Facebook has provided a bunch of pieces that it claims solve the transparency puzzle, but those pieces still have to be assembled, and maintained–because nothing in tech just works indefinitely–and there’s no guarantee all the pieces are available.

I’m the main engineer for our project’s infrastructure and data collection from Facebook’s transparency offerings. I have a Bachelor’s degree in Computer Science, and worked at Google for more than eight years. At Google I had a variety of roles: I was a software engineer on two different search products, and had readability in Python and C++ (a type of certification at Google that allows an engineer to approve code for style and correctness in those languages).

I’ve worked with Facebook’s transparency offerings for more than nine months now, and it feels like Facebook never even tried to verify that its transparency offerings actually achieve the promises the company has made. In fact, I’m not convinced that Facebook executives want the efforts they direct to actually enable transparency and accountability.

Facebook claims to offer industry-leading transparency, but in practice I’ve found that there are many hurdles preventing correct and complete analysis.

I’m the main engineer for our project’s infrastructure and data collection from Facebook’s transparency offerings… I’m not convinced that Facebook executives want the efforts they direct to actually enable transparency and accountability.

For me, the issues with Facebook’s offerings break down into two buckets:

  1. Technical issues. There are multiple practical problems that make getting the data impossible, difficult, or resource-draining (usually one or more of energy, time, and hope).
  2. Data problems. The data is structured to answer only the questions that Facebook apparently wants asked. Does Facebook actually want to provide data that it thinks will enable accountability and transparency? Laura Edelson touched on this in their blog post Facebook’s political ad spending numbers don’t add up, and that is just one example of how the transparency data offerings make it nearly impossible to answer even the simplest questions.

Using the API

Since I’m a software engineer, and I expect an audience of software engineers, I will start with the technical issues. Let’s assume the mission is “make a database of all political ads on Facebook shown to users in the US,” and let’s assume we want the following for each ad:

  • number of impressions
  • the amount spent
  • who paid for the ad(s)
  • image(s) and/or video(s)
  • text(s)
  • link(s)
  • timeframe the ad was active

Together, this information would allow interesting analysis: for example, generating numbers on spending on topics over time, or grouping similar ads related to COVID-19. Neither of these is possible from Facebook’s Ad Library at the time of writing.

Facebook’s ad library API documentation makes it sound easy to build a database that would make such analysis possible. But first, you have to write the logic to transform API (application programming interface) results into a reasonable schema, then implement a collector to query the API, transform results, and store them. Ad data is not static, so you’ll need to handle updates to spending, impressions, page names, ad delivery times, etc.
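To make that first step concrete, here is a minimal sketch of what such a collector might look like, assuming you have a valid access token. The endpoint version, parameter encodings, and field names below are assumptions that may need adjusting against the current documentation; treat it as an outline, not a drop-in implementation.

```python
import requests

# A minimal sketch of an Ad Library API collector. The endpoint version,
# parameter encodings, and field names below are assumptions and may need
# adjusting against the current API documentation.
API_URL = "https://graph.facebook.com/v8.0/ads_archive"
FIELDS = ",".join([
    "id", "page_id", "page_name", "funding_entity",
    "ad_creative_body", "ad_snapshot_url",
    "ad_delivery_start_time", "ad_delivery_stop_time",
    "impressions", "spend",
])

def fetch_ads(access_token, search_terms, country="US"):
    """Page through API results, yielding one flattened record per ad."""
    params = {
        "access_token": access_token,
        "search_terms": search_terms,
        "ad_reached_countries": country,  # exact encoding per the API docs
        "ad_active_status": "ALL",
        "fields": FIELDS,
        "limit": 250,
    }
    url = API_URL
    while url:
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        payload = resp.json()
        for ad in payload.get("data", []):
            impressions = ad.get("impressions", {})
            spend = ad.get("spend", {})
            # Impressions and spend come back as ranges, not exact numbers,
            # so the schema has to keep both bounds.
            yield {
                "archive_id": ad["id"],
                "page_name": ad.get("page_name"),
                "funding_entity": ad.get("funding_entity"),
                "body": ad.get("ad_creative_body"),
                "snapshot_url": ad.get("ad_snapshot_url"),
                "start_time": ad.get("ad_delivery_start_time"),
                "stop_time": ad.get("ad_delivery_stop_time"),
                "impressions_min": impressions.get("lower_bound"),
                "impressions_max": impressions.get("upper_bound"),
                "spend_min": spend.get("lower_bound"),
                "spend_max": spend.get("upper_bound"),
            }
        # Pagination: the "next" URL already carries all query parameters.
        url = payload.get("paging", {}).get("next")
        params = {}
```

Even this much glosses over the update problem: the same ad’s spend, impressions, page name, and delivery times change over time, so records have to be re-fetched and reconciled, not just inserted once.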

[Screenshot of logs showing repeated errors from Facebook’s API]
Logs from Ad Observatory’s API collector repeatedly encountering the infamous “Unknown Error” code 1.

Also, you need to handle the frequent but vague API errors, such as “Unknown Error” code 1, which according to Facebook might indicate downtime. This particular error happens at least once almost every time our API collectors run. You also have to make sure you don’t exceed the API rate limit of 200 requests per hour, although we rarely see the API sustain that rate anyway. None of these are inherently hard problems, but they do require time and energy to implement and maintain. It is time consuming; I’m on this project because the researchers could no longer handle the time required for it and still get research done.
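In practice, every request ends up wrapped in something like the rough sketch below: retry the transient “Unknown Error” code 1 with exponential backoff, and space requests out so you stay under the 200-requests-per-hour limit. The error-body shape here is an assumption about what the Graph API typically returns.

```python
import time
import requests

MAX_RETRIES = 5
MIN_SECONDS_BETWEEN_REQUESTS = 3600 / 200  # stay under ~200 requests/hour

_last_request_time = 0.0

def throttled_get(url, params):
    """GET with a simple throttle, plus retries for Facebook's transient
    'Unknown Error' (code 1) responses. The error-body shape here is an
    assumption based on what the Graph API typically returns."""
    global _last_request_time
    for attempt in range(MAX_RETRIES):
        # Space requests out so we never exceed the documented rate limit.
        wait = MIN_SECONDS_BETWEEN_REQUESTS - (time.time() - _last_request_time)
        if wait > 0:
            time.sleep(wait)
        _last_request_time = time.time()

        resp = requests.get(url, params=params)
        if resp.ok:
            return resp.json()

        error = resp.json().get("error", {})
        if error.get("code") == 1:  # "Unknown Error": usually transient
            time.sleep(2 ** attempt * 10)  # exponential backoff, then retry
            continue
        resp.raise_for_status()  # anything else is a real failure
    raise RuntimeError("API still failing after %d retries" % MAX_RETRIES)
```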

Oh yeah, the API also has occasional outages that are not communicated to users, and there is no API status dashboard. So you had better know other people using the API whom you can frantically email while debugging your code to figure out what broke overnight. Take October 22, 2020, when I woke up to emails from other researchers asking if I was also experiencing an outage. First we all scrambled to figure out whether there was an issue; once we confirmed there was, we had to figure out whether Facebook even knew about the outage.

Getting the actual ads

Even though image(s) and video(s) are often the most important part of a political ad, and an essential component for any analysis of ad content, Facebook’s API does not provide links for videos or images. This is true even though the API documentation says “The link for video and images can serve as a unique identifier for individual ads.” Nor does it make the assets available for download as zip archives. Instead, you have to build extensive and complicated infrastructure in order to retrieve this information.

“Are there any unique identifiers provided for individual ads? The link for video and images can serve as a unique identifiers [sic]”
Facebook Ad Library API FAQ section discussing video and image links.

Some ads have multiple ad creative text(s), image(s), and/or video(s), but the ad library API only provides one version of an ad’s texts — and no links to image(s) or video(s). The documentation doesn’t mention which of the ad’s texts is returned. Also, for some ads the API tells us the ad text is “{{product.brand}}” even though that is not the text shown to users. We reported this issue to Facebook in March 2020, but it has not been fixed. You can see some of the affected ads in the Ad Library with the query “{{product.brand}}”. This bug affects 178,614 ads in our dataset of US ads from 2020 at the time of writing.

If you relied on the Ad Library API alone, you would have no ad image(s) or video(s), and incorrect ad text(s) for a significant portion of ads. In order to get this data, you have to fetch image(s), video(s), and text(s) from the ad archive, and that requires a full browser to execute the archive tool’s javascript-heavy pages. You can’t get the images or videos without executing javascript; I tried.

Facebook told us the ad library archive pages are intended for the programmatic collection of ad creatives, but it sure doesn’t make that easy. Facebook could have provided reliable annotations or identifiers for the image, video, text, link, and other Document Object Model (DOM) elements (the elements that give a page its structure), so we could find them that way. But no, none of these DOM elements have clear labels; I ended up using xpath and class-name selectors to navigate the DOM. Alternatively, Facebook could put these assets in a sensible directory structure and provide them for download as zip archives.

Another problem: the DOM structure and class names can and do change frequently. Facebook does not send any notification when this happens; instead, you have to monitor the collectors very closely in case an unannounced change breaks your collection code. Exceeding one QPS (query per second) against the archive pages gets your IP address (the way you can be identified as the source of a query) blocked for four hours, so you need to detect when that happens and slow down. So now you need webdriver, your memory requirements have grown by almost a gigabyte per ad-collector instance, and to keep up with newly created ads you have to run and monitor multiple collector instances. All of this requires engineering know-how and time, as well as computing resources, and it quickly exceeds what most people can put into a “hobby.” But you do all of it because researchers and journalists need this data, the public deserves to know, and neither will have the data unless you put Facebook’s puzzle pieces together.
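For what it’s worth, the scraping side ends up looking roughly like the sketch below (Selenium driving headless Chrome). The tag-level selectors are deliberately generic stand-ins: the real collector uses xpath and class-name selectors that have to be updated whenever the DOM changes, so treat this as an outline rather than working extraction logic.

```python
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def fetch_creative_urls(driver, snapshot_url):
    """Load an ad's archive page in a real browser and pull out whatever
    image/video sources are present. Nothing on the page is labeled for
    this purpose, so these generic selectors stand in for the brittle
    xpath/class-name selectors a real collector has to maintain."""
    driver.get(snapshot_url)
    time.sleep(2)  # let the javascript-rendered page settle
    images = [img.get_attribute("src")
              for img in driver.find_elements(By.TAG_NAME, "img")]
    videos = [vid.get_attribute("src")
              for vid in driver.find_elements(By.TAG_NAME, "video")]
    return images, videos

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

snapshot_urls = []  # e.g. ad_snapshot_url values gathered by the API collector
for url in snapshot_urls:
    images, videos = fetch_creative_urls(driver, url)
    # ...store creatives keyed by archive_id, download the assets, etc...
    time.sleep(1.5)  # stay well under 1 QPS, or the IP is blocked for hours
driver.quit()
```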

So, you implement the API collector and the creative collectors. Now you want to know what ads users are seeing. But wait: some of these ads are “dynamic creatives,” meaning advertisers provide multiple text(s), image(s)/video(s), and/or link(s), and Facebook decides at serve time what combination to show users. Ads with multiple creatives can have vastly different images, videos, or texts, but Facebook only reports impressions and spending at the ad level. So you have no idea what users are actually seeing! At this time we don’t have a solution for this, and it remains one of the glaring blockers to providing comprehensive transparency.

For example, take these two creatives from the same ad: one for a “Blue Line Flag Hat,” the other for a “Fantasy Island Mug.” Which was shown to users? We don’t know!

[Ad from Facebook Ad Library for a “Blue Line Flag” hat]
[Ad from Facebook Ad Library for a “Fantasy Island” coffee mug]
Which of these image and text combinations did Facebook users see? A blue line flag or a Fantasy Island mug? We don’t know!
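This limitation ends up baked into the data model itself. Here is a hedged sketch (the field names are mine, not Facebook’s) of roughly how the records have to be structured: delivery numbers attach to the ad, creatives attach to the ad, and nothing connects the two.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Creative:
    """One text/image/video/link combination supplied by the advertiser."""
    body: Optional[str] = None
    image_url: Optional[str] = None
    video_url: Optional[str] = None
    link_url: Optional[str] = None

@dataclass
class Ad:
    """Impressions and spend exist only at this level, so when an ad has
    several creatives there is no way to attribute delivery to any one of
    them: which creative users actually saw is simply unknowable here."""
    archive_id: str
    impressions_min: Optional[int] = None
    impressions_max: Optional[int] = None
    spend_min: Optional[int] = None
    spend_max: Optional[int] = None
    creatives: List[Creative] = field(default_factory=list)
```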

Getting reliable spending data

Let’s turn to the problem of figuring out how much money is spent on ads.

In addition to the API, Facebook publishes transparency reports, which provide information about pages’ spending over time and by region. But downloading these reports regularly requires either a full browser to execute javascript or extremely reliable interns manually downloading the reports every day (including weekends). We are lucky to have interns who will do this labor. But if you are not as lucky as we are and need to go the browser-executing-javascript route, that requires parsing a confusing DOM structure that has no clear labels and can change at any time. Facebook could make these reports simple to download, or publish them in something like an s3 bucket, a type of online storage. But Facebook chooses not to.

How reliable is data on ad spending in the transparency reports?

The transparency reports have all sorts of other annoying issues. In the daily spend reports, total spending of $99 or less per advertiser is not included at all, and spending above $99 is reported only as a range. Spending in lifelong reports (in other words, how much a particular Facebook page has spent over its entire existence) lags by at least five days; Facebook told us it takes five days for spending totals to “settle” in their systems. So you never have a definitive number for a page’s spending. Sometimes spending fluctuates between days: I’ve seen reported lifelong spending for a page jump by an order of magnitude in one day, and then the next day that spending spike disappears, never to be seen again. Facebook does not regularly regenerate reports even when they contain errors.
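Practically, that means spend has to be stored and aggregated as bounds rather than numbers. Here is a small sketch of the idea; the range strings are illustrative, not the exact values Facebook’s report files use.

```python
from typing import Tuple

def spend_bounds(report_value: str) -> Tuple[int, int]:
    """Turn a reported spend value into (lower, upper) dollar bounds so that
    downstream totals preserve the uncertainty. The string formats below are
    illustrative; the real report files use similar range buckets."""
    value = report_value.strip().replace("$", "").replace(",", "")
    if value in ("", "<100", "≤100"):
        # Advertisers at $99 or less are simply absent from the daily
        # reports, so the best we can say is "between $0 and $99".
        return (0, 99)
    if "-" in value:
        low, high = value.split("-", 1)
        return (int(low), int(high))
    return (int(value), int(value))

# Aggregation keeps it honest: page totals become ranges too.
reported = ["100-199", "<100", "500-999"]
low_total = sum(spend_bounds(v)[0] for v in reported)   # 600
high_total = sum(spend_bounds(v)[1] for v in reported)  # 1297
```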

We reported this problem to Facebook, but it was a long process. First we had to find the issue, then double-check our work, collect examples to show overwhelming evidence of a problem, request a meeting, clarify the issue when our first meeting request was politely deflected with a request for clarification, and finally report the issue over a video call. All of this requires 1) an established relationship with Facebook, 2) the time, energy, and know-how to organize the data, and 3) the ability to speak a language that Facebook will listen to and understand.

[T]echnical complexity should not prevent people such as local journalists or concerned members of the public from accessing and making sense of the Facebook ads dataset.

All three of those requirements are privileges that our project has because we are established researchers and Facebook has decided to listen to us. After our meeting, Facebook did regenerate, on a one-off basis, the reports for the specific instances we found, but it did not start regenerating reports regularly, nor does Facebook publish errata when it does regenerate reports. How on earth is someone without our privileges supposed to meaningfully use this data and report issues? This is not a theoretical concern. In many countries transparency is voluntary and Facebook does not require advertisers to disclose their ads. There is a pattern of lax action against political manipulation in countries that tech companies treat as strategically unimportant. And imagine a strapped newsroom anywhere doing any of this!

You can read more about our issues with spend estimation in Laura Edelson’s blog post, Facebook’s political ad spending numbers don’t add up.

Does Facebook actually want to provide data that it thinks will enable accountability and transparency?

Using Facebook’s offerings to make something that researchers can actually use is a full-time job, requires technical know-how, and has a non-trivial cost. Facebook has chosen to allow many barriers to entry here. Given the choice between a simpler option and a more complicated option, Facebook has routinely chosen the more complicated one.

Facebook has also told members of our project that it might deactivate our developer accounts and cut us off from the API. If that happens, other people will have to take on this work, and will likely have to duplicate much of what I’ve done. Anyone who has inherited code they didn’t write, or tried to re-engineer a data pipeline, knows this can be extremely difficult and frustrating. This is why I wonder whether Facebook leadership actually wants to provide data that will enable accountability and transparency.

The public deserves a universal archive of digital ads (here’s researchers on what such an archive should contain) that is accessible regardless of technical know-how or researcher prestige, and that allows extensive, holistic research of the digital advertising ecosystem. There are lots of ways to do this, and a future blog post will delve into those details. But, most importantly, technical complexity should not prevent people such as local journalists or concerned members of the public from accessing and making sense of the Facebook ads dataset.

Facebook has massive resources at its command, and readily admits its workers are critical to its success. Every day thousands of Facebook’s rank-and-file workers solve extremely complex problems, build robust systems, and provide seamless experiences for developers and Facebook’s users. But Facebook has not done that for its transparency offerings.

I genuinely feel bad for the rank-and-file Facebook employees tasked with implementing and maintaining these systems, because I know they work very hard to fulfill the mandates that Facebook leadership gives them.

But I don’t think that Facebook leaders truly are trying to build tools for real transparency and accountability. I think that rank-and-file Facebook employees are being given very complex and difficult tasks intended to look like Facebook is working very hard to be transparent, without actually enabling transparency and accountability.

If I were working on these tools and realized they are more for PR than transparency and accountability, I would feel very used and angry. But Facebook’s workers are not powerless here. Just like Facebook coordinates the labor of thousands of workers to achieve its end, so too can Facebook’s workers organize and act collectively to demand real transparency and accountability.

We will continue to write about technical problems with Facebook’s political ad disclosure. Check out Ad Observatory, a searchable site revealing trends in Facebook political advertising in the 2020 elections, and download the Ad Observer plug-in tool to safely share with researchers and journalists the ads you are seeing on Facebook.

--

Paul Duke
Online Political Transparency Project

Software engineer. Advocate for worker power & better working conditions for all tech workers. Previously worked at Google. https://twitter.com/nullvoidstar/