Behind the Scenes of Ads and Fake News

Adam Berke
9 min read · Apr 6, 2017


It’s hard to fathom now given the prominence of the topic, but the term “fake news” didn’t even exist until the waning moments of 2016. That’s just a few months ago at the time of writing. However, in that small window, it has jumped to the forefront of our vocabulary and is now being used to describe everything from the most vile hate speech to opinions one doesn’t happen to agree with.

Given the former category (hate speech and the like), there has been increasing concern about the business model behind the websites where this content appears, and in particular, their sources of advertising income. Having spent over 15 years in digital marketing, I’ve been contacted with increasing frequency to help explain what’s going on behind the scenes. In particular, there’s been a growing interest in how mainstream brands and well-intentioned companies sometimes get caught up in the situation by occasionally appearing alongside questionable content.

To summarize many of these conversations, I decided to write up a basic explanation of how programmatic advertising works and the complexities of the “fake news” issue for online advertisers and publishers. Industry insiders will find this oversimplified, but it will help people who haven’t spent a lot of time in the space understand how these sites monetize, the complexities of filtering content, and what’s being done to address it.

A quick primer on programmatic advertising

First off, it’s important to remember that advertising provides the lifeblood for the type of high quality journalism and free online content that people have come to love and expect on the internet, and it’s in our best interest to sustain that revenue stream. While lower quality sites are abusing some of the systems that reputable publishers rely on, it’s important to take a surgical approach to assessing the appropriateness of ad inventory so we don’t throw the baby out with the bathwater.

Collectively, the type of automated advertising that I’ll describe here is generally called “programmatic advertising” and over the last few years, it has become the dominant method by which marketers run their ad campaigns online and by which online publishers earn revenue. It has democratized online advertising by providing the automation and control even very small businesses need to run highly targeted ads, and it’s provided a new revenue stream for media companies and publishers who have been attempting to reinvent their business models as print has fallen out of favor with consumers.

The primary technology that has enabled this is called real-time bidding (RTB). In the early days of the internet, sites would only sell their ad inventory directly to advertisers, usually for a set period of time at a set price. That process was manual and inefficient, so new technologies sprang up to improve upon the status quo. With RTB, instead of buying inventory up-front for a period of time, each impression is auctioned off in “real-time.”

Online publishers don’t generally have the tools or scale (with a few exceptions like Facebook and Google) to build this type of real-time technology themselves, so they rely on third parties. These third parties are called supply-side platforms (SSPs) or ad exchanges. There are a number of companies that offer this technology, including AppNexus, Rubicon Project, PubMatic, OpenX, and others. Also, Facebook and Google have tools to help publishers sell their inventory. Facebook has a product called Facebook Audience Network (FAN), and Google has a number of tools for publishers, but the primary ones are called AdSense and the DoubleClick Ad Exchange. For the sake of simplicity, I’ll just refer to Google’s ad inventory (the ad space they sell) collectively as AdX, since it includes AdSense.

In order to buy ads on these exchanges, you need to be able to respond to a “bid request,” which is basically the SSP or ad exchange auctioning off a single ad impression on a particular publisher’s website. These exchanges process billions of bid requests per day. In each bid request, the SSP/exchange includes some information about the ad space being auctioned off, in particular a digital ID (stored in a cookie on the browser or a mobile device ID), a publisher ID (to denote the website serving the ad), and sometimes additional information. In order to bid, an advertiser needs to respond to a bid request within 50 milliseconds with a bid and the creative they want to show.
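To make the mechanics concrete, here’s a minimal sketch of a bid request and a bidder’s response. The field names and values are simplified and invented for illustration (real exchanges use the far richer OpenRTB protocol), and the publisher IDs and creative URL are hypothetical:

```python
import json

# Hypothetical block list of publisher IDs a bidder refuses to buy from.
BLACKLIST = {"pub-999"}

# A simplified bid request: loosely inspired by OpenRTB-style fields,
# but all names and values here are illustrative placeholders.
bid_request = {
    "id": "auction-123",                   # unique auction ID
    "device_id": "example-device-id",      # cookie or mobile device ID
    "publisher_id": "pub-456",             # may be anonymized by the publisher
    "imp": {"width": 300, "height": 250},  # the ad slot being auctioned
}

def handle_bid_request(request):
    """A DSP's bidder: decide within ~50ms whether and how much to bid."""
    if request["publisher_id"] in BLACKLIST:
        return None  # no bid on known bad publishers
    return {
        "id": request["id"],
        "price": 1.25,  # CPM bid, in dollars
        "creative": "<img src='https://cdn.example.com/ad.png'>",
    }

response = handle_bid_request(bid_request)
print(json.dumps(response, indent=2))
```

The key constraint is that everything inside `handle_bid_request` must finish within the exchange’s timeout, which is why bidders rely on precomputed data rather than fetching or analyzing the page at auction time.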

It obviously requires sophisticated technology to respond to billions of bid requests per day from all around the world, so advertisers and media agencies use other third-party technologies, collectively known as demand-side platforms (DSPs), to participate in these auctions on their behalf and help them execute their campaigns.

Increasingly, the primary information that advertisers use when targeting their ads is the digital ID in the bid request. To make it tangible, the actual ID advertisers use to target my iPhone is C22724B2–97B3–4D89-B405–0C2BDE08342D. Every person using apps on their phone or viewing webpages in a browser has a similar ID. Targeting digital IDs in this manner is generally called “audience targeting” or “behavioral targeting,” since you’re targeting browsers based on some data you have on that ID and less on the site on which the impression is served. This strategy is very effective from an ROI perspective and also allows long-tail sites to receive well-targeted ads, which helps them generate revenue. Most of these niche sites are high quality and appeal to the various hobbies, interests, sub-cultures, and tastes of people around the world. Unfortunately, some of the sites that peddle false stories and other forms of pernicious content also leverage these ad exchanges and SSPs to generate revenue on their sites, often by obfuscating their true identity, but I’ll get into that later.
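Audience targeting can be sketched as a lookup keyed on the digital ID rather than on the site. Everything below is a hypothetical simplification: the segment names, bid prices, and the `audience_data` store are invented for illustration:

```python
# Hypothetical DSP-side audience data, keyed by digital ID.
# In practice this lives in a low-latency store queried during the auction.
audience_data = {
    "C22724B2-97B3-4D89-B405-0C2BDE08342D": {"segments": ["travel", "tech"]},
}

def choose_bid(device_id, campaign_segments, base_cpm=0.50, targeted_cpm=2.00):
    """Bid higher when the ID matches one of the campaign's target segments.

    Note what's absent: the decision never looks at the site itself,
    which is how a well-targeted ad can land on a low-quality page.
    """
    profile = audience_data.get(device_id, {})
    if set(profile.get("segments", [])) & set(campaign_segments):
        return targeted_cpm  # this browser matches an audience segment
    return base_cpm          # untargeted fallback (or decline to bid)

print(choose_bid("C22724B2-97B3-4D89-B405-0C2BDE08342D", ["travel"]))  # 2.0
print(choose_bid("unknown-id", ["travel"]))                            # 0.5
```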

So there you have it. The full chain for how an impression is sold basically looks like this:

Advertiser → DSP → SSP/Exchange → Publisher

Since we’re talking about the business model here, you can basically reverse the process when thinking about how money flows. The advertiser puts their budget into a DSP which in turn spends that money on an SSP/exchange and then the SSP/exchange pays the publishers based on the ad inventory they sold.
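The money flow can be expressed as a couple of lines of arithmetic. The fee percentages below are invented purely for illustration; real DSP and SSP take rates vary widely and are often not public:

```python
# Illustrative money flow: advertiser budget -> DSP -> SSP/exchange -> publisher.
# The 15% / 20% fees are made-up numbers; actual take rates vary widely.
def money_flow(advertiser_spend, dsp_fee=0.15, ssp_fee=0.20):
    after_dsp = advertiser_spend * (1 - dsp_fee)    # DSP takes its cut
    publisher_revenue = after_dsp * (1 - ssp_fee)   # SSP/exchange takes its cut
    return after_dsp, publisher_revenue

after_dsp, pub = money_flow(100.00)
print(f"Reaches the exchange: ${after_dsp:.2f}, publisher earns: ${pub:.2f}")
# Reaches the exchange: $85.00, publisher earns: $68.00
```

Whatever the exact percentages, the structural point holds: the publisher is paid by the SSP/exchange it signed up with, not directly by the advertiser whose ad appears.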

Now that we have a basic understanding of how things work behind the scenes, let’s answer some of the most frequently asked questions about ads and fake news:

Why do you have a partnership with sites like this?

This is often posed when people look at the AdChoices link in the corner of an ad and assume the company displayed is paying the site for the ad space. However, that’s not exactly how it works. The company whose AdChoices message you see is not the company that has a relationship with the publisher. The AdChoices message shows the company that bought the ad, not the company that sold the ad. The company that sells the ad is the one that pays the publisher and would have vetted them when they signed up for their service. The company that buys the ad has limited ability to vet the content in real time (remember, just 50ms to respond) and gets limited information on the bid request. The DSP sometimes gets the publisher’s name, but not always.

Publishers have the ability to sell inventory anonymously. This ability to anonymize your identity as a publisher was originally meant to protect premium publishers who have direct ad sales teams and didn’t want advertisers simply going around them and buying their inventory on the exchanges at a discounted rate. So historically, you wouldn’t want to exclude anonymous publishers, because often the most reputable publishers are the ones with direct ad sales teams. However, less reputable publishers are now using this functionality to avoid black-listing, making it more difficult to weed them out. These sites also employ a range of other technical measures to hide their true identity, such as serving ads through iFrames and other misdirection.
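The anonymity problem can be sketched as a filtering dilemma: a DSP can only match a block list against whatever publisher information the bid request actually exposes. The domain names and block list below are hypothetical:

```python
# Hypothetical block list of known bad publisher domains.
BLOCKED_DOMAINS = {"fakenewsdaily.example"}

def passes_publisher_filter(bid_request):
    """Return True if a DSP's domain block list allows this bid request."""
    domain = bid_request.get("publisher_domain")  # absent when anonymized
    if domain is None:
        # Anonymous inventory: could be a premium publisher avoiding
        # channel conflict with its direct sales team, or a bad actor
        # hiding its identity. Blocking it all cuts off good sites too.
        return True
    return domain not in BLOCKED_DOMAINS

print(passes_publisher_filter({"publisher_domain": "fakenewsdaily.example"}))  # False
print(passes_publisher_filter({}))  # True: nothing to match against
```

The second call is the crux: when the domain is withheld, the block list has nothing to check, which is exactly the loophole less reputable publishers exploit.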

Why can’t you ban this list of sites that I put together?

Subjectivity becomes an important point when looking across millions of sites. Sometimes “fake news” is obvious; sometimes it’s more subjective. The term “fake news” is now being used to refer to many different types of content, including the most offensive hate speech but often opinions that the reader doesn’t happen to agree with. There is no current standard for defining “fake news,” so it’s impossible to take any one person’s list, or even one organization’s list. Facebook and Google are both working on tools to flag questionable content, but all of these initiatives are still early in testing and haven’t been pushed out at scale. In the meantime, to meet customer demand, we manually black-list hundreds of sites that sell their ad space through Google and other supply sources. This is based on a human review that captures our best understanding of our customers’ criteria for quality.

Subjectivity aside, this also ends up being a game of whack-a-mole. While we can manually black-list nefarious sites as we find them, hundreds of new sites (and new user-generated content) are popping up all the time. One trend that we’ve noticed is that well-intentioned people on the lookout for these types of issues can continually refresh offensive sites and will eventually catch an ad campaign that slips through, due to the complex way audience targeting works, as described above. Based on our aggregate data, however, the placement of ads on these low-quality sites is fairly rare. It does happen, though, and it is definitely possible to engineer an unfortunate ad placement if it’s something you’re actively trying to do.

Why not just ban any content that contains racial slurs or a certain set of words?

The DSP can’t “read” the page in real time when the page loads. Remember, they just have 50ms to respond to a bid request and rely heavily on the SSP to vet publisher content since the publisher would need to go through a sign up process with the SSP or ad exchange whether that is Google, Facebook, or one of the other players.

SSPs might be able to implement such a strategy, but even then, having a static list of words without context won’t solve the problem. It can be pretty easy for a human to spot egregious examples, but SSPs and exchanges are managing inventory across hundreds of thousands, even millions, of websites, and there are cases where these words might be OK (e.g., song lyrics, historical analysis, a quote that the author is actually highlighting as something terrible) or entirely undetectable (e.g., within a video, audio clip, or other form of embedded media). Around 400 hours of content are uploaded to YouTube every MINUTE, so you can quickly see the problem with people manually reviewing videos at that scale.

User-generated content in general has long been a topic that the ad industry has worked hard to navigate. From videos, to blogs, to comments, users have the ability to turn a benign website into one containing undesirable content in a fraction of a second. This creates an additional layer of complexity for managing hateful content at scale.

Looking forward, there are “pre-bid” ad quality solutions that are gaining some popularity, but these would rely on the publisher or SSP allowing them, which isn’t universally the case. Even when they do, it’s impossible to pass all of the possible data to the ad quality provider to make a perfect determination. That’s again ignoring the issue of subjectivity around content where some brands might be ok with certain content, while others won’t. There are also solutions that use natural language processing (NLP) and AI to determine context and meaning on a page, but those are still fairly early in their development and roll out.

Why haven’t you fixed this already?

The term “fake news” really only sprang up after the election and didn’t become a household topic until December or January. Before that, the term didn’t even exist. In addition to the technical complexities described above, companies and industry bodies are grappling with the right way to tackle a sensitive topic. Everyone wants to do the “right” thing, but it’s tricky to develop a set of guidelines that are black and white and unbiased. So we’re also trying to avoid knee-jerk reactions that might unfairly target one set of opinions or another.

The good news is that there is a lot of motivation to address this issue and a lot of smart people working on it. Advertising is still the lifeblood of free content online, and everyone from brands, to ad tech companies, to the quality publishers that rely on it for their livelihood has an incentive to rein in abuses and keep quality high.
