By Sinan Aral
The effectiveness of digital ads is wildly oversold. A large-scale study of ads on eBay found that brand search ad effectiveness was overestimated by up to 4,100%. A similar analysis of Facebook ads produced a figure of 4,000%. For all the data we have, it seems like companies still don’t have an answer to a question first posed by the famous 19th-century retailer John Wanamaker: Which half of my company’s advertising budget is wasted?
It should be possible to answer this question, though, because what’s getting in the way is not a lack of information (the problem Wanamaker faced) but rather a fundamental confusion between correlation and causation.
The Conversion Fallacy
When marketing reps sell ad space to clients, they claim that ads will create or cause behavioral change — a phenomenon typically called lift. They back up the claim by pointing to the number of people who purchase a product after seeing the ad — typically referred to as the conversion rate.
To explain the difference between the two to my students, I have them imagine that, on the first day of class, I stood at the door handing out leaflets advertising the class to every student who walked in. I then ask them: “What’s the conversion rate on my ads?” They always correctly reply “100%” because 100% of the people who saw the ad “bought” or enrolled in the class. Then I ask: “How much did those ads change your behavior?” Since they had all already signed up for the class long before seeing the ad, they all reply, “Not at all.” So, while the conversion rate on my ad is 100%, the lift from the ad — the amount of behavior change it provokes — is zero.
Although my example is a bit simplistic, it shows why the confusion of lift and conversion can create problems for measuring marketing ROI. Big brands pay consultants big bucks to “target” their ads at the people most likely to buy their products.
But unless the targeting is directed at customers who aren’t already prepped to buy the products, the conversion from click to cash will not generate any new revenue.
The key to making advertising pay is getting people to buy your goods (or donate to a political campaign or take a vaccine) who would not otherwise have done so.
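The gap between conversion and lift can be put into numbers. The sketch below is only an illustration: the class size, the comparison group of students who never saw a leaflet, and the second ad campaign are all hypothetical figures invented for the example, and the function names are not from any particular analytics tool.

```python
def conversion_rate(buyers, exposed):
    """Share of people who saw the ad and then bought."""
    return buyers / exposed


def lift(buyers_exposed, exposed, buyers_unexposed, unexposed):
    """Purchase rate with the ad minus purchase rate without it."""
    return buyers_exposed / exposed - buyers_unexposed / unexposed


# Hypothetical numbers for the classroom leaflet: 50 enrolled students
# received a leaflet, and a comparable 50 enrolled students did not.
# Everyone in both groups was already signed up for the class.
leaflet_conversion = conversion_rate(50, 50)
leaflet_lift = lift(50, 50, 50, 50)

# Hypothetical ad aimed at people who were not already planning to buy:
# 30 of 1,000 exposed people buy, versus 10 of 1,000 comparable people
# who never saw the ad.
new_audience_conversion = conversion_rate(30, 1000)
new_audience_lift = lift(30, 1000, 10, 1000)

print(f"leaflet: conversion {leaflet_conversion:.0%}, lift {leaflet_lift:.0%}")
print(f"new audience: conversion {new_audience_conversion:.0%}, "
      f"lift {new_audience_lift:.0%}")
```

Under these made-up numbers, the leaflet converts everyone and changes nothing, while the second ad converts far fewer people but is the only one of the two that generates purchases that would not otherwise have happened.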
Let’s say we want to know whether joining the military (A) causes a person’s lifetime wage earnings (B) to be lower. We can’t simply compare the wages of people who enter the military to those who don’t, because there are many other factors (C) that could be driving differences we might see in the raw numbers.
For instance, people with access to better-paying jobs are less likely to join the military in the first place (this is B causing A). And people with more education or skills, who tend to earn more in any case, may choose not to enter the military (C causing both A and B). So what looks at first like a causal relationship between military service and lower average wages might simply be a correlation induced by these other factors. The challenge, then, is to control for these other factors while isolating the relationship we want to examine.
We can do this by creating a control group. If we randomly assign some people to join the military, the group that joins (the treatment group) will have, on average, the same education and skills (and age, gender, temperament, attitudes, and so on) as the group that doesn’t join (the control group). With a large enough sample, the distributions of all observable and unobservable characteristics across people assigned to treatment and control groups are the same, making the treatment itself the only remaining explanation for any differences in outcomes across the two groups. With all else equal, we can be confident that nothing other than their military service can drive differences in their wages.
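A small simulation can make the logic concrete. The sketch below is a toy model, not real data: the wage formula, the assumed true effect of service (`TRUE_EFFECT`), and the rule that lower-skilled people are more likely to enlist are all assumptions chosen for illustration. It compares a naive difference in average wages when people choose for themselves with the same difference when service is assigned by coin flip.

```python
import random

TRUE_EFFECT = -2.0  # assumed effect of service on wages (arbitrary units)


def wage(skill, serves, rng):
    """Toy wage: rises with skill, shifts with service, plus random noise."""
    return 30 + 20 * skill + TRUE_EFFECT * serves + rng.gauss(0, 5)


def difference_in_means(rows):
    """Average wage of those who served minus average wage of the rest."""
    served = [w for serves, w in rows if serves]
    others = [w for serves, w in rows if not serves]
    return sum(served) / len(served) - sum(others) / len(others)


def simulate(randomized, n=50_000, seed=0):
    """Return the raw wage gap between those who served and those who did not."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        skill = rng.random()  # the confounder (C), uniform on [0, 1)
        if randomized:
            p_serve = 0.5  # coin flip, unrelated to skill
        else:
            p_serve = 0.6 - 0.5 * skill  # lower skill, more likely to join
        serves = rng.random() < p_serve
        rows.append((serves, wage(skill, serves, rng)))
    return difference_in_means(rows)


naive_estimate = simulate(randomized=False)
randomized_estimate = simulate(randomized=True)

print(f"assumed true effect:  {TRUE_EFFECT:.2f}")
print(f"self-selected groups: {naive_estimate:.2f}")
print(f"randomized groups:    {randomized_estimate:.2f}")
```

In this toy setup the self-selected comparison overstates the wage penalty, because it mixes the effect of service with the effect of skill. The randomized comparison lands close to the assumed true effect, since skill is balanced across the two groups.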
The trouble is that we can’t always do this. A scientist would be hard-pressed to justify a study that randomly forced people into the military. In these cases, we look for what are known as “natural experiments”: natural sources of random variation that mimic a randomized experiment.
A good natural experiment used by Josh Angrist to measure the effect of military service on wages is the draft lottery imposed on U.S. citizens during the Vietnam War. Every male citizen was assigned a draft lottery number, and these numbers were chosen at random to determine who was drafted. The draft lottery was a natural experiment that created random variation in people’s likelihood of joining the military. Angrist used this variation to estimate the causal effect of military service on wages.
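One standard way to exploit this kind of random variation is the Wald (instrumental variable) estimator: divide the wage gap between lottery-eligible and ineligible people by the gap in their service rates. The sketch below illustrates the idea on simulated data only. It is not a reproduction of Angrist’s analysis, and the wage formula, enlistment probabilities, and `TRUE_EFFECT` are assumptions made up for the example.

```python
import random

TRUE_EFFECT = -2.0  # assumed effect of service on wages (arbitrary units)


def mean(values):
    return sum(values) / len(values)


def simulate_lottery(n=200_000, seed=1):
    """Return (eligible, serves, wage) rows from a toy draft lottery."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        skill = rng.random()           # confounder, unobserved by the analyst
        eligible = rng.random() < 0.5  # lottery outcome, random by design
        # Lower-skilled people enlist more often; eligibility raises the
        # chance of serving by 30 percentage points for everyone.
        p_serve = 0.4 - 0.3 * skill + (0.3 if eligible else 0.0)
        serves = rng.random() < p_serve
        wage = 30 + 20 * skill + TRUE_EFFECT * serves + rng.gauss(0, 5)
        rows.append((eligible, serves, wage))
    return rows


def naive_gap(rows):
    """Raw wage gap between those who served and those who did not."""
    return (mean([w for _, d, w in rows if d])
            - mean([w for _, d, w in rows if not d]))


def wald_estimate(rows):
    """Wage gap by eligibility divided by service-rate gap by eligibility."""
    wage_gap = (mean([w for z, _, w in rows if z])
                - mean([w for z, _, w in rows if not z]))
    service_gap = (mean([d for z, d, _ in rows if z])
                   - mean([d for z, d, _ in rows if not z]))
    return wage_gap / service_gap


rows = simulate_lottery()
naive = naive_gap(rows)
wald = wald_estimate(rows)

print(f"assumed true effect: {TRUE_EFFECT:.2f}")
print(f"naive comparison:    {naive:.2f}")
print(f"Wald estimate:       {wald:.2f}")
```

Because eligibility is random, it is unrelated to skill, so any wage difference between eligible and ineligible people can only come through the extra service that eligibility induces. Scaling that difference by the change in service rates recovers an estimate near the assumed effect, while the naive comparison remains contaminated by skill.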
In a similar way, Christos Nicolaides and I used the weather as a natural experiment to understand the effect of social media messaging on exercise behavior. Though people who run more tend to have friends who run more, variation in the weather helped us estimate the degree to which receiving social messages from our friends causes us to run more.
When you dig into the data and start running experiments, you quickly learn that the effects of online ads are not what you might expect. In one Yahoo! study, for example, researchers found that online display ads did indeed profitably increase purchases by 5%. But almost none of that increase came from loyal, repeat customers: 78% came from people who had never clicked on an ad before and 93% of the actual sales occurred later, in the retailer’s brick-and-mortar stores, rather than through direct responses online.
In other words, the standard model of online ad causality — that viewing translates into click, which then leads to purchase — does not accurately describe how ads affect what consumers do.
The Benefits of Causal Marketing
Findings like these may explain why Procter & Gamble and Unilever, the granddaddies of brand marketing, were able to improve their digital marketing performance even as they slashed their digital advertising budgets. In 2017, Marc Pritchard, P&G’s Chief Brand Officer, cut the company’s digital advertising budget by $200 million, or 6%. In 2018, Unilever went even further, cutting its digital advertising by nearly 30%. The result? A 7.5% increase in organic sales growth for P&G in 2019 and a 3.8% gain for Unilever.
The improvements were made possible because both companies also shifted their media spend from a previous narrow focus on frequency — measured in clicks or views — to one focused on reach, the number of consumers they touched. Data had shown that they were previously hitting some of their customers with social media ads ten to twenty times a month. This level of bombardment resulted in diminishing returns, and probably even annoyed some loyal customers. So they reduced their frequency by 10% and shifted those ad dollars to reach new and infrequent customers who were not seeing ads.
They also looked very closely at first-time buyers to understand purchase motivations, enabling them to identify, quite precisely, promising groups of under-touched customers. For example, they described in their 2019 fourth quarter earnings call that they were moving from “generic demographic targets like ‘women 18–49’” to “smart audiences” like first-time moms and first-time washing machine owners.
The tidal wave of granular, individual-level personal data created by online advertising has given us the means to answer the question John Wanamaker posed. It can potentially allow marketers to measure media effects precisely and to know which messages work and which don’t. Just be sure you’re distinguishing correlation from causation, as P&G and Unilever did, and not targeting people who are already your most loyal customers.
Sinan Aral is the Director of the MIT Initiative on the Digital Economy and author of The Hype Machine: How Social Media Disrupts our Elections, our Economy and our Health — and How We Must Adapt.
Originally published at https://hbr.org on February 19, 2021.