Why Most Creative Testing is Bogus and What You Can Do to Fix It

Brian Brydon
Published in Comms Planning
Sep 15, 2016

Take a minute to think of all the time and energy you’ve spent on creative testing: sifting through hundreds of slides of brand effect studies, debating the merits of ad recall, writing POVs on the findings that your campaign drove awareness but not relevance… Now consider this gem of a finding from Binet and Field’s 2008 IPA paper, Marketing in the Era of Accountability:

Intermediate effects of advertising such as brand awareness, image, and other measures of consumer brand health, do not correlate reliably with business performance.

Do you feel a little sad right now?

Now consider that this same paper found the most popular primary measures of communication effectiveness used by marketers, awareness (61%) and image (55%), have the lowest correlation with business effects. As my fellow planner Yin pointed out in a recent blog post, emotional measures like BrainJuicer’s emotion-into-action score show stronger correlation with large business effects, but marketers have been slow to adopt them as primary measures of communication effectiveness.

This isn’t new news. And you don’t have to read Binet and Field to see that creative testing has its problems. Run two identical creative effectiveness studies simultaneously and compare the results to see just how inconsistent they can be.
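
One plausible source of that inconsistency is plain sampling noise. As a rough sketch, not a model of any vendor's methodology, the Python below simulates many identical brand lift surveys of the same campaign. The 40% baseline awareness, the 4-point true lift, and the 300 respondents per cell are all assumed numbers chosen for illustration, not figures from Binet and Field.

```python
import random
import statistics


def run_lift_study(control_rate, exposed_rate, sample_size, rng):
    """Simulate one survey; return measured awareness lift in percentage points."""
    control_aware = sum(rng.random() < control_rate for _ in range(sample_size))
    exposed_aware = sum(rng.random() < exposed_rate for _ in range(sample_size))
    return 100.0 * (exposed_aware - control_aware) / sample_size


def simulate_identical_studies(n_studies=1000, sample_size=300, seed=0):
    """Repeat the same study design many times against the same true effect."""
    rng = random.Random(seed)
    # Assumed for illustration: 40% baseline awareness, 44% among the exposed.
    return [run_lift_study(0.40, 0.44, sample_size, rng) for _ in range(n_studies)]


if __name__ == "__main__":
    lifts = simulate_identical_studies()
    share_flat_or_negative = sum(lift <= 0 for lift in lifts) / len(lifts)
    print("true lift: 4.0 points")
    print(f"mean measured lift: {statistics.mean(lifts):.1f} points")
    print(f"spread across studies (std dev): {statistics.stdev(lifts):.1f} points")
    print(f"studies showing no lift or a decline: {share_flat_or_negative:.0%}")
```

With these assumed inputs, the standard error of the measured lift is roughly four points, about the size of the lift itself, so two identical studies can easily disagree on whether the campaign worked at all.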

What is news is the fact that we continue to be misled by these distracting “metrics of success” 9 years after Marketing in the Era of Accountability was published. In doing so, we end up ignoring the only critical measure of advertising’s effect, business growth: fewer than 20% of marketers measure the profitability of their communications, and 24% use intermediate metrics (i.e., communications objectives) as their primary measures of business effectiveness.

HOW ADS WORK VS. IF ADS WORK

When a question is really hard to answer, sometimes we answer a different question instead because it’s easier.

It turns out that it’s pretty damn hard to show if an ad drove business growth. As Binet & Field illustrated, the real business effects of advertising can last several years after the campaign ends and can manifest across several different dimensions of business growth including market share, price elasticity, and penetration. Few planners have access to that level of business information on a regular basis.

It’s a lot easier to show how your advertising works, like how many people remember seeing your ad. Negotiate a brand lift study with your media buy, check the awareness box in the KPI section and voilà, you just covered your ass. You’re measuring success.

Sort of…

Binet and Field refer to these creative effectiveness metrics as intermediate measures of effectiveness; they let us know if our communication had the intermediate effects we were aiming for, but they can’t reliably tell us if it achieved the ultimate goal of advertising — business growth. And they probably never will, as Alan Hedges suggested in his 1974 paper Testing to Destruction, because advertising works in many different, complex ways.

Any campaign worth measuring should be guided first and foremost by these hard business objectives. Research shows that when we do this and set clear business objectives, our campaigns are significantly more effective.

What’s more, if we use multiple objectives and prioritize their importance, we see even bigger business results.

INTERMEDIATE METRICS: ALWAYS THE BRIDESMAID, NEVER THE BRIDE

Let’s bring this back to the problem at hand: those mysterious intermediate measures and what to do about them.

A key word in the quote at the top of this article about intermediate measures is “reliably.” Binet and Field did find a relationship between intermediate metrics like awareness and business output like market share, but the correlation was so low that they and their peers couldn’t confidently recommend intermediate effects as a primary measure of business growth.
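
To make the gap between a correlation that exists and one you can rely on concrete, here is a minimal simulation. The correlation of 0.3 between awareness shift and market share shift is an assumed value for illustration only (the source quotes no figure), and all of the data is synthetic.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def synthetic_campaigns(n=4000, assumed_r=0.3, seed=0):
    """Generate standardized (awareness shift, share shift) pairs with a weak link."""
    rng = random.Random(seed)
    awareness = [rng.gauss(0, 1) for _ in range(n)]
    noise_weight = math.sqrt(1 - assumed_r ** 2)
    share = [assumed_r * a + noise_weight * rng.gauss(0, 1) for a in awareness]
    return awareness, share


def awareness_picks_winner(awareness, share):
    """Share of campaign pairs where the higher-awareness one also grew share more."""
    pairs = range(0, len(awareness) - 1, 2)
    hits = sum(
        (awareness[i] > awareness[i + 1]) == (share[i] > share[i + 1]) for i in pairs
    )
    return hits / len(pairs)


if __name__ == "__main__":
    awareness, share = synthetic_campaigns()
    print(f"measured correlation: {pearson(awareness, share):.2f}")
    print(f"awareness picks the better campaign: {awareness_picks_winner(awareness, share):.0%}")
```

Under this assumption, awareness explains about 9% of the variation in share (0.3 squared), and choosing between two campaigns on awareness alone picks the better business performer only around 60% of the time, not much better than a coin flip.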

As secondary and diagnostic measures, they can provide critically helpful proof points around how your advertising worked, allowing you to make informed messaging, creative, and media optimizations. And taken with a grain of salt, they can give you signals around the potential business value of your campaign.

To understand which of these measures you should use to evaluate how your campaign is working, you first need to know what you want your ads to do: what attitudinal or behavioral change you are aiming to create. This can take quite a bit of time and debate to land on, and it depends almost wholly on the unique circumstances of your brand and campaign.

To make things easy on us, smart people like Mike Hall of Hall & Partners have taken the effort to categorize the different models of communication objectives used by advertisers with the goal of codifying and comparing the different ways in which advertising works. Binet & Field offer similar models.

Through the IPA’s databank, we can see that not all of these communication objective models have the same level of business productivity. Campaigns designed to drive fame deliver a much higher level of business results than campaigns that use more classic models, such as awareness and image. The data also shows that it’s a lot easier to deliver against some of these classic measures than against the lesser-used measures, often because of a fundamental error in the way data for those measures is interpreted. For example, users of a brand are much more likely to be aware of it than non-users, making it easy to find and report on the awareness for bigger brands. That’s why marketers lean on awareness so often; it’s pretty easy to show.
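
The arithmetic behind that interpretation error can be sketched in a few lines. The awareness rates below (95% among users, 30% among non-users) and the penetration levels are hypothetical; the point is only that total awareness climbs with brand size even when advertising changes nothing.

```python
def total_awareness(penetration, aware_among_users, aware_among_nonusers):
    """Blend user and non-user awareness by the brand's share of buyers."""
    return penetration * aware_among_users + (1 - penetration) * aware_among_nonusers


# Hypothetical rates, identical for both brands: only penetration differs.
AWARE_USERS = 0.95
AWARE_NONUSERS = 0.30

small_brand = total_awareness(0.10, AWARE_USERS, AWARE_NONUSERS)
big_brand = total_awareness(0.40, AWARE_USERS, AWARE_NONUSERS)

print(f"small brand (10% penetration): {small_brand:.1%} aware")
print(f"big brand (40% penetration): {big_brand:.1%} aware")
```

In this made-up case the bigger brand reports 56% awareness against 36.5% for the smaller one, a gap of nearly 20 points that comes entirely from having more users in the sample.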

EVOLVING TOWARDS THE DUAL PROCESS SCHOOL OF THOUGHT

When you think about it, these 8 communication models have built-in, fundamental assumptions about how advertising works. For example, when you prioritize differentiation as a key objective to protect market share, you believe that your market share is threatened mainly because people aren’t aware of your brand’s unique benefit. You believe that education is a key to your communications strategy, and therefore, business growth.

If you take a broader look at underlying assumptions of these 8 measures and how they relate and differ, two bigger theories of how advertising works start to emerge.

One, which we’ll call AIDA (awareness, interest, desire, action), is based on a traditional persuasion model where people need to pay attention to an ad, be persuaded by its message, change their beliefs about the brand, and then purchase the product. Measures such as awareness, image, and differentiation are key tenets of this school of thought.

The second, which we’ll call Dual Process (System 1 / System 2), is based on a more recent psychological theory outlined in Daniel Kahneman’s Thinking, Fast and Slow, which holds that a majority of our decisions are made by an almost automatic and irrational System 1 part of our brain. Fame (or salience) and emotion are the key measures of this school of thought.

For the AIDA school of thought, the objective of increasing differentiation to protect market share makes perfect sense because persuasion is a critical step to purchase. But for the Dual Process school of thought, differentiation doesn’t make as much sense because it recognizes that consumer behavior will be influenced mostly by the availability of that brand in people’s memory, not its rational benefits.

When choosing your campaign objectives, it’s critical to first ask which of these schools of thought you belong to.

Although they don’t come out and say it, Binet and Field give substantial proof that a Dual Process-based philosophy tends to drive larger business results than the AIDA-based philosophy. In his landmark book How Brands Grow, Byron Sharp also provided a robust argument for moving towards a more salience-based model of communication strategy and away from an awareness / persuasion based model.

Excerpt from How Brands Grow, Byron Sharp

AN INDUSTRY IN TRANSITION

This new wave of thinking about how advertising works has triggered significant change in two areas of the creative testing industry which should be known by anyone planning creative effectiveness tests.

  1. More emphasis on emotional and salience-based measures.
    Up until about 20 years ago, almost all of our effectiveness surveys were built on traditional measures of success that reflected the AIDA school of thought: ad recall, awareness, purchase intent, etc. Papers published by Binet, Field, Kahneman, and Sharp left a void in the measurement industry for capabilities that reflected this new school of thought, but the industry soon started filling the void. Measurement company BrainJuicer introduced a new “emotion-into-action” score in the early 2010s, which largely ignored traditional measures in favor of emotion and salience. And existing companies like Nielsen and Millward Brown made their own claims on Dual Process, with new measures around emotion, salience and fame. Almost all major research vendors still offer to measure AIDA-based metrics because it’s easy for them to do so, but they’re slowly moving to this newer model of measurement.
  2. More emphasis on measuring non-cognitive effects.
    The breakthrough thesis in Daniel Kahneman’s Thinking, Fast and Slow is that a majority of our decision-making processes are borderline subconscious and controlled by our emotion-driven System 1 brain. If you believe this theory, then you should have serious doubts about the validity of any type of measurement that requires a person to think about how advertising has affected them. Almost all major research vendors have responded to this landmark theory by introducing new methodologies that bypass problems linked to self-reporting, like cognitive bias, where a subject gives the response they think the researcher wants to hear instead of the response that’s actually true. But these new methods come with their own set of advantages and disadvantages that must be aired.

TODAY’S CREATIVE TESTING TOOLBOX

There are thousands of ways creative testing can be categorized: pre vs. post testing, survey vs. laboratory, concept vs. copy, digital vs. television, etc. But the most important characteristic to understand about any methodology is which school of thought it reflects: Dual Process or AIDA.

When we look at the universe of measurement tools through this lens, we can see three main types of creative testing, each relying on a different mixture of conscious and subconscious measures.

Self-reported methods ask the subject to think about the impact an ad had on them and respond with written or verbal answers. This is the most typical form of creative testing, and it was pioneered in the 1950s by the famed pollster George Gallup, when attention was considered the ultimate goal of communication. Self-reported studies can further be broken into three types:

  • Verbal self-reports: questionnaires and surveys that respondents complete with words and numerical ratings.
  • Visual self-reports: questionnaires and surveys that ask respondents to describe their response by picking an image that represents some kind of mental affect (i.e., type of emotion).
  • Dial testing: subjects use a dial to rate their reaction to an ad on a positive-negative scale.

The key advantages of self-reported measures are that they are quick, easy, and cheap to administer, and that they are widely accepted, allowing brands to easily compare results between campaigns. The big disadvantage is that they don’t account for non-cognitive System 1 effects and they introduce the element of cognitive bias, which can undermine the validity of any finding.

Autonomic methods involve measuring the semi-conscious, physical effects of advertising such as heart rate and pupil size. These methods are relatively new to the industry and require more advanced technology and expertise to administer than most self-reported methods. There are four main types of autonomic measurement available today:

  • Face tracking: a subject’s emotional response to an ad is inferred from a scanner that records facial expressions.
  • Heart rate: a subject’s heart rate is monitored to determine the level of arousal generated by an ad.
  • Eye tracking: a subject’s eye movements and pupil diameter are tracked to determine a level of emotional valence.
  • Skin conductance: the level of skin conductance, which is influenced by the amount we sweat, is measured to determine the level of arousal generated by an ad.

Autonomic measures tend to be more expensive and invasive than self-reported measures, but they allow for greater understanding of the System 1 effects of advertising because they remove the element of cognitive bias.

Neurological methods involve measuring brain activity generated by an ad. These methods tend to be very expensive and complex because they require advanced technology to execute, like MRI or EEG scanners. They’re also very difficult to interpret, because our understanding of how the brain works is still very limited. There are two main types of neurological measurement available to us today:

  • Electroencephalography (EEG): surface-level brainwaves are measured by electrodes placed on the subject’s head to determine the type and location of certain brainwaves.
  • Brain Imaging (fMRI, PET, MEG): A full map of brain activity is created using advanced magnetic or radioactive scanners.

While they might be exciting and futuristic, neurological measures are the most experimental and least reliable of the primary testing methods. Contemporary neurological technology only allows us to see a sliver of brain activity, about which we still know very little, making it very difficult to infer communication effects. They also tend to be extremely invasive (i.e., putting brain scanners on people’s heads), which is never good when you’re trying to recreate a real-world testing environment.

PICKING THE LESSER OF MANY EVILS

Alan Hedges’ 40-year-old proclamation about creative testing still holds true today:

We do not know in any specific sense how advertising works [and] we are not likely to ever come up with a simple yet useful formula [because] different campaigns work in different ways.

However, we do know which measures and methodologies best reflect our beliefs around how advertising works, and using those, we can create measurement plans that actually reflect the purpose of our campaigns and give us feedback around how they are fulfilling our hypotheses.

Assuming you subscribe to the Dual Process school of thought, a flexible model combining self-reported and autonomic measures is starting to emerge as a viable starting point for most marketers today because of its ability to measure System 1 and 2 effects. A variation of this model was outlined in Millward Brown’s recent announcement of a new Dual Process-based version of their Link report.

Millward Brown isn’t the only vendor to offer this combination of services. Nielsen, BrainJuicer, and Hall & Partners offer similar methods of measurement that can be combined to account for both Systems.

Ultimately, the specific KPIs you choose should be dependent on the specifics of your campaign and what you believe it should do. But this model provides a well-supported framework to start from.

References:

Binet and Field (2008) Marketing in the Era of Accountability: Identifying the marketing practices and metrics that truly predict profitability

Hall (1992) Using Advertising Frameworks: Different research models for different campaigns

Hedges (1974) Testing to Destruction: A critical look at the uses of research in advertising

Micu and Plummer (2010) Measurable Emotions: How Television Ads Really Work

Sharp (2010) How Brands Grow: What marketers don’t know
