The good and the bad of A/B testing for creative optimization

Brad Deutsch · Published in Known.is · Aug 8, 2022 · 5 min read

We know that testing is critical for distinguishing between “good” and “great” creative. Traditional A/B testing has been used for decades to do just that — but it’s far too slow to be used as the primary testing program for any campaign. For that we need to turn to game theory combined with a smart approach to client goals.

Marketers know that testing creative is critical to campaign success. We’ve written previously about the strengths and weaknesses of survey-based testing, and how any robust testing program should also include in-flight testing.

A popular tool for live creative testing is the so-called A/B test, in which two options are fielded for a fixed time period, after which the better-performing option is kept and the worse one discarded. This methodology is an old standard at tech companies as well as in the larger scientific community. A/B tests give straightforward results and a measure of statistical certainty. They also provide other benefits to a marketing agency:

  1. They allow us to test specific hypotheses. Our clients often have more complex campaign goals than simply “maximize conversions.” They want to learn specific things about their customers, and then apply that knowledge in future campaigns or even outside the marketing department, for example to design new product features. An A/B test allows for a formal hypothesis test with a clear result.
  2. They give us control over the reporting & insight cycle. In A/B testing you decide ahead of time what level of sensitivity and statistical certainty you need, and then plan the duration of the experiment accordingly. That can be useful in roadmapping for complicated, multi-channel campaigns. Just be aware that it’s possible for any A/B test to yield insignificant results!
  3. They provide a bridge to more complex experimental design. For example, factorial designs allow for efficient (read: minimum-spend) extraction of high-level insights. Such designs are built on the backbone of simple hypothesis testing.

While A/B tests provide clear benefits, in practice we see that they tend to leave performance on the table, for two main reasons.

First, we can’t test fast enough. Formally, the only result we get from an A/B test is whether option A or option B did significantly better than the other, which implies we’d need an impractically long tournament of tests to find the best of, say, 100 options. Even if we design our tests to produce a quantitative read on performance and test the 100 options in pairs, we would still need 50 tests to find the best one. This isn’t a total deal-breaker, since we can run tests in parallel, but it’s not the most efficient solution.

Second, we need to act as soon as we know anything. When we design an A/B test, we define the experimental parameters ahead of time and commit to a test duration that is expected to reach statistical significance. If one creative option proves significantly better than the other, we can respond by putting more budget behind it, but only once the test window has closed. The idea of iterative learning through experimentation is a perfect fit for many scientific communities, where small mistakes are costly and knowledge is generated over long time scales, but we ideally want to make decisions much faster than that.
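
To make “plan the duration ahead of time” concrete, here is a minimal Python sketch of the standard two-proportion sample-size calculation. This is an illustration, not Known’s tooling; the 2% baseline rate, 2.5% target rate, and the default significance and power levels are made-up assumptions.

```python
from statistics import NormalDist

def required_sample_per_arm(p1: float, p2: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate impressions needed per arm to tell p1 from p2 with a
    two-sided test at significance level alpha and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2) + 1

# Hypothetical inputs: 2% baseline conversion rate, hoping to detect a lift to 2.5%.
n = required_sample_per_arm(0.02, 0.025)
print(f"~{n:,} impressions per creative")   # roughly 13,800 per arm with these inputs
```

Divide that per-arm number by your expected daily impressions per creative and you have the test duration you are committing to before the campaign even starts.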

The key is recognizing that even before a test is done we have some information about which options are performing better, and we need to shift our budgets accordingly. A/B testing protocol doesn’t let us do that, so we need another tool.

Figure 1 illustrates how continuous campaign optimization can outperform traditional A/B testing.

Fig. 1: Simulation comparing continuous optimization and traditional A/B testing. Two creatives are compared with 10% and 30% unknown conversion rates, respectively. The blue line shows the result of a campaign where the first half is devoted to an A/B test with 50% allocation to each creative. After the test the allocation is split between the creatives proportional to measured performance. In the continuous optimization case, a “best guess” at performance is generated at every step, and the budget is reallocated according to the estimated performance ratio. In the end, continuous optimization outperforms the A/B test strategy by about 11%.

Multi-armed bandits: A partial solution

Fortunately, game theory gives us a partial solution to this problem. Multi-armed-bandit (MAB) theory provides a strategy for testing as many options as needed at the same time, and adjusting how much we invest in each one based on performance. Even if we limit ourselves to testing two creatives, MAB can generate significantly better results, since we’re continuously investing budget according to our best understanding of relative performance.
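
As a concrete illustration, below is a minimal Thompson-sampling sketch, one common MAB strategy; the post doesn’t commit to a particular algorithm, so treat this as a generic example rather than Known’s method. The creative names and the impression/conversion counts are hypothetical.

```python
import random

# Observed results so far (hypothetical numbers): impressions and conversions.
results = {
    "creative_A": {"impressions": 1200, "conversions": 30},
    "creative_B": {"impressions": 1100, "conversions": 52},
    "creative_C": {"impressions": 900, "conversions": 18},
}

def thompson_allocation(results: dict, draws: int = 10_000) -> dict:
    """Estimate what share of budget each creative should get next.

    Each Monte Carlo round samples a plausible conversion rate for every
    creative from its Beta posterior and gives that round's "vote" to the
    highest sample; the vote share becomes the recommended budget share.
    """
    wins = {name: 0 for name in results}
    for _ in range(draws):
        sampled = {
            name: random.betavariate(1 + r["conversions"],
                                     1 + r["impressions"] - r["conversions"])
            for name, r in results.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: w / draws for name, w in wins.items()}

print(thompson_allocation(results))
# e.g. {'creative_A': 0.06, 'creative_B': 0.93, 'creative_C': 0.01}: B gets most
# of the next day's spend, while A and C keep enough traffic to stay measurable.
```

The Beta(1 + conversions, 1 + non-conversions) posterior keeps a little budget on the weaker creatives, so their estimates can keep improving while most of the spend follows the current leader.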

How much of a difference does it make?

Fig. 1 shows the results of a simulation in which two creatives with unknown “true” conversion performance are tested head-to-head with both A/B and MAB testing methodologies. In this example, assuming 10% and 30% true conversion rates and half of the campaign devoted to the testing period, the end result is about 11% better for continuous optimization. The difference tends to grow with longer testing periods and larger performance gaps.
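
For readers who want to poke at the comparison themselves, here is a simplified simulation in the spirit of Fig. 1 rather than the code behind it: two creatives with 10% and 30% true conversion rates, a fixed daily impression volume, and either a 50/50 A/B phase for the first half of the campaign followed by proportional allocation, or proportional reallocation from day one. The 30-day length and 1,000 daily impressions are assumptions, and the exact uplift will vary from run to run.

```python
import random

random.seed(7)
TRUE_RATES = {"A": 0.10, "B": 0.30}    # unknown to the "marketer" in the simulation
DAYS, DAILY_IMPRESSIONS = 30, 1000

def serve(creative: str, impressions: int) -> int:
    """Simulate one day's conversions for one creative."""
    return sum(random.random() < TRUE_RATES[creative] for _ in range(impressions))

def proportional_split(stats) -> dict:
    """Best-guess allocation: budget proportional to observed conversion rate."""
    rates = {c: (s["conv"] + 1) / (s["imp"] + 2) for c, s in stats.items()}  # smoothed
    total = sum(rates.values())
    return {c: r / total for c, r in rates.items()}

def run_campaign(test_days: int) -> int:
    """Spend test_days on a 50/50 A/B test, then allocate proportionally.
    test_days=0 approximates continuous optimization (reallocate every day)."""
    stats = {c: {"imp": 0, "conv": 0} for c in TRUE_RATES}
    total_conversions = 0
    for day in range(DAYS):
        split = {c: 0.5 for c in TRUE_RATES} if day < test_days else proportional_split(stats)
        for c, share in split.items():
            imp = int(DAILY_IMPRESSIONS * share)
            conv = serve(c, imp)
            stats[c]["imp"] += imp
            stats[c]["conv"] += conv
            total_conversions += conv
    return total_conversions

ab_then_allocate = run_campaign(test_days=DAYS // 2)   # classic A/B test in the first half
continuous = run_campaign(test_days=0)                  # reallocate from day one
print(ab_then_allocate, continuous, f"uplift: {continuous / ab_then_allocate - 1:.1%}")
```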

Limitations of MAB

Even multi-armed-bandit methodologies can’t handle all of the realities of creative testing at a modern marketing agency. Here are some limitations of MAB:

  1. Performance changes over time. If one of our creative options has a Christmas theme, it won’t do as well after Christmas. We need to be mindful of trends that change over time so that we’re always making sound inferences about future performance from past performance (one common mitigation is sketched after this list).
  2. The number of assets produced is limited. Small changes in font or CTA placement can produce many creative options, but anything more substantial requires resources from our talented creative team. Practically, that means we need to prioritize the best options.
  3. Campaigns have time limits. The more creative you test, the longer it will take to build a solid idea of which options work best. This is especially critical when we want to include multiple rounds of creative production based on live testing results.
  4. Not all creative is created equal. By the time we’re running a live campaign, extensive research and workshopping has gone into the creative. While even the best marketers need testing to find the best creative, we usually have a good idea of what creative “territories” will perform best, and generating creative far outside those territories is unlikely to be successful.
  5. Immediate KPIs don’t fully capture the idea of success. We use game theory to optimize our KPIs over the span of a campaign, but the success of a marketing campaign is multi-faceted. What did we learn about our audience? How did the campaign affect long-term brand metrics? Were we able to deliver insights in time to be useful for related campaigns?
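
On the first limitation, a common mitigation is to down-weight older observations so the bandit’s “best guess” tracks recent performance rather than the whole campaign history. The rough sketch below is illustrative only; the half-life parameter and example history are made up.

```python
def decayed_counts(daily_results, half_life_days: float = 2.0):
    """Exponentially down-weight older days so the conversion-rate estimate
    tracks recent performance (e.g. a Christmas creative fading in January).

    daily_results: list of (impressions, conversions) tuples, oldest first.
    Returns decayed (impressions, conversions) to feed into whatever
    allocation step you already use (Beta posterior, proportional split, ...).
    """
    decay = 0.5 ** (1.0 / half_life_days)   # per-day multiplier
    imp = conv = 0.0
    for day_imp, day_conv in daily_results:
        imp = imp * decay + day_imp
        conv = conv * decay + day_conv
    return imp, conv

# Hypothetical history: strong early performance, weak recent performance.
history = [(1000, 300), (1000, 290), (1000, 120), (1000, 40)]
raw_rate = sum(c for _, c in history) / sum(i for i, _ in history)
imp, conv = decayed_counts(history)
print(f"raw: {raw_rate:.1%}  decayed: {conv / imp:.1%}")  # decayed estimate leans toward the weak recent days
```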

A/B and MAB testing at Known

Advanced experimental design and game theory are a huge improvement over traditional A/B testing for creative market testing. At Known, we find that knowing where and how to apply techniques like these requires the combined powers of our buyer-scientists, channel specialists, media strategists, and an award-winning creative team.

Read more: Getting the most out of big platform optimization

Want to learn more about creative testing at Known? Get in touch!
