Accelerating Geo-testing at ASOS

Published in ASOS Tech Blog

5 min read · Dec 6, 2021

An important task in marketing science is to determine the incrementality of ad spend: of the orders placed by customers who were exposed to ads, how many were caused by the ads and how many would have occurred regardless?

We’re interested in measuring the incremental conversions: the additional orders caused by the ads.

The simplest way to measure this is to run an A/B test: randomly assign users either to see ads or not and measure the difference between the two groups over the course of the experiment.

But there are a few reasons why using A/B tests to measure incrementality might not be optimal:

Apple’s recent iOS 14 update — requiring users to opt in to tracking rather than opt out — has called into question the ability of ad publishers to segregate control groups when running lift tests. This can lead to confounded experiments and untrustworthy incrementality numbers.
It’s not possible to test offline advertising such as billboards or TV with an A/B test as there’s no way of splitting users between test and control groups (unless you try blindfolding every second person that walks past your billboard).
More complex experiment designs involving multiple marketing channels would require cross channel coordination to ensure a common control group is maintained. This can be difficult in practice.

For these reasons, an alternative to running a randomised controlled trial like an A/B test is to use a geo-test.

In a geo-test, a market is divided into smaller geographical regions called geos with each geo assigned to either the treatment or control group to maximise the comparability of the groups. Users in the treatment geos are exposed to ads while users in the control geos are not.

Geographical regions form the experimental units of a geo-test.

Geographic regions are often heterogeneous, so they can’t be compared directly. Instead, a statistical model is used to create a synthetic control for the treatment group using the control group as the primary covariate (see [1], [2], [3], [4] for further background).

The model is trained on a pre-test training period, before the treatment is introduced, when the two groups are in their baseline states. The trained model is then used to predict how the treatment group would have behaved in the test period in the absence of a treatment. This prediction is called the counterfactual. The incremental effect of the treatment is then the difference between the observed performance in the treatment group and the counterfactual.

The model learns the baseline behaviour of the treatment group relative to the control in the training period and is used to predict the counterfactual during the test period.
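To make the counterfactual idea concrete, here is a minimal sketch in Python. It assumes a plain least-squares regression of the treatment group's daily orders on the control group's daily orders, which is a stand-in for whichever model is used in practice (see the references above). The data is synthetic, and names such as fit_baseline and true_lift_per_day are hypothetical.

```python
import numpy as np

def fit_baseline(control, treatment):
    """Least-squares fit of treatment ~ intercept + slope * control on training data."""
    control = np.asarray(control, dtype=float)
    X = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(X, np.asarray(treatment, dtype=float), rcond=None)
    return coef  # (intercept, slope)

def predict_counterfactual(coef, control):
    """Predicted treatment-group behaviour had no treatment been applied."""
    return coef[0] + coef[1] * np.asarray(control, dtype=float)

# Synthetic daily orders (made-up numbers, purely for illustration).
rng = np.random.default_rng(0)
n_train, n_test = 60, 14
control = 1000 + 50 * rng.standard_normal(n_train + n_test)
treatment = 200 + 0.5 * control + 2 * rng.standard_normal(n_train + n_test)

# Assumed true effect: the ads add 30 orders per day during the test period.
true_lift_per_day = 30.0
treatment[n_train:] += true_lift_per_day

# Train on the pre-test period only, then predict the test period.
coef = fit_baseline(control[:n_train], treatment[:n_train])
counterfactual = predict_counterfactual(coef, control[n_train:])

# Incremental effect = observed minus counterfactual, summed over the test period.
incremental = treatment[n_train:] - counterfactual
estimated_total_lift = incremental.sum()
print(f"Estimated incremental orders: {estimated_total_lift:.1f}")
```

In this toy setup the estimate should land close to the 420 incremental orders (30 per day over 14 days) that were injected into the synthetic treatment series.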

Challenges in geo-testing

Geo-tests can solve many of the issues associated with A/B testing for ad incrementality but they come with their own challenges:

They’re time consuming. As well as a test period where the treatment is applied, a pre-test training period is required to fit the model.
Regions are “lumpy”. It’s not straightforward to find control regions which act as good approximators of treatment regions. Poorly matched regions may need longer experiments to achieve sufficient test power.
Regional boundaries are porous. Users can move between control and test regions during the experiment, e.g. when commuting to and from work. This can lead to users from the control regions being exposed to ads.

In this post we focus on two steps we’ve taken at ASOS to reduce the time required to run geo-tests.

1. Reusing training data

One way to reduce the time needed between geo-tests is to use data from before and after the experiment to fit the model. Using data from after the experiment gives us a second read on the baseline behaviour of the two groups and changes our predictions from extrapolations to interpolations.

After removing the treatment, there is another opportunity for the model to learn the baseline behaviour of the treatment group relative to the control.

Designing experiments in this way means that each baseline period is used twice — once as pre-experiment data and once as post-experiment data — which halves the time needed between successive experiments.

It also provides better predictions. We explored the effect of incorporating post-experiment training data by performing A/A tests over a large number of markets and time periods and observed a 20% reduction in Mean Absolute Percentage Error (MAPE) even when controlling for the total number of points in the training data.
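The A/A comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis behind the 20% figure: the data is synthetic, the model is a simple least-squares regression, and a slow drift in the treatment/control relationship is deliberately built in so that the difference between extrapolating and interpolating is visible. Both designs train on the same number of points.

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, as a fraction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)))

def aa_test_mape(control, treatment, train_idx, test_idx):
    """Fit treatment ~ control on train_idx and score on test_idx.

    No treatment is applied anywhere (an A/A test), so any error is model error.
    """
    X = np.column_stack([np.ones(len(train_idx)), control[train_idx]])
    coef, *_ = np.linalg.lstsq(X, treatment[train_idx], rcond=None)
    predicted = coef[0] + coef[1] * control[test_idx]
    return mape(treatment[test_idx], predicted)

# Synthetic series with an assumed slow drift between the two groups.
rng = np.random.default_rng(1)
days = np.arange(120)
control = 1000 + 50 * rng.standard_normal(days.size)
drift = 1 + 0.001 * days  # hypothetical: relationship shifts by 0.1% per day
treatment = (200 + 0.5 * control) * drift + 2 * rng.standard_normal(days.size)

test_idx = np.arange(60, 74)  # the held-out "experiment" window

# Design A: 40 training points, all before the window (extrapolation).
pre_only = np.arange(20, 60)
# Design B: 20 points before and 20 after the window (interpolation).
pre_and_post = np.concatenate([np.arange(40, 60), np.arange(74, 94)])

mape_pre_only = aa_test_mape(control, treatment, pre_only, test_idx)
mape_pre_and_post = aa_test_mape(control, treatment, pre_and_post, test_idx)
print(f"pre only: {mape_pre_only:.2%}, pre and post: {mape_pre_and_post:.2%}")
```

Because the drift is an assumption of the simulation, the size of the gap here says nothing about real markets. The sketch only shows the mechanism: training data on both sides of the test window brackets the period being predicted.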

2. Multi-cell tests unlock additional testing capacity

Another way to reduce the time required to run geo-experiments is to run multi-cell tests — applying two or more treatments to different geographic regions which share a common control group.

There are several ways in which multi-cell tests can add value:

Testing independent hypotheses concurrently. Using multi-cell experiments to test different hypotheses in parallel means we can iterate more quickly than testing them sequentially.
Determining the optimal spend level. We can optimise our media spend by comparing several different levels and observing which delivers the best return on investment.
Interaction effects between channels. We may be interested in assessing the cannibalisation of sales as a result of running both Google and Facebook ads. E.g. we could run a multi-cell experiment with four groups to test this: no ads, just Google ads, just Facebook ads, and both Google and Facebook ads.
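For the four-group design in the last example, one common way to quantify the interaction is to compare the lift of the combined cell with the sum of the single-channel lifts, each measured against the shared control. The sketch below assumes the three lifts have already been estimated. The function name and the numbers are hypothetical.

```python
def interaction_effect(lift_google, lift_facebook, lift_both):
    """Incremental orders in the combined cell beyond the sum of the single-channel lifts.

    Each lift is measured against the shared control group.
    A negative value suggests cannibalisation between the channels.
    A positive value suggests the channels reinforce each other.
    """
    return lift_both - (lift_google + lift_facebook)

# Made-up lifts: Google alone adds 120 orders, Facebook alone adds 100,
# and running both together adds 180.
effect = interaction_effect(lift_google=120, lift_facebook=100, lift_both=180)
print(effect)  # -40: the combined cell delivers 40 fewer orders than the sum of the parts
```

In practice each lift carries its own uncertainty, so the interaction estimate needs an interval around it before it is read as cannibalisation.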

We enabled this capability at ASOS by modifying our geo-assignment process to allow us to optimise multiple treatment groups simultaneously.

For single-cell experiments, we implemented the hill-climbing algorithm in [5], which greedily searches for the optimal assignment of geos by sequentially adding regions to the treatment group and then optimising the control group for each proposed treatment group. We extended this to the multi-cell case by adjusting the geo addition step.

Rather than adding a geo to a single treatment group, we loop over the treatment groups and add the most appropriate geo to each. We then construct our control group such that it is jointly correlated with each of the treatment groups. This process is repeated until no more geos are available.
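The loop described above can be illustrated with a heavily simplified sketch. It is neither the algorithm in [5] nor the ASOS implementation, and the scoring rules are assumptions made for illustration: a geo is "most appropriate" for a cell if it maximises the correlation between that cell's aggregate series and the aggregate of the still-unassigned geos, and "jointly correlated" is read as maximising the worst-case correlation between the control aggregate and each treatment cell. The control group is also grown one geo per round rather than re-optimised.

```python
import numpy as np

def corr(a, b):
    """Pearson correlation between two time series."""
    return float(np.corrcoef(a, b)[0, 1])

def assign_multi_cell(series, n_cells):
    """Greedy round-robin assignment of geos to treatment cells and a shared control.

    series: array of shape (n_geos, n_periods) with baseline data per geo.
    Returns (cells, control) as lists of geo indices.
    """
    series = np.asarray(series, dtype=float)
    available = set(range(series.shape[0]))
    cells = [[] for _ in range(n_cells)]
    control = []

    def total(geos):
        # Aggregate time series of a group of geos.
        return series[list(geos)].sum(axis=0)

    # Each round needs one geo per treatment cell plus one for the control.
    while len(available) >= n_cells + 1:
        # Geo addition step: every treatment cell takes one geo in turn.
        for cell in cells:
            best = max(
                available,
                key=lambda g: corr(total(cell + [g]), total(available - {g})),
            )
            cell.append(best)
            available.remove(best)
        # Control step: add the geo that keeps the control aggregate
        # correlated with every treatment cell (worst case over cells).
        best = max(
            available,
            key=lambda g: min(corr(total(control + [g]), total(c)) for c in cells),
        )
        control.append(best)
        available.remove(best)
    return cells, control

# Synthetic example: 9 geos sharing a market-wide signal at different scales.
rng = np.random.default_rng(2)
market = rng.standard_normal(90).cumsum()
scales = rng.uniform(0.5, 2.0, size=9)
series = 100 + np.outer(scales, market) + rng.standard_normal((9, 90))

cells, control = assign_multi_cell(series, n_cells=2)
print("treatment cells:", cells, "control:", control)
```

In this sketch any geos left over when a full round can no longer be completed stay unassigned. A fuller implementation would also score each candidate design, for example by expected test power, rather than relying on correlation alone.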

Running multi-cell experiments in this way has allowed us to significantly expand our testing capacity at ASOS.

Increasing importance of geo-tests

As the industry shifts to a more privacy-centric model for online advertising (e.g. with Google’s decision to deprecate third-party cookies in Chrome in 2023), privacy-preserving experiments like geo-tests are taking on growing importance in the measurement of marketing incrementality. While they can be complex to design and time-consuming to run, incorporating post-trial training data and running multi-cell tests are two ways to reduce these costs.

Conor McCabe is a Machine Learning Scientist at ASOS. In his spare time he likes running and listening to history podcasts.

Written by Conor Mc Cabe