Computing incremental sales at Untie Nots

Julien Louis
Untienots
5 min read · Oct 3, 2022

Part 2: Common methods used in the retail industry

Looking for the start? See part 1.

In this second part, we will look at methods commonly used by large retailers to compute the incremental sales of a marketing campaign. In general, these methods are intuitive and fast to implement, but they come at the cost of strong biases or other drawbacks.

Direct computation of incremental sales

The most straightforward way to compute the incremental sales is to compare the players’ sales during the promotional campaign with their baseline sales.

There are two main ways of defining the baseline sales: the sales in the few months just before the campaign (normalised for duration), or the sales over the same period 12 months earlier.
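A minimal Python sketch of this direct computation, assuming we already have the players’ aggregated sales for the campaign window and for the chosen baseline window. The function names and the figures in the usage lines are hypothetical.

```python
def pre_campaign_baseline(pre_sales, pre_days, campaign_days):
    """Baseline from the months just before the campaign,
    normalised to the campaign's duration."""
    return pre_sales * campaign_days / pre_days


def direct_incremental(campaign_sales, baseline_sales):
    """Incremental sales: players' sales during the campaign minus their baseline.

    baseline_sales is either pre_campaign_baseline(...) or the players' sales
    over the same period 12 months earlier.
    """
    return campaign_sales - baseline_sales


# Hypothetical figures: players spent 84,000 in the 84 days before a 14-day
# campaign and 15,500 during it; they spent 14,800 in the same 14 days last year.
print(direct_incremental(15_500, pre_campaign_baseline(84_000, 84, 14)))  # 1500.0
print(direct_incremental(15_500, 14_800))  # 700
```

The same campaign figure gives a different incremental value depending on which baseline is chosen, which is the crux of the drawbacks listed below.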

This approach has a few key advantages: it’s intuitive and easy to implement. Furthermore, there is no need for a control population, so everyone can freely participate in the campaign.

But it also has quite a few drawbacks, as past sales are not always a good predictor of future sales:

  • for the baseline just before the campaign: sales are highly seasonal at both the retailer and the customer level, and this variation is often greater than the impact of a marketing campaign.
  • for the year-before baseline: a customer’s behaviour can change a lot in a year; for example, they might also start shopping at a different store. New customers, who did not shop at the store a year earlier, also introduce a strong bias.
  • external factors can also have a strong impact on sales; for example, the COVID lockdowns completely changed some spending patterns.

We could account for these drawbacks manually, but adjusting the baseline is a tedious task and might not always stand up to rigorous standards.
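As an illustration of what one such manual adjustment can look like, the sketch below (a hypothetical helper, not a method described in this series) rescales this year’s pre-campaign sales by last year’s ratio between the campaign window and the pre-campaign window. That ratio absorbs both the difference in window length and the seasonal swing, under the assumption that the seasonal pattern repeats from one year to the next. It does nothing for changing customer behaviour, new customers or external shocks.

```python
def seasonally_adjusted_baseline(pre_sales, pre_sales_last_year, period_sales_last_year):
    """Project this year's pre-campaign sales onto the campaign window.

    Assumption: the ratio between sales in the campaign window and sales in the
    pre-campaign window is the same this year as it was last year, with last
    year's windows aligned on the same calendar dates.
    """
    # Equivalent to pre_sales * (period_sales_last_year / pre_sales_last_year);
    # multiplying first avoids an intermediate rounding step.
    return pre_sales * period_sales_last_year / pre_sales_last_year
```

Every extra correction of this kind (churn, new customers, lockdowns) needs its own ad hoc rule, which is what makes the manual route tedious.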

Raw incremental sales based on player spending only

A/B testing

The gold standard of scientific experiments is the randomised controlled trial. Before the start of the campaign, we randomly select a small subset of our customers who won’t be allowed to participate, i.e. the control group; the remaining customers form the exposed group. Because the assignment is random, these two groups are comparable, and any change in their behaviour can be explained only by our intervention: participation in the marketing campaign.

We can measure the lift created by the campaign by comparing the average sales per customer of the exposed and control groups.
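A minimal sketch of that comparison, assuming a list of campaign-period sales per customer for each group. The function name and its return values are illustrative; a real analysis would also report a confidence interval.

```python
from statistics import fmean


def ab_lift(exposed_sales, control_sales):
    """Compare mean sales per customer between the exposed and control groups.

    Returns (absolute lift per customer, relative lift, total incremental sales
    attributed to the exposed group). The relative lift assumes the control
    mean is non-zero.
    """
    exposed_mean = fmean(exposed_sales)
    control_mean = fmean(control_sales)
    absolute = exposed_mean - control_mean
    relative = absolute / control_mean
    total_incremental = absolute * len(exposed_sales)
    return absolute, relative, total_incremental
```

Thanks to randomisation, the control mean stands in for what the exposed customers would have spent without the campaign, so no baseline period is needed.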

This method, however, has several drawbacks. First, holding back a control group reduces the number of participants in the campaign, which naturally reduces revenue.

More importantly, it is not always possible to design a protocol for selecting a control group. In our case, we build a control group by not sending the marketing emails to some customers. However, if we compare this control group’s sales with those of the exposed customers, we dilute the effect of our campaign, because only a fraction of the exposed customers actually play. Nor can we create a control group for the players directly, because we cannot refuse a customer who wants to play.

Since we want to measure the effect of participation and not merely exposure, comparing the control group with the players introduces a bias. One way to account for this would be to select only the “potential players” among the control group, though this is easier said than done.

The exposed and control groups are homogeneous
The players are not homogeneous with the control group

Comparing the players directly with the control group or with non-players is wrong, since the populations are not homogeneous. We could still compare the exposed group with the control group and adjust the incremental value according to the percentage of players. But if that percentage is too low, the increase in sales will be overshadowed by the natural variability in sales.
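The adjustment by the percentage of players can be sketched as follows. It is the standard “no-show” rescaling of the exposed-versus-control difference, and it rests on an assumption the text leaves implicit: being exposed without playing has no effect on sales. The helper name is hypothetical, and the participation rate is treated as known rather than estimated. Dividing by the rate multiplies the standard error by the same factor, which is why a low rate lets natural variability dominate.

```python
from statistics import fmean, variance


def player_effect_from_exposure(exposed_sales, control_sales, participation_rate):
    """Rescale the exposed-vs-control difference by the share of players.

    Assumes exposed non-players are unaffected, so the whole difference is
    carried by the players. Each group needs at least two customers.
    Returns (estimated effect per player, approximate standard error).
    """
    if not 0 < participation_rate <= 1:
        raise ValueError("participation_rate must be in (0, 1]")
    diff = fmean(exposed_sales) - fmean(control_sales)
    # Standard error of a difference between two independent means.
    se_diff = (variance(exposed_sales) / len(exposed_sales)
               + variance(control_sales) / len(control_sales)) ** 0.5
    # Both the estimate and its noise are scaled by 1 / participation_rate.
    return diff / participation_rate, se_diff / participation_rate
```

Halving the participation rate doubles the estimated effect per player and its standard error alike, so the signal-to-noise ratio of the underlying comparison never improves.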

Using RFM clustering to resample the control group

RFM stands for recency, frequency and monetary value. These features are known to be good predictors of customer behaviour. Customer segmentation by RFM metrics helps to identify groups of similar customers and is often used to personalise targeting. Customers are usually grouped according to the quantile they fall into for each metric.
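Quartile-based RFM scoring can be sketched in plain Python, assuming transactions arrive as (customer_id, purchase_date, amount) tuples. The helper names are hypothetical, ties are broken arbitrarily by input order, and a score of 4 marks the best quartile (most recent, most frequent, highest spend).

```python
from datetime import date


def rfm_features(transactions, today):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_seen, frequency, monetary = {}, {}, {}
    for customer_id, day, amount in transactions:
        if customer_id not in last_seen or day > last_seen[customer_id]:
            last_seen[customer_id] = day
        frequency[customer_id] = frequency.get(customer_id, 0) + 1
        monetary[customer_id] = monetary.get(customer_id, 0) + amount
    return {
        c: ((today - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


def quartile_scores(values, higher_is_better=True):
    """Score each value 1..4 by its rank quartile (4 = best quartile)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        q = rank * 4 // n + 1  # quartile of the ascending rank, 1..4
        scores[i] = q if higher_is_better else 5 - q
    return scores


def rfm_segments(features):
    """Concatenate R, F and M quartile scores into a code such as '433'."""
    ids = list(features)
    r = quartile_scores([features[c][0] for c in ids], higher_is_better=False)
    f = quartile_scores([features[c][1] for c in ids])
    m = quartile_scores([features[c][2] for c in ids])
    return {c: f"{r[i]}{f[i]}{m[i]}" for i, c in enumerate(ids)}
```

Recency is the only metric where a lower value is better, hence the inverted score.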

Example of RFM segments based on quartiles

Using these features, we could cluster the customers and resample the control group to match the players more closely. We can then compute the incremental sales for each group, which shows how effective the campaign is on different populations.
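A sketch of both steps, assuming each customer already carries a segment label (for instance an RFM code from quartile scoring) and a campaign-period sales figure. The control group is resampled with replacement so that its segment mix matches the players’, and the incremental is computed as a per-segment difference of mean sales. Segments with no control customer cannot be matched and are skipped. All names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def resample_control(player_segments, control_segments, seed=0):
    """Draw control customers (with replacement) so that each segment appears
    as many times as it does among the players.

    player_segments, control_segments: dict mapping customer_id -> segment label.
    """
    rng = random.Random(seed)
    pool_by_segment = defaultdict(list)
    for customer_id, segment in control_segments.items():
        pool_by_segment[segment].append(customer_id)

    sample = []
    for segment, n_players in Counter(player_segments.values()).items():
        pool = pool_by_segment.get(segment)
        if not pool:
            continue  # no control customer in this segment: it cannot be matched
        sample.extend(rng.choices(pool, k=n_players))
    return sample


def mean_sales_by_segment(sales, segments):
    """Average sales per segment; both dicts are keyed by customer_id."""
    groups = defaultdict(list)
    for customer_id, amount in sales.items():
        groups[segments[customer_id]].append(amount)
    return {segment: sum(v) / len(v) for segment, v in groups.items()}


def incremental_by_segment(player_sales, player_segments, control_sales, control_segments):
    """Players' mean sales minus control mean sales, for segments present in both."""
    players = mean_sales_by_segment(player_sales, player_segments)
    control = mean_sales_by_segment(control_sales, control_segments)
    return {s: players[s] - control[s] for s in players if s in control}
```

The resampled control sample supports an overall comparison with the players, while the per-segment figures show where the campaign seems to work best. Both still inherit the selection issue discussed next.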

In practice, however, these metrics are usually not enough to clearly separate the control group into potential players and non-players, since many other factors influence whether a specific customer participates. These factors could be anything from their income bracket to their personal views on data privacy. Furthermore, you still have to use one of the techniques above to actually compute the incremental sales for each group, with the same advantages and drawbacks.

Next part

We looked at strategies commonly used in the retail industry to measure the incremental sales of a marketing campaign. These are usually easy to implement, but suffer from strong biases on real-world data. In part 3, we will look at the solutions we use at Untie Nots to overcome these biases while remaining understandable to non-specialists.
