When A/B Testing Doesn’t Work…

Shruti Misra
2 min read · Oct 10, 2023


As I dove into understanding A/B testing to determine the impact of a product or business change, I always had one question in mind:

How do you test for effects when A/B testing is infeasible? This can happen when outside events affect user behavior (like a pandemic…) or when there is no reliable way to assign treatment and control groups.

I found the answer in an article by Spotify R&D, which describes using the difference-in-differences (DID) technique for significance testing in cases where A/B testing doesn’t work, specifically for time series data.

DID techniques are commonly used in economics, so it was pretty cool to see the method adapted here. The question DID asks is “what would have happened in the absence of the intervention?” The Spotify article uses the example of testing the impact of a marketing campaign promoting a specific podcast. The DID approach treats the marketed podcast as the treatment group and a similar podcast that isn’t being promoted as the control. DID is implemented by taking two sets of differences between the groups:

The first difference is the pre- vs. post-campaign change within each group (one difference for the treatment podcast, one for the control). The treatment difference consists of the marketing effect plus other effects we don’t care about, while the control difference captures only those other effects.

The second difference is the difference between those two differences, also known as double differencing (difference-ception…?). Subtracting the control difference from the treatment difference removes the “other effects we don’t care about” and isolates the marketing effect.
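To make the two differencing steps concrete, here’s a minimal sketch in Python with made-up numbers (daily streams for a hypothetical treated and control podcast); the figures and variable names are mine, not from the Spotify article:

```python
# Toy pre/post averages of daily streams (hypothetical numbers).
treated_pre, treated_post = 100.0, 130.0   # podcast that got the marketing campaign
control_pre, control_post = 80.0, 95.0     # similar podcast with no campaign

# First differences: pre/post change within each group.
# The treated difference = marketing effect + other effects we don't care about;
# the control difference captures only those other effects.
treated_diff = treated_post - treated_pre   # 30.0
control_diff = control_post - control_pre   # 15.0

# Second difference ("double differencing"): subtracting the control difference
# removes the shared effects and isolates the marketing effect.
did_estimate = treated_diff - control_diff  # 15.0
print(f"DID estimate of the marketing effect: {did_estimate} streams/day")
```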

DID is usually estimated with a linear model that spits out standard errors and p-values, so you can assess the significance of the result (pretty neat!). The article also explains some ways to deal with autocorrelation in time series data, which can make DID results unreliable. Overall, the article provides an interesting and comprehensive look at how to work around cases where A/B testing isn’t possible, which is why I thought I’d share it!
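Here’s a rough sketch of that regression version on simulated data, using Python’s statsmodels (this is my own illustration, not necessarily the exact model or autocorrelation fix the Spotify article uses): the coefficient on the treated × post interaction is the DID estimate, and the Newey–West (HAC) standard errors are one common way to account for autocorrelated time series errors.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated daily streams for two podcasts over 60 days; the campaign starts on
# day 30 and only affects the treated podcast (true lift of 15 streams/day).
days = np.arange(60)
post = (days >= 30).astype(int)
control = 80 + 0.5 * days + rng.normal(0, 5, 60)
treated = 100 + 0.5 * days + 15 * post + rng.normal(0, 5, 60)

df = pd.DataFrame({
    "streams": np.concatenate([control, treated]),
    "treated": np.repeat([0, 1], 60),
    "post": np.tile(post, 2),
})

# Classic DID regression: streams ~ treated + post + treated:post.
# The 'treated:post' coefficient is the DID estimate of the marketing effect.
model = smf.ols("streams ~ treated * post", data=df).fit(
    cov_type="HAC", cov_kwds={"maxlags": 7}  # HAC errors to hedge against autocorrelation
)
print(model.summary().tables[1])  # check the 'treated:post' row: estimate, std err, p-value
```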

Read more on the Spotify blog here: https://engineering.atspotify.com/2023/09/how-to-accurately-test-significance-with-difference-in-difference-models/
