How Simpson’s Paradox Could Impact A/B Tests

Bithika Mehra
Published in The Startup
5 min read · Sep 7, 2020

Source: Pinterest

Simpson’s paradox occurs when a trend that appears in the aggregate data disappears or reverses in the underlying segments that make up the data. In A/B testing, Simpson’s paradox can occur when the overall mean conversion rate and/or average order value of the tested experiences points to a different result than the same metrics computed for the underlying segments.

Let me illustrate this with an example from a blog post by Georgi Georgiev, an instructor at CXL. Suppose you run an A/B test between Page A and Page B and see the following results:

Judging by the overall conversion rate, you appear to have a conclusive test with B beating A (assuming requirements such as sample size, statistical significance, and power were met). But before you take that victory lap around the office, you notice something completely unexpected: when you segment the data by traffic source, A has outperformed B for every traffic source!

A/B Test results broken down by traffic source

What does this mean? How is this even possible? This is a classic example of Simpson’s Paradox.
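To make the reversal concrete, here is a minimal Python sketch. The visitor and conversion counts are hypothetical, invented purely for illustration; they are not the figures from Georgi Georgiev’s post, and the traffic source names (`search`, `email`) and helper functions are assumptions too. It simply computes the overall conversion rate and the per-source conversion rates for each page.

```python
# Hypothetical counts, invented for illustration only
# (NOT the figures from the original example).
# Each entry maps a traffic source to (visitors, conversions).
results = {
    "A": {"search": (1000, 50), "email": (200, 40)},
    "B": {"search": (200, 8), "email": (1000, 180)},
}


def rate(visitors, conversions):
    """Conversion rate as a fraction between 0 and 1."""
    return conversions / visitors


def segment_rates(variant):
    """Conversion rate of one page within each traffic source."""
    return {
        source: rate(visitors, conversions)
        for source, (visitors, conversions) in results[variant].items()
    }


def overall_rate(variant):
    """Conversion rate of one page pooled across all traffic sources."""
    visitors = sum(v for v, _ in results[variant].values())
    conversions = sum(c for _, c in results[variant].values())
    return rate(visitors, conversions)


for variant in ("A", "B"):
    per_source = ", ".join(
        f"{source}: {r:.1%}" for source, r in segment_rates(variant).items()
    )
    print(f"Page {variant} | overall: {overall_rate(variant):.1%} | {per_source}")
```

With these made-up numbers, Page B wins overall (about 15.7% versus 7.5%), yet Page A converts better within both sources (5.0% versus 4.0% for search, 20.0% versus 18.0% for email). In this toy data, the two pages receive very different mixes of traffic: most of B’s visitors come from the source that converts well for both pages.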

What causes Simpson’s paradox?

Bithika Mehra

On the path to learning all things insights and optimization (linkedin.com/in/bithikamehra) | Foodie | Environmentalist | Loves to travel | Player of a few riffs