The Pitfalls of Running A/B Tests

Many people who create digital products have probably heard the term ‘Designing with Data’. It’s an intuitive idea: decisions based on intuition alone are not enough, and better decisions are usually supported by quantitative or qualitative evidence.

This leads many teams to run A/B tests. In short, an A/B test offers slightly different versions of your product to users drawn from the same initial group and measures the difference in their behavior. It’s probably one of the best ways to gather actionable data.

A/B tests are so effective because they let you ask your users a direct question and get the answer from their actual behavior. For example, by running a simple A/B test you can ask ‘How many extra sales will I make if I offer free shipping worldwide?’. To answer this question, all you need to do is offer free shipping to 50% of your users and compare the sales in that group with the rest. Then, using simple calculations, you can measure the profitability of adding ‘free shipping’ and decide if it’s worth it or not.
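As a rough sketch of the free-shipping comparison, here is one way the calculation could look in Python. The `Group` fields, the `profit_per_user` helper, and every number are made up for illustration and do not come from a real test. The sketch assumes the business absorbs the shipping cost in the test group.

```python
from dataclasses import dataclass


@dataclass
class Group:
    users: int            # users assigned to this group
    revenue: float        # total revenue from the group's orders
    shipping_cost: float  # shipping cost absorbed by the business


def profit_per_user(group: Group) -> float:
    """Revenue minus absorbed shipping cost, normalized by group size."""
    return (group.revenue - group.shipping_cost) / group.users


# Hypothetical 50/50 split; all figures are invented.
control = Group(users=10_000, revenue=20_000.0, shipping_cost=0.0)
free_shipping = Group(users=10_000, revenue=26_000.0, shipping_cost=4_160.0)

# Positive lift means free shipping paid for itself in this toy data.
lift = profit_per_user(free_shipping) - profit_per_user(control)
print(f"Profit lift per user: {lift:.3f}")
```

Normalizing by group size keeps the comparison fair even when the split is not exactly 50/50.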

I’ve always been a big advocate of A/B tests, but over time I’ve learned that they’re highly addictive and not always justified.

What I’ve learned is that running them comes with pitfalls that can lead you to a bad choice. Here are a few examples:

1. Some of the impact may be unforeseen at first

An A/B test might point toward a specific change now, but the trend can shift as time goes by, and the initial results might be misleading.

For example, at JoyTunes we once tested offering a 1-month free trial instead of a 7-day free trial. We saw that the conversion to subscription increased dramatically, but it took us over 2 months to see that the overall revenue decreased. This was because cancellation rates were higher among people who took the 1-month trial, and fewer people actually paid for a yearly subscription.
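To show how a higher conversion rate can hide lower revenue, here is a simplified sketch. The numbers, the `YEARLY_PRICE` value, and the two helper functions are all hypothetical and are not JoyTunes data. It assumes a user only generates revenue if they subscribe and do not cancel before paying.

```python
def conversion_rate(trial_starts: int, subscriptions: int) -> float:
    """Share of trial users who subscribed."""
    return subscriptions / trial_starts


def revenue_per_trial(trial_starts: int, subscriptions: int,
                      cancellations: int, yearly_price: float) -> float:
    """Revenue per trial start, counting only subscribers who did not cancel."""
    paying = subscriptions - cancellations
    return paying * yearly_price / trial_starts


YEARLY_PRICE = 60.0  # hypothetical price

# Invented figures: the longer trial converts better but cancels more.
seven_day = dict(trial_starts=1_000, subscriptions=100, cancellations=20)
one_month = dict(trial_starts=1_000, subscriptions=180, cancellations=115)

for name, group in (("7-day", seven_day), ("1-month", one_month)):
    rate = conversion_rate(group["trial_starts"], group["subscriptions"])
    revenue = revenue_per_trial(**group, yearly_price=YEARLY_PRICE)
    print(f"{name}: conversion {rate:.0%}, revenue per trial {revenue:.2f}")
```

In this toy data the 1-month trial wins on conversion and loses on revenue, which is the kind of gap that only shows up once cancellations have had time to happen.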

2. Query mistakes are a thing

In more complicated A/B tests, such as those that measure retention, investigating the numbers can be harder than it seems. We do that using SQL queries. Sometimes the calculations are messy and still yield reasonable-looking results that are completely wrong.

We once offered a discount that encouraged users to renew their subscriptions. We A/B tested it and saw a huge increase in renewals among users who got the discount. Only later did we find that our initial query had a bug, and the real increase was too small to justify the discount.
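I won’t reproduce the actual bug here, but one common kind of query mistake is a join that counts the same user more than once. The sketch below uses Python’s built-in `sqlite3` module with a hypothetical schema (`users`, `renewal_events`) and invented rows to show how that can produce a believable but wrong renewal rate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, variant TEXT);
    CREATE TABLE renewal_events (user_id INTEGER, event TEXT);
    INSERT INTO users VALUES
        (1, 'control'), (2, 'control'), (3, 'discount'), (4, 'discount');
    -- User 3 has two rows for one renewal (say, a retried payment).
    INSERT INTO renewal_events VALUES
        (1, 'renewed'), (3, 'renewed'), (3, 'renewed');
""")


def renewal_rates(renewed_count_expr: str) -> dict:
    """Renewal rate per variant, using the given SQL expression to count renewals."""
    sql = f"""
        SELECT u.variant,
               {renewed_count_expr} * 1.0 / COUNT(DISTINCT u.user_id)
        FROM users u
        LEFT JOIN renewal_events r ON r.user_id = u.user_id
        GROUP BY u.variant
    """
    return dict(conn.execute(sql).fetchall())


# Buggy: counts joined rows, so the duplicate event inflates the discount group.
naive_rates = renewal_rates("COUNT(r.user_id)")
# Safer: counts each renewing user once.
careful_rates = renewal_rates("COUNT(DISTINCT r.user_id)")

print("naive:  ", naive_rates)
print("careful:", careful_rates)
```

A cheap safeguard is to check that the user counts in the query match the number of users actually assigned to each variant before trusting any rate built on top of them.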

3. The sample size has to be big enough

We once redesigned our app’s purchase screen. At first, we saw a 20% increase in conversion rate among the group that got the redesigned screen.

Over the following weeks, the increase in conversion rate shrank to 10%, and eventually to 5%. It turned out that the experiment had initially run on a small sample group, so the early difference was not statistically significant.
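One way to check whether a difference like this is statistically significant is a two-proportion z-test. This is a minimal sketch using only Python’s standard library, not necessarily the method we used. The function name and the sample numbers are made up, and the normal approximation assumes each group is reasonably large and has at least some conversions.

```python
from math import erfc, sqrt


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))


# Same 10% vs 12% conversion (a 20% relative increase) at two sample sizes.
p_small = two_proportion_p_value(conv_a=50, n_a=500, conv_b=60, n_b=500)
p_large = two_proportion_p_value(conv_a=5_000, n_a=50_000, conv_b=6_000, n_b=50_000)

print(f"500 users per group:    p = {p_small:.3f}")
print(f"50,000 users per group: p = {p_large:.3g}")
```

With 500 users per group the p-value is well above the usual 0.05 threshold, so the same 20% increase that looks exciting could easily be noise. With 50,000 users per group it is far below the threshold.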

4. Numbers don’t have human empathy

When Facebook first introduced notifications in their products, they probably saw a positive impact on engagement. Today, research shows just how unhealthy social networks can be for many people, and much of that can be attributed to notifications and other high-engagement features. In an A/B test, it’s impossible to see that type of negative impact on people’s lives; it may only be noticed later, through scientific research.

5. A/B tests may slow you down

Creating different tests costs valuable time and money. It’s always great to learn how each variation performs, but if you already know where you plan to go, just shipping and heading towards your target at full speed would probably be the better choice.

Overall, I’m a fan of designing with data. People say that ‘there is no true or false in design’, but data that comes from A/B tests is actually a good indication of ‘truth’. It’s also a great way for us to learn about our users’ behavior and make better design decisions later.

But sometimes I really prefer to avoid A/B testing things. Tests can be misleading, slow you down, and keep you from making positive changes in the app, such as designing a delightful animation, redesigning an outdated screen, or even improving loading time. These changes are important for your users, but their impact in an A/B test might be minor.

So the next time you want to run an experiment, I suggest asking yourselves these three questions:

  1. Do you already have a solution that you know is better for your users?
  2. Do you still want to consider the other solutions as an option?
  3. Is an A/B test the best way to decide between them?

Want to take part in our future A/B tests? Check out our open roles at JoyTunes (offices located in Tel Aviv).