Guide to A/B Testing — Summary of Trustworthy Online Controlled Experiments: Introduction

Oct 29, 2021

In this article, we will go through Chapter 1 of the book Trustworthy Online Controlled Experiments, one of the best-selling books on A/B testing.

Part-I focuses on the importance of A/B testing and how it can help in increasing the overall revenue of the company.

For example, a Bing employee suggested changing how ad headlines were displayed (the idea was to lengthen the headline by combining it with the text from the first line below it). This small change led to Bing’s revenue going up by 12%, which is over $100M annually in the US alone.

We can imagine now how a small change can impact our business.

From the above example, we can say that —

It is difficult to assess the value of an idea. Even the small change above was delayed for months.
Small changes can have a big impact.
Experiments with big impact are rare.
The overhead of running an experiment must be small.
The OEC (Overall Evaluation Criterion: a quantitative measure of the experiment's objective, e.g., active users per day) must be clear from the beginning.
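To make the OEC point concrete, here is a minimal sketch of how a single OEC value could be compared between two groups. The function name `relative_lift` and the numbers are made up for illustration; they are not from the book, and a real analysis would also need a significance test.

```python
from statistics import mean


def relative_lift(control_values, treatment_values):
    """Relative change of the treatment mean over the control mean.

    Both arguments are lists of per-unit OEC observations
    (hypothetical example: active users per day).
    """
    control_mean = mean(control_values)
    treatment_mean = mean(treatment_values)
    return (treatment_mean - control_mean) / control_mean


# Made-up daily active user counts, for illustration only.
control = [1000, 1020, 980, 1010]
treatment = [1100, 1130, 1090, 1120]
print(f"Relative lift: {relative_lift(control, treatment):.1%}")
```

A lift of 0.12 here would correspond to the kind of 12% change described in the Bing example.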

Controlled experiments go by many names, such as A/B tests, A/B/n tests (for multiple variants), field experiments, randomized controlled experiments, split tests, bucket tests, and flights.

Note: Users are split randomly between variants in a persistent manner, meaning a user gets the same variant on every visit.

Question: what is a variant?

A variant is a version of the user experience being tested. In a simple A/B test there are two variants, A and B, usually called control and treatment: control is the special variant that uses the existing version, while treatment is the variant that includes the change.

Note: Randomization is very important to ensure that the populations assigned to the different variants are statistically similar.
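One common way to get an assignment that is both random-looking and persistent is to hash the user ID. The sketch below is an assumption about how this could be done, not the method described in the book; `assign_variant`, the experiment name, and the user IDs are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant.

    Hashing the experiment name together with the user ID means the same
    user always lands in the same variant for a given experiment, while
    different experiments get independent-looking splits.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


# The same user gets the same variant on every call.
print(assign_variant("user-42", "headline-test"))
print(assign_variant("user-42", "headline-test"))
```

Because nothing is stored, any server can recompute the assignment from the user ID alone.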

Steps for online controlled experiments —

Choose your experimental units. For example, let’s take the user as the experimental unit.
Assign the experimental units (i.e., users) to each variant with no interference (or little interference), so that users in the control group do not affect the treatment group and vice versa.
Make sure you have enough experimental units (i.e., users) for the results to be meaningful. The larger the number, the easier it is to detect even small effects.
Identify your OEC (Overall Evaluation Criterion) and discuss it with your leaders so that everyone agrees on it.
Maintenance and change requests should be easy to implement.
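The point about having enough users can be made concrete with a rough sample size estimate. The sketch below uses the standard normal-approximation formula for comparing two means, which is an assumption brought in here for illustration and not something stated in this chapter summary. The parameter names (`sigma` for the metric's standard deviation, `delta` for the smallest effect you want to detect) and the default significance level and power are also assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(sigma, delta, alpha=0.05, power=0.8):
    """Approximate users needed in each variant for a two-sided test.

    Uses n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sigma^2 / delta^2,
    the usual normal approximation for a difference in means.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


# Halving the effect you want to detect roughly quadruples the users needed.
print(sample_size_per_variant(sigma=1.0, delta=0.1))
print(sample_size_per_variant(sigma=1.0, delta=0.05))
```

The required number grows with the inverse square of `delta`, which is why detecting very small effects needs a lot of traffic.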

Examples —

Google tested 41 gradations of blue on Google search result pages. The change ended up being substantially positive for user engagement. Similarly, Microsoft’s Bing color tweaks showed that users were more successful at completing tasks, which eventually improved monetization by over $10M annually in the US.
Amazon moved its credit card offer from the home page to the shopping cart page, showing simple math that highlighted the savings the user would receive. This simple change increased Amazon’s annual profit by tens of millions of dollars.
Amazon ran an experiment showing recommendations based on the items in the shopping cart, which turned out to be highly profitable.

Likewise, there are many other examples, such as making code more efficient to improve performance, or reducing malware.

Let’s review two key scenarios:

Scenario-1: You have a business strategy and you have a product with enough users to experiment.

In this scenario, experiments can help you reach a local optimum based on your current strategy and product.

Experiments can help identify areas with high ROI and hence improve the OEC.
Experiments can help with optimizations (for example, color, spacing, and font).
Experiments can help you continuously iterate toward a better site design.
Experiments can be critical in optimizing backend algorithms and infrastructure, such as recommendation and ranking algorithms.

Scenario-2: You have a business strategy and you have a product but the results suggest that you need to consider a pivot.

In this scenario, you need to consider the following points:

Duration of experiments — the experiment should be long enough to capture the seasonality.
Number of ideas tested — you might need many experiments to come to the right conclusion.

In my next article, I will walk you through an end-to-end example of running an experiment. I hope you enjoyed this one. Happy learning :)