An insightful introduction to A/B Testing

Using an easy and relatable example

Published in

Nerd For Tech

5 min read · Jul 9, 2021

We may not realize it, but any of us could be the subject of an A/B test at any time. These experiments run without our being aware of them.

Image Source: https://www.seobility.net/en/wiki/images/2/24/AB-Testing.png

We live in a world surrounded by data, and our decisions nowadays are data driven. We no longer rely only on intuition to decide which option is best. This is where A/B testing comes in, as it helps us decide which option is better. A/B testing is widely used by e-commerce companies, design companies and marketing managers to decide which image, title, text or colour leads to more clicks by users or customers. For example, an e-commerce company might have two differently coloured buttons and need to decide which one leads more customers to click and buy.

In this article, we will discuss what A/B testing is, how it is used, and what we need to be careful about while performing it.

What is A/B testing?

In simple words, A/B testing is a way to compare two or more versions of something to figure out which performs better. In some places, it is also referred to as ‘bucket testing’, ‘split-run testing’ or a ‘controlled experiment’. It is an important part of machine learning and data science, and it has gained a lot of popularity over the past few years. In machine learning, A/B testing can be viewed as hypothesis testing using either a confidence interval or a p-value. Using these results, we can compare the effectiveness of the existing model and the newly developed one. If the new model performs better and the difference is statistically significant according to the p-value, we can replace the old model with the new one.
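To make the p-value idea concrete, here is a minimal sketch of one common way to run such a hypothesis test: a two-proportion z-test. The function name and the counts are made up for illustration, and the z-test is an assumption on our part; the right test depends on your data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(clicks_a, views_a, clicks_b, views_b):
    """Two-sided p-value for H0: both versions have the same click rate."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a difference at least this extreme if H0 were true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up counts, purely for illustration.
p_value = two_proportion_p_value(clicks_a=4500, views_a=90_000,
                                 clicks_b=560, views_b=10_000)
print(f"p-value: {p_value:.4f}")
```

A small p-value (commonly below 0.05) suggests the difference between the two versions is unlikely to be due to chance alone. With these made-up counts it comes out below 0.05.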

How is it used?

To see how it is used, let me explain with a real-world example. Let us take the example discussed above: which colour should an e-commerce company choose for the buy button on its website? Suppose the company was using a blue button and someone in the company had an intuition that an orange button might work better, as it would lead more users to click on it, go to the payment page and hence increase the company’s sales. The metric we will calculate here is the Click-Through Rate (CTR), which is the percentage of impressions that resulted in a click. The company would run an A/B test, compare the CTR for both buttons, and find out whether the blue button should be replaced or not.

Now, let us see the process the company will follow to decide. The company has millions of users and will randomly divide them into two groups. Assume the company splits them in a 90:10 ratio. The 90% group sees the old button (or old model), whereas the 10% group sees the new button (or the new model that we have already trained). We then compare the CTR of the two button colours/models. If the results indicate that the old colour has a higher CTR, we continue with it and discard the new one. If the new colour has a higher CTR, we test the new button further: a 10% sample is too small to justify replacing the old button outright, so it makes sense to test it on more users. If it still performs well with more data, we replace the old one with the new one.
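The random 90:10 split described above can be sketched as follows. This is only one possible approach: hashing the user id is our assumption, not something the company in the example is known to do, and the function names, experiment name and user ids are hypothetical.

```python
import hashlib


def assign_group(user_id, new_share=0.10, experiment="buy-button-colour"):
    """Deterministically place a user in 'new' (orange) or 'old' (blue)."""
    # Hashing the experiment name with the user id gives a stable,
    # roughly uniform number in [0, 1) for each user.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    return "new" if bucket < new_share else "old"


def click_through_rate(clicks, impressions):
    """CTR as the percentage of impressions that resulted in a click."""
    return 100.0 * clicks / impressions if impressions else 0.0


# Hypothetical user ids, only to check the split is close to 90:10.
groups = [assign_group(uid) for uid in range(100_000)]
share_new = groups.count("new") / len(groups)
print(f"share in new group: {share_new:.3f}")
print(f"example CTR: {click_through_rate(560, 10_000):.2f}%")
```

Hashing instead of drawing a fresh random number means the same user always sees the same button, which keeps the experience consistent across visits.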

Things to remember to avoid errors before you start testing

This is a simplified explanation of what A/B testing is and how companies use it. In practice, however, there are certain complexities to conducting an A/B test.

Sample size: The first is dividing the users in the right proportion. Dividing the groups is tricky: we cannot expose a lot of users to the new button/model, because we know the old button performs well, and if the new one performs poorly, it might lead to a huge loss for the company. The split ratio usually depends on the industry or domain in which we are testing.

Invalid hypothesis: An A/B test is built around what we want to change. Choosing an incorrect hypothesis therefore reduces the chances of getting useful results.

Right metric for comparison: The next is choosing the right metric for comparing the results. In the above example, we directly compared point estimates of the CTR. That might not give accurate results: the groups differ in size, so the estimate from the smaller group is noisier and a raw comparison can mislead. A better approach is to compute a confidence interval and compare the two results using it.
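As a rough illustration of the confidence interval approach, the sketch below computes a simple (Wald) interval for the difference between the two CTRs. The function name and counts are made up, and the Wald interval is just one reasonable choice we are assuming here.

```python
from math import sqrt
from statistics import NormalDist


def ctr_difference_interval(clicks_old, views_old, clicks_new, views_new,
                            confidence=0.95):
    """Wald confidence interval for (new CTR - old CTR), as proportions."""
    p_old = clicks_old / views_old
    p_new = clicks_new / views_new
    # Unpooled standard error: each group contributes its own variance,
    # so the smaller group automatically widens the interval.
    std_err = sqrt(p_old * (1 - p_old) / views_old
                   + p_new * (1 - p_new) / views_new)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_new - p_old
    return diff - z * std_err, diff + z * std_err


# Made-up counts for a 90:10 split, for illustration only.
low, high = ctr_difference_interval(4500, 90_000, 560, 10_000)
print(f"95% CI for the CTR difference: [{low:.4f}, {high:.4f}]")
```

If the whole interval lies above zero, the new button is plausibly better. If the interval contains zero, the data cannot tell the two apart yet.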

Sampling error: Another issue is that the observed results depend on the particular sample. Suppose we resample the data and this time the results differ: with the original sample the new button showed better results, whereas after resampling the old button does. In such cases we conduct A/A’/B testing, where we test the old model on two datasets (A and A’) and the new one on one small dataset (B). If B gives better results than both A and A’, we can be more confident in the result and proceed as we did above.
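The A/A’/B idea can be sketched roughly like this. The decision rule is our own simplification (A and A’ should not differ significantly, and B should beat both), and all names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def is_significant(clicks_1, views_1, clicks_2, views_2, alpha=0.05):
    """Two-sided two-proportion z-test; True if the rates differ at alpha."""
    pooled = (clicks_1 + clicks_2) / (views_1 + views_2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / views_1 + 1 / views_2))
    z = (clicks_2 / views_2 - clicks_1 / views_1) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def aab_verdict(a, a_prime, b):
    """Each argument is a (clicks, views) pair. Returns a short verdict."""
    # If the two groups seeing the SAME button disagree, the setup or the
    # sampling is too noisy to trust any comparison with B.
    if is_significant(*a, *a_prime):
        return "inconclusive: A and A' differ"
    for clicks, views in (a, a_prime):
        b_is_higher = b[0] / b[1] > clicks / views
        if not (b_is_higher and is_significant(clicks, views, *b)):
            return "keep A"
    return "B looks better"


# Made-up counts for illustration only.
print(aab_verdict(a=(2250, 45_000), a_prime=(2290, 45_000), b=(560, 10_000)))
```

The comparison between A and A’ acts as a sanity check: both groups see the same button, so a large gap between them points to a problem with the sampling rather than with the button.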

Time period: The time periods over which we compare the two models should be similar. It would be unfair and biased to choose a day or time with higher traffic for one model and normal traffic for the other, as that could lead to inaccurate results.

Situations where A/B testing doesn’t make sense

It is not a fast process, and we might have to run multiple tests. This means waiting a long time to see the results, and that time and effort could have been used elsewhere.
We need a large sample size to obtain statistically significant results.
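To get a feel for how large “large” can be, here is a rough sketch using a standard approximation for comparing two proportions. It assumes two equally sized groups, a two-sided test, and made-up values for the baseline CTR and the smallest lift we care about. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def users_per_group(baseline_ctr, min_detectable_lift, alpha=0.05, power=0.80):
    """Approximate users needed in EACH group for a two-sided z-test.

    Assumes two equally sized groups; CTRs are proportions (0.05 = 5%).
    """
    p1 = baseline_ctr
    p2 = baseline_ctr + min_detectable_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Made-up inputs: 5% baseline CTR, and we want to detect a rise to 5.5%.
n = users_per_group(baseline_ctr=0.05, min_detectable_lift=0.005)
print(f"about {n:,} users per group")
```

The smaller the improvement we want to detect, the more users we need, which is why tests on small changes can take a long time.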

To conclude, A/B testing offers a lot of benefits if the test is conducted properly, though we need to be careful about certain things in order to get the best results.