An Intuitive Approach towards Understanding A/B Testing-I

Kavikumaran R · Published in The Startup · Jun 24, 2020 · 5 min read

In our day-to-day life, be it personal or at work, we are always presented with a lot of options to choose from. And we always tend to believe that we have made the best possible choice. But have we?

What’s so unique about this article?

There are many articles out there that clearly explain everything about A/B Testing, so why this one? People tend to shy away from learning as soon as they see one word. STATISTICS! Here's a simpler, more precise explanation of it without any fancy jargon or Greek letters.

Image by JB Shreve from theendofhistory.net

Does that mean there are no statistics covered in this article? Yes and no. We are just going to use the term INTUITION in place of statistics and understand the mechanics of A/B Testing from scratch.

What is A/B Testing?

In short, we are presented with two choices, A and B (known as ‘variants’), out of which we have to figure out the option best suited to our use case. It’s a relatively straightforward concept that aids Conversion Rate Optimisation (CRO) and Funnel Analysis in most businesses. In general, A/B Testing is about introducing a new feature or conducting a campaign and analyzing, in a controlled way, whether or not it had the expected impact on the target audience. By “controlled” we mean that the users in the Control and Treatment groups are randomly selected, which eliminates bias in the experimentation process, and that all other factors that may influence our Target Metric (known as guardrail metrics) are held constant. Not familiar with what a Control and Treatment group means? Keep reading!

Understanding the Terminologies:

Control Group: refers to a segment of users who are shielded from the experiment/variant (often known as the placebo group). Why do we need it? It acts as a benchmark to measure the success of the test and eliminates bias in the decision-making process.

Treatment Group: refers to the segment of users who are subjected to the experiment/new variant.

Variance: a measure of how spread out the numbers in a dataset are. Ahh! Let’s make it sound a little less statistical.

Consider the following scenario of determining the better player of the two:

For comparison’s sake, the total scores of both players in our data add up to 490. But does it really look like both players have been in the same form? Give it a thought and see what differentiates these two players. Consistency it is!

Both players share the same average of 49.0 across 10 matches, but the gap between the score made in each match and the average score (quantified as Standard Deviation) is high for Player 2, which is definitely not a good sign. Player 1 has managed to score 30 or more in every match, whereas Player 2’s performance is rather unpredictable. One can argue that Player 2 still makes big scores that Player 1 didn’t. But for the purpose of understanding variance, we shall agree that Player 1 is the more bankable player.
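The spread idea above can be sketched in a few lines. The article’s actual match-by-match score table isn’t reproduced here, so the lists below are illustrative, chosen only to match the stated facts: 10 matches each, both totalling 490 (average 49), Player 1 always at 30 or more, and Player 2’s first three scores being 66, 12 and 82.

```python
from statistics import mean, pstdev

# Hypothetical score tables (see lead-in: only the totals, the average,
# and Player 2's first three scores come from the text).
player_1 = [45, 52, 48, 55, 41, 60, 38, 50, 47, 54]   # steady, all 30+
player_2 = [66, 12, 82, 5, 100, 21, 77, 8, 90, 29]    # wildly swinging

print(mean(player_1), mean(player_2))   # both average 49
print(round(pstdev(player_1), 1))       # small spread: consistent (~6.3)
print(round(pstdev(player_2), 1))       # large spread: unpredictable (~35.5)
```

Same average, very different standard deviations: that difference in spread is exactly what variance captures.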

Confidence and Sample Size:

To make this more interesting, here comes Player 3 with match scores of 10 and 120. Is it fair to compare Player 3, having played just 2 matches, with the other two players? We may never know how Player 3 will perform in upcoming matches, and in our example we would require at least 10 match scores (the sample size) to make a comparison and arrive at a conclusion. Let’s dive deeper into this for a better understanding.

Considering only the first 2 scores made by Player 2 would lead to an average score of 39 (calculated as (66+12)/2), which is a little lower than the average calculated using all 10 match scores. On including the third match score (82), the average rises to roughly 53. The key takeaway for us here is that with more matches played, the fluctuation in the player’s average score tends to stabilise, which in turn helps us gain confidence in the test results we get at the end.
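The stabilising average is easy to see with a quick running-average sketch. Only the first three scores (66, 12, 82) come from the text; the remaining seven are made up for illustration, chosen so the full set averages 49.

```python
# Running average of Player 2's scores, match by match.
# Scores after the third match are hypothetical (see lead-in).
scores = [66, 12, 82, 5, 100, 21, 77, 8, 90, 29]

for n in range(1, len(scores) + 1):
    running_avg = sum(scores[:n]) / n
    print(f"after {n:2d} matches: average = {running_avg:5.1f}")
```

The early averages jump around (39.0 after two matches, about 53.3 after three), then settle toward 49.0 as more matches accumulate.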

Now here is a question for you: in a Cricket tournament, there is Team A, which has won 2 of 2 matches, and Team B, which has won 20 of 20 matches. Which of the two teams is more impressive? I’m hoping you would agree that it is Team B. Hence the key takeaway here is

“With a smaller sample size there is more variability and less confidence; with a larger sample size there is less variability and more confidence.”
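One quick way to make the Team A vs Team B intuition concrete: imagine a purely average team that wins any given match with a 50% chance (an assumption made up for this sketch). How likely is such a team to win every match just by luck?

```python
# Probability that a merely average team (assumed 50% win rate per match,
# matches treated as independent) wins ALL of its matches by pure luck.
p = 0.5

for n in (2, 20):
    print(f"chance of winning all {n} matches by luck: {p**n:.7f}")
```

Winning 2 of 2 happens by luck a quarter of the time; winning 20 of 20 by luck is about one in a million. That is why the larger sample gives us far more confidence that Team B is genuinely good.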

Understanding your Target Metric:

A Target Metric is simply the change you want to witness as a result of your experiment or campaign. It can be the click-through rate of your marketing emails, the revenue generated on an e-commerce site, etc. Basically, your target metric can be either continuous or discrete in nature. Not so clear?

Consider the first scenario, where you are trying out a new email template or subject line for marketing. To quantify the impact of the change, we just have to count the clicks and the no-clicks, nothing else.

The Yes/No kind of outcome is termed a Discrete or Boolean outcome.

On the other hand, suppose you want to see whether there is an increase in the revenue generated by your e-commerce website after introducing a UI change for better navigation. In this scenario, the revenue generated could be any number, and its range may also vary depending upon the size of the business.

Why is it so important to understand this difference?

It largely decides what statistical test should be carried out to check and quantify the significance of the impact. One can also decide beforehand how much of that impact is expected by the end of the experiment.
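As a concrete sketch (all counts and revenue figures below are hypothetical), a discrete metric such as click-through rate is commonly compared with a two-proportion z-test, while a continuous metric such as revenue per visitor is commonly compared with a Welch-style t-test. Both statistics can be computed by hand with the standard library:

```python
import math
from statistics import mean, variance

# Discrete metric: click-through rate, variant A vs variant B.
clicks_a, sent_a = 120, 1000   # 12.0% CTR
clicks_b, sent_b = 150, 1000   # 15.0% CTR
p_a, p_b = clicks_a / sent_a, clicks_b / sent_b
p_pool = (clicks_a + clicks_b) / (sent_a + sent_b)   # pooled click rate
se_p = math.sqrt(p_pool * (1 - p_pool) * (1 / sent_a + 1 / sent_b))
z = (p_b - p_a) / se_p
print(f"two-proportion z-statistic: {z:.2f}")

# Continuous metric: revenue per visitor (small made-up samples).
rev_a = [12.0, 0.0, 35.5, 8.0, 0.0, 22.0]
rev_b = [18.0, 0.0, 41.0, 15.5, 9.0, 27.0]
se_t = math.sqrt(variance(rev_a) / len(rev_a) + variance(rev_b) / len(rev_b))
t = (mean(rev_b) - mean(rev_a)) / se_t
print(f"Welch t-statistic: {t:.2f}")
```

The larger the statistic, the less plausible it is that the gap between A and B arose by chance; with real data you would also fix the sample size and the expected lift before running the test.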

In my next post, we shall see

  • An end to end flow on ‘How A/B Testing is carried out’ with an example
  • Best practices of A/B Testing



A data and product enthusiast, an avid writer, here to decode cool stuff that you wish had an easier explanation.