ABCs of A/B test — explained right from the basics (Part 2)

Pratibha UR
7 min read · Jun 22, 2018


The prequel to this post is Part 1.

But first, let's recap with a brief summary —

Testing everything is the key! Your new idea to change the color of the page, the placement of a button, or what not: test it, and you will be sure whether your idea works or not. This is indeed a less risky approach. Before we test, we need to confirm 3 things: 1. What needs to be tested; 2. How the traffic will be diverted between the versions; 3. The duration of the test. Once these important aspects are in place, we need to define the expected result of the test, the outcome. Just like after writing an exam, we have a fair idea of what we can score (maybe….). Revisiting the technical terms: the Control is the existing Web page A and the Variation is the modified Web page B. The goal of the experiment should be extremely clear and well defined, and a metric like sales, number of visits, number of downloads, or number of views should be used to spell out what we are evaluating with the A/B test.

Mike, our guy from the meeting, claimed that “Free shipping and returns” would make more people buy our products. So, if we are currently selling 5 chairs on our website, we should see an improvement anywhere from 6 onwards. Mike’s claim can be called a hypothesis. However, a hypothesis should be more definitive, like —

Adding the “Free shipping and returns” label to the chair product page will increase chair sales to 8%

That’s a claim, or rather an adequately written hypothesis. We are clear about the change and its expected result: an 8% sales metric to determine effectiveness (i.e. the passing marks to clear an exam). This means that after the change is made, chair sales must be 8% or more to pass the exam. Now for a few concepts of statistics….

The statistical concept of the Null Hypothesis states that there is no difference between the Control (the existing Web page A) and the Variation (the modified “Free shipping and returns” page B) in their sales. This means that changing the page did not have any impact on sales. Mike would surely be disappointed if this null hypothesis were proved right! The Alternate Hypothesis states that there is a difference between the Control and the Variation, and this will be of far more interest to Mike!

It also goes to show that a strongly written hypothesis is extremely important for analyzing the results more precisely. Imagine a hypothesis like “Changing the font of the name of the product will increase the sales”. The lack of specifics makes it ambiguous and hence a very weak hypothesis. Which font should be used? Are we increasing or decreasing the font size? Increase the sales by how much? Specifics are important. Let’s improve our hypothesis further —

Adding the “Free shipping and returns” label right below the display picture of the chair on the product page will increase chair sales to 8%. <Snapshot 1 of existing Web page A & Snapshot 2 of modified Web page B>

Yay! This completes all the pre-work, and now is the time to start testing and observe some results. Remember: wait for the predefined duration to complete the testing. Do not stop the test midway, and do not jump to conclusions during the testing process. Patience is the key!

Now start the A/B test. There are several tools in the market to choose from to set up an A/B test, like Google Analytics, KISSmetrics, Optimizely, Visual Website Optimizer, Unbounce, etc. Just pick a tool and A/B test away.

Let’s understand the simple concept of Conversion Rates. Every website/webpage/campaign/ad has a goal. A goal can be to sell more (by 5%). However, a goal does not necessarily have to be to improve sales; it can be to improve awareness by increasing views, to improve the number of people downloading the content, or to improve the viewing duration by x%. The percentage of visitors who actually complete the goal is called the conversion rate. For the hypothesis defined by Mike, the conversion rate we want to achieve is 8%. This can be restated as: out of 100 clicks, 8 purchases should happen. This further underlines the need for a well-defined hypothesis.
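To make the idea concrete, here is a minimal Python sketch of the formula (the function name is just for illustration):

```python
# Conversion rate = goal completions / total visitors (clicks)
def conversion_rate(goal_completions, clicks):
    return goal_completions / clicks

# Mike's target: 8 purchases for every 100 clicks
print(conversion_rate(8, 100))  # 0.08, i.e. an 8% conversion rate
```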

Once our A/B test starts running, we tend to look at the results every day. Be patient, I reiterate. Wait the full 3–4 weeks to actually know whether Mike’s claim is correct or not. After 3–4 weeks, determine the conversion rate.

Let’s assume some numbers and do some calculations —

After 3 weeks, Web page A received 120,000 clicks and 6,000 chairs were sold; these visitors did not see the “Free shipping and returns” label. Web page B received 100,000 clicks and 8,000 chairs were sold; these visitors did see the “Free shipping and returns” label.

The ABBA tool (https://thumbtack.github.io/abba/demo/abba.html), recommended by my mentor, is very simple and efficient. Just enter the values and get the results —

Let’s compute this stepwise. For Web page A, conversion rate = 6000/120000 = 5%. For Web page B, conversion rate = 8000/100000 = 8%. Clearly we see that Web page B led to an improvement in the conversion rate, or in other words, to more sales. But there are a few statistical concepts that we will now slowly introduce and explain.
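As a quick sanity check, here is the same arithmetic in Python, using the click and sales numbers assumed above:

```python
# Assumed 3-week numbers from the example above
clicks_a, sales_a = 120_000, 6_000   # Web page A (Control)
clicks_b, sales_b = 100_000, 8_000   # Web page B (Variation)

rate_a = sales_a / clicks_a          # 0.05 -> 5%
rate_b = sales_b / clicks_b          # 0.08 -> 8%

print(f"Conversion rate A: {rate_a:.1%}")
print(f"Conversion rate B: {rate_b:.1%}")
print(f"Relative lift: {(rate_b - rate_a) / rate_a:.0%}")  # 60%
```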

The results from ABBA match the conversion rates we computed (check the success rates in brackets):

Let’s revisit the null hypothesis. The null hypothesis states that Web page A and Web page B have no difference in their impact on the conversion rate. The strength of the evidence against the null hypothesis is measured by the p-value, which determines the significance of the results. The p-value ranges between 0 and 1.

A low p-value (p ≤ 0.05) is strong evidence against the null hypothesis, and hence we can reject the null hypothesis.
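To show where such a number comes from, here is a sketch of a standard two-proportion z-test in Python. This is a textbook test, not necessarily the exact method ABBA uses internally, so its p-values will only roughly match the tool’s output:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided p-value for the null hypothesis: A and B convert at the same rate."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled conversion rate, assuming the null hypothesis is true
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Our assumed numbers: A = 6000 sales / 120000 clicks, B = 8000 sales / 100000 clicks
p_value = two_proportion_p_value(6_000, 120_000, 8_000, 100_000)
print(p_value)          # effectively 0 for this data, far below 0.05
print(p_value <= 0.05)  # True -> reject the null hypothesis
```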

Optional Reading — The method of diverting traffic to two web pages and drawing conclusions from the results can involve some error. Type 1 Error: in a hypothesis test, like the one we are doing, we incorrectly conclude that the results show an improvement. Type 2 Error: in a hypothesis test, we incorrectly conclude that there is no improvement. In simple words, there is a possibility that we may conclude wrongly, and hence we need to be very careful about the amount of error we are ready to risk. This is where the concept of the Confidence Level comes in, which indicates how confident we are about the results. 95% confidence is the popularly used level, also called the statistical significance level: if we conclude that we observe an improvement, we are 95% sure that our conclusion is right, the flip side being that we also have a 5% chance of being wrong. For best results, go with a 95% confidence level.
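In code, the confidence level is simply one minus the Type 1 error rate (alpha) we are willing to accept; a small sketch, assuming the conventional alpha of 0.05:

```python
alpha = 0.05                  # Type 1 error risk we accept (5%)
confidence_level = 1 - alpha  # 0.95, the popular 95% confidence level

p_value = 0.0001              # example: the p-value reported for our test above
if p_value <= alpha:
    print(f"Improvement is significant at the {confidence_level:.0%} confidence level")
else:
    print("Not significant: we cannot rule out that the pages perform the same")
```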

Statistics is not that difficult if concepts are understood well and explained based on the context.

Deriving some conclusions from the A/B test —

  1. Web page A shows a conversion rate of 5% while Web page B shows a conversion rate of 8%, a relative improvement of 60%, which is quite cool!
  2. These results seem to be trustworthy, as the p-value is very small — 0.0001.

Mike was correct: the change to the webpage did show an improvement in conversion, which seems promising. Now that the A/B testing is over and the results are out, Web page A should be replaced with Web page B for that 60% boost in sales.

What-if analysis

What if the conversion rate remained unchanged: Let’s look at the first scenario, where the conversion rate for both webpages is 5%. Then the p-value shows as 1, which is very high. Remember, a p-value ≤ 0.05 is strong evidence against the null hypothesis; a p-value of 1 means the data are entirely consistent with it. In this case we accept (or rather, fail to reject) the null hypothesis and conclude that Mike’s proposal did not lead to any improvement.

What if the conversion rate improved slightly but the p-value defies this improvement: Let’s consider a second scenario, where the conversion rate of Web page B is 5.1%. The p-value is 0.29. A p-value ≤ 0.05 is strong evidence against the null hypothesis; anything higher means we cannot reject it. There is an improvement in the conversion rate due to the change in Web page B, but we can’t trust the result much, as the p-value of 0.29 is greater than 0.05.

What if the conversion rate improved slightly and the p-value kind of agrees with this improvement: The last and third scenario to consider: say the conversion rate is 5.2% and the p-value is 0.03. Since 0.03 < 0.05, the result can be trusted. But it is always better to have a p-value as low as 0.01 or lower for less risky and more reliable results.
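Reusing the two_proportion_p_value helper sketched earlier, these three what-if scenarios can be replayed by varying Web page B’s conversion rate while keeping A at 5%. A plain z-test gives p-values close to the ABBA figures quoted above (about 1, 0.29, and 0.03), though not necessarily identical digit for digit:

```python
# Uses two_proportion_p_value() defined in the earlier sketch
clicks_a, sales_a = 120_000, 6_000   # Web page A stays at a 5% conversion rate
clicks_b = 100_000

# What-if conversion rates for Web page B: unchanged, slightly up, a bit higher
for rate_b in (0.050, 0.051, 0.052):
    sales_b = round(clicks_b * rate_b)
    p = two_proportion_p_value(sales_a, clicks_a, sales_b, clicks_b)
    print(f"B at {rate_b:.1%}: p-value = {p:.2f}")
```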

The p-value is the driver of the A/B test. The significance of an A/B test’s results is dictated by this single most important value! Keep an eye on it at all times…

More on A/B testing to come…. Watch out for my blogs….
