Quick Takes: A|B Testing


Data Science is a complex discipline that combines programming, mathematics, statistics, and the scientific method, and many of its concepts can (or have to) be approached on all of these fronts.

Early on in my Data Science journey, bright-eyed and bushy-tailed, I consistently found myself trying to fit in as much content as I could, going through hours (sometimes days) of googling, staring at screens, reading related literature, and cramming and coding as much as I could. Helluva ride! (s/o SG-DSI-5!)

That was then.

I learnt a lot in a short time, but also realised that, as with most things done fast, the concepts I thought I understood started to blur.

In the spirit of ‘use it or lose it’, this post (I hope) will be the first of a series on questions we might get as Data Science practitioners, each with a possible outline of an answer. The aim is to prepare a concise way (5 minutes tops) to answer common questions; I also identify fringe concepts the conversation can segue into.


Possible Question:

‘Gargle’, a search engine for people who want to find specific types of mouthwash, wants to add a new button to their main search page. What’s a way they can determine whether or not people enjoy this new button feature?


Possible approach:

A|B testing would be a good way to solve that!

Key concepts we can highlight are hypothesis testing, confidence intervals, and selection bias.

First, decide on the metric. Common ones are Daily/Monthly Active Users (DAU/MAU), Click-Through Rate (CTR), and user engagement over time.
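As a quick, hedged illustration (the event log and column names here are made up for the sketch), CTR per variant can be computed from raw events with pandas:

```python
import pandas as pd

# Hypothetical event log: one row per impression or click.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "variant": ["control", "control", "treatment", "treatment", "treatment", "control"],
    "event":   ["impression", "click", "impression", "impression", "click", "impression"],
})

# Click-Through Rate per variant: clicks / impressions.
counts = events.pivot_table(index="variant", columns="event",
                            values="user_id", aggfunc="count").fillna(0)
counts["ctr"] = counts["click"] / counts["impression"]
print(counts)
```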

Next, prepare two or more versions of the site/page: a control (without feature changes) and one variant for each version of the button you want to test.

We will subsequently serve the different versions of the page to the population.

Point of emphasis: we have to split the population randomly! A non-random split introduces selection bias.
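One common way to get a split that is random across users but sticky for each user is to hash the user ID into a bucket. A minimal sketch, where the salt and the 50/50 split are assumptions:

```python
import hashlib

def assign_variant(user_id: str, salt: str = "gargle-button-v1") -> str:
    """Deterministically assign a user to 'control' or 'treatment'.

    Hashing (salt + user_id) gives an effectively uniform bucket in [0, 100),
    so the split is random across users but stable for any single user.
    """
    digest = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < 50 else "control"

# The same user lands in the same group on every visit.
print(assign_variant("user_42"))
```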

Once set-up is done, we can define the statistical hypothesis test: state a null hypothesis, e.g. the CTR will be the same for the control group vs. the group seeing the new button.
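For the CTR example, the hypotheses can be written as a two-sided pair (the subscripts are just labels):

```latex
H_0 : p_{\text{control}} = p_{\text{treatment}}
H_1 : p_{\text{control}} \neq p_{\text{treatment}}
```

where p is the click-through rate of each group.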

Then, pre-define the acceptable Type I error rate (the false positive rate, a.k.a. the alpha); 0.05 is a common choice.

Use this to determine whether there is statistically significant evidence to reject H0.
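A minimal sketch of that decision, using the two-proportion z-test from statsmodels (the click and impression counts are made up):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results after the experiment: [control, treatment].
clicks = [520, 580]
impressions = [10_000, 10_000]

# Two-sided z-test of H0: both groups have the same CTR.
z_stat, p_value = proportions_ztest(count=clicks, nobs=impressions,
                                    alternative="two-sided")

alpha = 0.05  # the pre-defined Type I error rate
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```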

Additional talking points:

We will also want to assess the power of the statistical test. This is (1 - beta), where beta is the Type II error rate; a power of 0.8 is a common target.
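Power analysis is usually run before the experiment, to size it. A sketch with statsmodels, assuming a 5% baseline CTR, a minimum detectable lift to 5.5%, alpha = 0.05, and the conventional 0.8 power target:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Assumed baseline CTR and the smallest lift we care about detecting.
effect_size = proportion_effectsize(0.050, 0.055)

# Solve for the per-group sample size at alpha = 0.05 and power = 0.8.
n_per_group = NormalIndPower().solve_power(effect_size=effect_size,
                                           alpha=0.05, power=0.8,
                                           alternative="two-sided")
print(f"~{n_per_group:,.0f} users per group")
```

Dividing that sample size by the expected daily traffic per group also gives a first estimate of how long to run the experiment.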

Some practical things to consider:

How long to run the experiment (use confidence intervals if we want to decide this mathematically)

We will also have to decide on a protocol for outliers, such as truncation, e.g. ignoring points more than 3 IQR from the median (a quick sketch follows below)
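A minimal sketch of that truncation rule with numpy (the multiplier k is whatever your protocol pre-defines):

```python
import numpy as np

def truncate_outliers(values: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Drop points further than k * IQR from the median."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    return values[np.abs(values - median) <= k * iqr]

# Example: one extreme session length is dropped before analysis.
session_minutes = np.array([3.0, 5.0, 4.0, 6.0, 5.5, 240.0])
print(truncate_outliers(session_minutes))  # the 240-minute session is removed
```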


Disclaimer: this is a refresher on these topics, and I hope this quick read will help jolt some synapses. It is by no means comprehensive and is not meant to teach the concepts from scratch; for that, I recommend the article below.

What Data Science question do you get often? How did you explain it quickly? Feel free to leave a comment, or a clap if you liked it!