Quick Takes: A|B Testing
Data Science is a complex discipline that combines programming, mathematics, statistics, and scientific methods, and many of its concepts can (or must) be approached on all of these fronts.
Early on in my Data Science journey, bright-eyed and bushy-tailed, I consistently found myself trying to fit in as much content as I could: hours (sometimes days) of googling, staring at screens, reading related literature, cramming, and coding as much as I could. Helluva ride! (s/o SG-DSI-5!)
That was then.
I learnt a lot in a short time, but also realised that, as with most things done fast, the concepts I thought I understood started to blur.
In the spirit of ‘use it or lose it’, this post will (I hope) be the first in a series of possible questions we might get as Data Science practitioners, each with a possible outline of an answer. The aim is to prepare a concise way (5 minutes tops) to answer common questions; I also identify fringe concepts that the conversation can segue into.
‘Gargle’ — a search engine for people who want to find specific types of mouthwash — wants to add a new button to its main search page. What’s a way they can determine whether or not people enjoy this new button feature?
A|B Testing would be a good way to solve that!
Key concepts we can highlight are hypothesis testing, confidence intervals, and selection bias.
First, decide on the metrics. Common ones are: Daily/Monthly Active Users, Click-Through Rate (CTR), user engagement over time, etc.
Next, prepare two or more versions of the site/page: a control (without feature changes), and one for each version of the button you want to test.
We will subsequently serve the different versions of the page to the population. Point of emphasis: we have to split the population randomly! Assigning users by anything non-random (e.g. geography or time of day) introduces selection bias.
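As a sketch of the random split, a common approach is to hash each user ID into a bucket rather than flipping a coin per visit, so a user sees the same variant every time. The experiment name used as a salt below is made up:

```python
import hashlib

def assign_variant(user_id: str, variants=("control", "treatment"),
                   salt="gargle-button-test"):
    """Deterministically assign a user to a variant by hashing their ID.

    Hashing (instead of random.choice) keeps each user in the same
    bucket across visits while still splitting traffic uniformly.
    Changing `salt` (a hypothetical experiment name) reshuffles
    assignments for a new experiment.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

print(assign_variant("user-42"))  # same user always lands in the same bucket
```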
Once set-up is done, we can define the statistical hypothesis test: create a null hypothesis (H0) for your hypothesis testing, e.g. the CTR will be the same for the control group vs. the group using the new feature.
Then, pre-define the acceptable Type I error rate (the False Positive Rate, a.k.a. alpha).
Use this to determine whether there is statistically significant evidence to reject H0.
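A minimal sketch of the steps above, comparing CTR between the two groups with a two-proportion z-test (the click and impression counts are made-up numbers):

```python
from math import sqrt, erf

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for H0: CTR_a == CTR_b."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)        # pooled CTR under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# hypothetical counts: control vs. new-button group
z, p = two_proportion_z_test(clicks_a=200, n_a=10_000, clicks_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # reject H0 at alpha = 0.05 if p < 0.05
```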
Additional talking points:
We will also want to assess the power of the statistical test. This is (1 − β), where β is the Type II error rate (the False Negative Rate).
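Power usually enters the picture as a sample-size calculation: given alpha, desired power, and the smallest CTR lift worth detecting, how many users do we need per group? A sketch using the standard normal approximation for two proportions (the baseline and target CTRs below are hypothetical):

```python
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect a shift from
    CTR p1 to p2 with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return int(n) + 1                               # round up

# hypothetical: detect a lift from 2.0% to 2.5% CTR
print(sample_size_per_group(p1=0.02, p2=0.025))
```

Dividing this per-group sample size by expected daily traffic also gives a rough estimate of how long the experiment must run.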
Some practical things to consider:
How long to run the experiment (if we want to decide this mathematically, estimate the required sample size from the desired power and minimum detectable effect, or run until the confidence interval around the metric is narrow enough)
We will also have to decide on a protocol for outliers, such as truncation, i.e. dropping points more than 3 IQRs from the median
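The truncation rule above can be sketched in a few lines; the data below are made-up engagement values with one obvious outlier:

```python
from statistics import median, quantiles

def truncate_outliers(values, k=3.0):
    """Drop points more than k * IQR away from the median
    (the 3-IQR-from-median rule mentioned above)."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles
    iqr = q3 - q1
    med = median(values)
    return [v for v in values if abs(v - med) <= k * iqr]

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 100]  # 100 is an obvious outlier
print(truncate_outliers(data))
```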
Disclaimer: this is a refresher on these topics, and I hope this quick read will help jolt some synapses. It is by no means comprehensive and not meant to teach concepts from scratch. For that, I recommend the article below.
“The process of decision making in design has always been a popular area of discussion. Why do some designers make…” (medium.com)
What Data Science question do you get often? How did you explain it quickly? Feel free to leave a comment, or a clap if you liked it!