
Why Statistical Significance Is Ruining Your AB Tests

8 min read · Aug 31, 2022


Statistical significance is widely regarded as the benchmark for deciding when to stop an AB test and, ultimately, whether to roll out the new feature. However, waiting to collect enough data to reach statistical significance can seriously slow the rate at which you roll out new features.
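For concreteness, here is a minimal sketch of that standard decision rule (my illustration, not from the article): a two-proportion z-test on made-up conversion counts, with roll-out gated on p < 0.05.

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up conversion data: control vs variant.
conversions = [1205, 1292]   # successes in each arm
visitors = [24000, 24000]    # sample size of each arm

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.3f}")

# The decision rule the article critiques: ship only on significance.
if p_value < 0.05:
    print("Significant: roll out the feature")
else:
    print("Not significant: keep the experiment running (or reject)")
```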


New features with a lower impact on your KPIs are unlikely to reach statistical significance in a practical length of time because of the sheer volume of data required. As a result, you're likely to waste time developing new features only for them to be rejected after an experiment because their impact is small. Or perhaps you reject an idea at the ideation stage because you can't wait long enough to prove its value.
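A rough power calculation makes the data-volume problem explicit. The numbers below are my assumptions for illustration (a 5% baseline conversion rate, 80% power, a 5% significance level), not figures from the article:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05          # assumed baseline conversion rate
analysis = NormalIndPower()

for uplift in [0.10, 0.05, 0.02, 0.01]:   # relative uplifts to detect
    effect = proportion_effectsize(baseline * (1 + uplift), baseline)
    n = analysis.solve_power(effect_size=effect, alpha=0.05, power=0.8,
                             alternative="two-sided")
    print(f"{uplift:>4.0%} uplift needs ~{n:,.0f} visitors per arm")
```

Halving the uplift roughly quadruples the required sample size, so a 1% relative uplift on a 5% baseline needs on the order of a million visitors per arm.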

My argument is this: if a new feature is only likely to contribute a small uplift in performance, it is also very unlikely to cause a sizeable decrease in performance.

We can afford to make riskier decisions when the potential impact is smaller. And these small-impact features are likely to be the most numerous, so cumulatively they can contribute a large impact.
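A toy Monte Carlo sketch of this argument, under loudly stated assumptions (true relative uplifts of candidate features drawn from a narrow distribution, mean +0.5% with 1% spread, so large losses are rare), compares shipping on any positive observed uplift at a modest sample size against shipping only on statistical significance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
baseline, n = 0.05, 20_000                 # conversion rate, visitors per arm
true_uplifts = rng.normal(0.005, 0.01, size=1_000)  # assumed relative uplifts

shipped_lenient, shipped_strict = [], []
for uplift in true_uplifts:
    conv_a = rng.binomial(n, baseline)
    conv_b = rng.binomial(n, baseline * (1 + uplift))
    # Two-proportion z-test on the simulated experiment.
    pooled = (conv_a + conv_b) / (2 * n)
    se = np.sqrt(2 * pooled * (1 - pooled) / n)
    z = (conv_b - conv_a) / n / se
    p_value = 2 * stats.norm.sf(abs(z))
    if conv_b > conv_a:                    # lenient: any observed uplift
        shipped_lenient.append(uplift)
        if p_value < 0.05:                 # strict: significance required
            shipped_strict.append(uplift)

# Cumulative true uplift of everything each rule shipped (naively additive).
print(f"lenient: {len(shipped_lenient)} features, "
      f"{sum(shipped_lenient):+.2%} total uplift")
print(f"strict:  {len(shipped_strict)} features, "
      f"{sum(shipped_strict):+.2%} total uplift")
```

Running it shows the trade-off directly: how many features each rule ships, and the cumulative uplift of what was shipped. Your own distribution of effect sizes may of course look different.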


Published in TDS Archive
An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Written by Matt Crooks
Principal Data Scientist - BBC