What really is statistical significance?
The term ‘statistical significance’ often comes up in analyzing results from experiments or observational studies.
A common interpretation I have heard in an experimentation context is:
Statistical significance tells us whether the difference in results between two populations is ‘large’ enough, or, simply put, whether an experiment had a significant impact.
This interpretation is flawed, and the culprit is probably the word ‘significant’.
Statistical significance has its roots in inferential statistics, the branch of statistics that deals with inferring properties of a population from samples.
Let’s understand with an example:
You have created a new online marketing campaign and want to understand whether it is better than the current one. Typically, you would expose a sample of visitors to the new campaign and compare the mean values of the target metric (e.g. conversion rate) between the two samples, old and new.
You run this experiment for a week and evaluate the results.
Let’s consider two scenarios.
Scenario 1:
Your baseline conversion rate with the old campaign is 8%. You expose 10% of visitors to the new campaign and observe a mean conversion rate of 9.5%.
This result turns out not to be statistically significant at a 95% confidence level (the default in most online calculators). In other words, a 1.5 percentage point increase is not statistically significant here. (The sketch after Scenario 2 walks through the calculation.)
Scenario 2:
Your baseline conversion rate with the old campaign is 8%. This time you expose 50% of visitors to the new campaign and observe a mean conversion rate of 9%.
This result is statistically significant, even though the new campaign’s conversion rate of 9% is lower than the 9.5% in the first scenario. All that changed is the share of visitors exposed to the new campaign.
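Neither scenario states the actual visitor counts, so here is a minimal sketch of the underlying calculation, a two-proportion z-test, assuming a hypothetical 13,000 weekly visitors. The traffic level, the conversion counts derived from it, and the `two_proportion_z_test` helper are all illustrative assumptions, not figures from the scenarios above.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test; returns (z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail probability
    return z, p_value

# Scenario 1: 10% of an assumed 13,000 weekly visitors see the new campaign.
# Control: 8% of 11,700 ≈ 936 conversions; treatment: 9.5% of 1,300 ≈ 124 conversions.
z1, p1 = two_proportion_z_test(936, 11_700, 124, 1_300)
print(f"Scenario 1: z = {z1:.2f}, p = {p1:.3f}")  # z ≈ 1.92, p ≈ 0.054 -> not significant at 95%

# Scenario 2: a 50/50 split of the same assumed traffic.
# Control: 8% of 6,500 = 520 conversions; treatment: 9% of 6,500 = 585 conversions.
z2, p2 = two_proportion_z_test(520, 6_500, 585, 6_500)
print(f"Scenario 2: z = {z2:.2f}, p = {p2:.3f}")  # z ≈ 2.04, p ≈ 0.041 -> significant at 95%
```

At this assumed traffic level, the larger 1.5 point lift narrowly misses the 95% bar while the smaller 1 point lift clears it; with different traffic the outcomes could flip. That is exactly the point: significance depends on sample size, not on the size of the lift alone.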
So, what does this tell us about statistical significance?
- It is not about the magnitude of the difference in the target metric
- It is about confidence in the result, i.e. how sure we can be that the observed difference is not purely due to chance. Smaller samples are typically more variable, so better (or worse) results can show up purely by chance; the simulation sketch below illustrates this.
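To see why smaller samples are noisier, here is a small simulation sketch; the ‘true’ conversion rate, sample sizes, and number of simulated experiments are all assumed for illustration. It repeatedly samples visitors at a fixed 8% conversion rate and looks at how much the observed rate fluctuates at each sample size.

```python
import numpy as np

rng = np.random.default_rng(42)

true_rate = 0.08    # assumed true conversion rate (no real difference exists)
n_trials = 10_000   # number of simulated experiments per sample size

for n_visitors in (1_300, 6_500):
    # Observed conversion rate in each simulated sample
    rates = rng.binomial(n_visitors, true_rate, size=n_trials) / n_visitors
    low, high = rates.mean() - 2 * rates.std(), rates.mean() + 2 * rates.std()
    print(f"n = {n_visitors:>5}: observed rates mostly fall between {low:.3f} and {high:.3f}")
```

Even with no real change at all, samples of 1,300 visitors routinely produce rates anywhere from roughly 6.5% to 9.5%, while samples of 6,500 stay in a much tighter band of roughly 7.3% to 8.7%. This chance variation is what a significance test guards against.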
There is a lot more theory behind the calculation of statistical significance (p-values, confidence intervals, etc.), and these are topics in themselves. The purpose of this article is to demonstrate what ‘significance’ means from a practical standpoint.
Statistical significance is different from practical significance. For example, a difference of 1.5 percentage points in conversion rate in Scenario 1 might still be practically significant; we are simply less confident about it than about the 1 percentage point difference in Scenario 2.
On the other hand, a tiny difference on a very large sample might be statistically significant yet mean very little for the business from a practical standpoint, as the sketch below shows.
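To make that concrete, here is the same z-test applied to hypothetical numbers: a lift of just 0.1 percentage points, but with one million visitors in each arm.

```python
import math

# Hypothetical: 8.0% vs 8.1% conversion, one million visitors per arm.
conv_a, n_a = 80_000, 1_000_000   # control: 8.0%
conv_b, n_b = 81_000, 1_000_000   # treatment: 8.1%

pooled = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (conv_b / n_b - conv_a / n_a) / se
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
print(f"z = {z:.2f}, p = {p_value:.3f}")     # z ≈ 2.60, p ≈ 0.009 -> highly significant
```

The 0.1 point lift is comfortably ‘significant’ here, yet whether it justifies the cost of switching campaigns is a business question that the p-value cannot answer.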
Professionals should be guided by both statistical and practical significance in their decisions.
What are your thoughts on statistical significance?