Using Simulation to Estimate the Power of an A/B experiment

Naveenan
Published in Analytics Vidhya
Mar 6, 2019

The power of an experiment is the probability that it rejects the null hypothesis when a specific alternative hypothesis is true. For example, an e-commerce company is trying to increase the time users spend on its website by changing the design. They plan to use the well-known two-sample t-test. Power answers the question: if the redesign really changes the mean time spent, how likely is the t-test to detect that difference by rejecting the null hypothesis?

Null Hypothesis (H0): The new design has no effect on the time users spend on the website

Alternative Hypothesis (Ha): The new design changes the time users spend on the website

When an A/B experiment is run to measure the impact of the website redesign, we want to ensure that the experiment has at least 80% power. The following parameters impact the power of the experiment:

  1. Sample size (n): The larger the sample size, the smaller the standard error and the narrower the sampling distribution. Increasing the sample size increases the power of the experiment
  2. Effect size (𝛿): The difference between the means of the sampling distributions under the null and alternative hypotheses. The smaller the effect size, the more samples are needed to detect it at a predefined power
  3. Alpha (𝛼): The significance level, typically set at 0.05. It is the probability of rejecting the null hypothesis when it is true, and the cutoff the p-value is compared against. Making alpha smaller requires more samples to detect an effect at a predefined power
  4. Beta (β): The probability of failing to reject the null hypothesis when the alternative is true. Power is defined as 1-β
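To make these relationships concrete, here is a minimal sketch that uses the normal approximation to a two-sided, two-sample test (a stand-in for the t-test, not the t-test itself). The function name `approx_power`, the equal group sizes, and the values plugged in (𝛿 = 0.1, standard deviation 1, n users per group) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n, delta, sigma=1.0, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test.

    n is the number of users per group (assumed equal in both groups).
    """
    z = NormalDist()
    se = sigma * sqrt(2.0 / n)            # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)     # two-sided critical value
    shift = delta / se                    # how far the alternative sits from the null
    # Probability of landing in either rejection region under the alternative
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

base = approx_power(n=1000, delta=0.1)
more_users = approx_power(n=2000, delta=0.1)                  # larger n
bigger_effect = approx_power(n=1000, delta=0.2)               # larger effect size
stricter_alpha = approx_power(n=1000, delta=0.1, alpha=0.01)  # smaller alpha

print(f"base: {base:.3f}, more users: {more_users:.3f}, "
      f"bigger effect: {bigger_effect:.3f}, stricter alpha: {stricter_alpha:.3f}")
```

Doubling the sample size or the effect size raises the power, while tightening alpha from 0.05 to 0.01 lowers it.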

Why is power analysis done to determine the sample size before running an experiment?

  1. Running experiments is expensive and time-consuming, so we want to collect only as many samples as we need
  2. An adequately powered experiment has a higher chance of finding a significant effect when one exists
  3. It increases the chance of replicating an effect detected in an experiment

For example, suppose the time users currently spend on the website is normally distributed with a mean of 2 minutes and a standard deviation of 1 minute. The product manager wants to design an experiment to find out whether the redesigned website increases the time spent on the website.

The experiment should be able to detect a change of at least 5% in time spent on the website, that is, 𝛿 = 0.05 × 2 = 0.1 minutes. For a test like this, an exact solution for the sample size is available because the sampling distribution is known. Here we estimate the sample size by simulation and validate the result against the exact method.

The following steps estimate the power of a two-sample t-test:

  1. Simulate data for the model under the null hypothesis 𝒩(2,1) and the alternative hypothesis 𝒩(2+𝛿,1)
  2. Perform a t-test on the two samples and record whether it rejects the null hypothesis
  3. Repeat the simulation many times and count how often the t-test rejects the null hypothesis. The proportion of rejections is an estimate of the power of the experiment

Code to compute power of experiment for a specified sample size, effect size and significance level:
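A minimal sketch of such a computation in Python, assuming NumPy and SciPy are available. The function name `simulate_power`, the number of simulations, the seed, and the reading of the sample size as users per group are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def simulate_power(n, delta, mean=2.0, sd=1.0, alpha=0.05, n_sims=2000, seed=0):
    """Estimate the power of a two-sided, two-sample t-test by simulation.

    n is the number of users per group (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    # Step 1: one row per simulated experiment, drawn under the null and the alternative
    control = rng.normal(mean, sd, size=(n_sims, n))
    treatment = rng.normal(mean + delta, sd, size=(n_sims, n))
    # Step 2: run one t-test per row
    _, p_values = stats.ttest_ind(control, treatment, axis=1)
    # Step 3: the share of rejections estimates the power
    return float(np.mean(p_values < alpha))

power = simulate_power(n=1000, delta=0.1)
print(f"Estimated power with 1000 users per group: {power:.3f}")
```

The estimate varies with the seed and the number of simulations, so it should land near 0.6 rather than reproduce any single figure exactly.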

With a sample size of 1000 users per group, the simulated power of the experiment is 58.8%, well short of the 80% target

Code to compute sample size required to reach 80% power for specified effect size and significance level:
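One possible approach, sketched here under stated assumptions, is to try increasing sample sizes and stop at the first one whose simulated power reaches the target. The helper names, the candidate grid in steps of 100 users per group, the seed, and the number of simulations are all illustrative choices.

```python
import numpy as np
from scipy import stats

def simulate_power(n, delta, rng, mean=2.0, sd=1.0, alpha=0.05, n_sims=2000):
    """Share of simulated two-sample t-tests (n users per group) that reject the null."""
    control = rng.normal(mean, sd, size=(n_sims, n))
    treatment = rng.normal(mean + delta, sd, size=(n_sims, n))
    _, p_values = stats.ttest_ind(control, treatment, axis=1)
    return float(np.mean(p_values < alpha))

def sample_size_for_power(delta, target_power=0.80,
                          candidates=range(1000, 2501, 100), seed=0):
    """Return the first candidate n whose simulated power meets the target, else None."""
    rng = np.random.default_rng(seed)
    for n in candidates:
        if simulate_power(n, delta, rng) >= target_power:
            return n
    return None

n_required = sample_size_for_power(delta=0.1)
print(f"Smallest candidate sample size per group reaching 80% power: {n_required}")
```

Because each power estimate carries simulation noise, the answer can shift by a grid step between runs. A finer grid or more simulations per candidate tightens the estimate at the cost of run time.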

Based on the simulation, we need about 1560 users per group to reach 80% power, which closely matches the sample size estimated using the exact method

Code to compute sample size using exact method:
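A sketch of one way to do this with SciPy, assuming equal group sizes and a two-sided test: under the alternative, the t statistic follows a noncentral t distribution, so the power can be computed directly and the smallest adequate sample size found by search. The function names `exact_power` and `exact_sample_size` are hypothetical.

```python
from math import sqrt
from scipy import stats

def exact_power(n, delta, sd=1.0, alpha=0.05):
    """Power of a two-sided, two-sample t-test with n users per group."""
    df = 2 * n - 2                          # degrees of freedom for the pooled t-test
    ncp = (delta / sd) * sqrt(n / 2.0)      # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability that the noncentral t statistic falls in either rejection region
    return float(stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp))

def exact_sample_size(delta, sd=1.0, alpha=0.05, target_power=0.80):
    """Smallest n per group whose power reaches the target (power increases with n)."""
    lo, hi = 2, 4
    while exact_power(hi, delta, sd, alpha) < target_power:
        lo, hi = hi, hi * 2                 # grow the bracket until the target is met
    while hi - lo > 1:                      # bisect: lo is below target, hi meets it
        mid = (lo + hi) // 2
        if exact_power(mid, delta, sd, alpha) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi

n_exact = exact_sample_size(delta=0.1)
print(f"Sample size per group for 80% power: {n_exact}")
```

For 𝛿 = 0.1, a standard deviation of 1, and 𝛼 = 0.05, this should come out at roughly 1,570 users per group.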

Conclusion

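This article explained how simulation can be used to estimate the power of an A/B experiment and validated the result against the exact method. The same simulation approach carries over to experiments where a closed-form solution doesn't exist.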
