Statistical Power and Power Analysis

Introduction to key statistical concepts such as effect size, statistical power, significance level, and sample size.

Akshat Anand
DataX Journal
6 min read · Apr 12, 2020


The statistical power of a hypothesis test is the probability that the test detects an effect, given that there is a true effect present to detect.

Power reflects the confidence one can have in the conclusions drawn from the results of a study. It can also be used as a tool to estimate the sample size, i.e. the number of observations, required to measure an effect in an experiment.

In this blog we will go through:

1. Power analysis, which can be used to estimate the minimum sample size required for an experiment, given a desired significance level, effect size, and statistical power.

2. Statistical power, the probability that a hypothesis test will find an effect if there is an effect to be found.

3. Lastly, how we can calculate and plot a power analysis in Python in order to design an experiment.

Hypothesis Testing (Statistical Hypothesis Testing)

A hypothesis test makes an assumption about an outcome, which is termed the null hypothesis. The result is often interpreted using a p-value, the probability of observing a result at least as extreme as the one in the data, assuming the null hypothesis is true.

p-value (p): The probability of obtaining a result equal to or more extreme than the one observed in the data, assuming the null hypothesis is true.

When interpreting the p-value of a significance test, we must specify a significance level (referred to as alpha (α)). A common value for the significance level is 5%, written as 0.05.

In effect, the p-value is interpreted in the context of the chosen significance level. The result of a significance test is "statistically significant" if the p-value is less than the significance level, in which case the null hypothesis is rejected.

  • p ≤ alpha: reject H0 (different distribution)
  • p > alpha: fail to reject H0 (same distribution)

And,

  • Significance level (alpha): The boundary for declaring a statistically significant finding when interpreting the p-value.
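As a quick illustration of this decision rule, here is a minimal sketch (our own, not from the original post) that runs a two-sample Student's t-test on synthetic Gaussian data with SciPy and compares the p-value to alpha:

```python
# Sketch: interpreting a p-value against a chosen significance level,
# using a two-sample t-test from SciPy on synthetic data.
from numpy.random import default_rng
from scipy.stats import ttest_ind

rng = default_rng(42)
alpha = 0.05  # significance level

# two synthetic samples with (slightly) different means
sample1 = rng.normal(loc=50, scale=5, size=100)
sample2 = rng.normal(loc=52, scale=5, size=100)

stat, p = ttest_ind(sample1, sample2)
if p <= alpha:
    print('p=%.4f <= alpha: reject H0 (different distributions)' % p)
else:
    print('p=%.4f > alpha: fail to reject H0 (same distribution)' % p)
```

The means, standard deviation, and seed are arbitrary choices for the illustration; with other draws the test may land on either side of alpha.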

Since the p-value is a probability, the test can still be wrong: there can be errors in our interpretations.

So, there are two types of error:

  • Type I error: Rejecting the null hypothesis when there is in fact no significant effect (a false positive). Here the p-value is small.
  • Type II error: Failing to reject the null hypothesis when there is a significant effect (a false negative). Here the p-value is large.

In this context, we can think of the significance level as the probability of rejecting the null hypothesis when it is in fact true, i.e. the probability of making a false positive.
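This interpretation can be checked by simulation (a sketch of our own, assuming SciPy is available): if both samples are drawn from the same distribution, so that the null hypothesis is true, the fraction of tests that reject at alpha = 0.05 should come out close to 5%.

```python
# Sketch (Monte Carlo check): when H0 is true, the fraction of tests
# with p <= alpha approximates alpha -- the Type I (false positive) rate.
from numpy.random import default_rng
from scipy.stats import ttest_ind

rng = default_rng(1)
alpha = 0.05
trials = 2000
false_positives = 0
for _ in range(trials):
    # both samples come from the same distribution, so H0 is true
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = ttest_ind(a, b)
    if p <= alpha:
        false_positives += 1
print('False positive rate: %.3f' % (false_positives / trials))
```

The sample size, number of trials, and seed are arbitrary; the printed rate will fluctuate around 0.05 from run to run.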

Statistical Power

Statistical power, or the power of a hypothesis test, is the probability that the test correctly rejects the null hypothesis, i.e. the probability of a true positive result. It only has meaning when the null hypothesis is in fact false and should be rejected.

The higher the statistical power for an experiment, the lower the probability of making a Type II (false negative) error. That is the higher the probability of detecting an effect when there is an effect.

Power = 1 - Pr(Type II Error)

Pr(True Positive) = 1 - Pr(False Negative)
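Power can be estimated by the same kind of simulation (again a sketch of our own, not from the post): draw both samples with a true difference between their means and count how often the test correctly rejects. For example, with a true standardized difference of 0.8 and 26 observations per group, the rejection rate should land near the conventional 80% power level.

```python
# Sketch: estimating power by simulation. With a true effect present,
# power is the fraction of repeated experiments in which the test
# correctly rejects H0.
from numpy.random import default_rng
from scipy.stats import ttest_ind

rng = default_rng(7)
alpha, effect, n, trials = 0.05, 0.8, 26, 2000
rejections = 0
for _ in range(trials):
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=effect, scale=1.0, size=n)  # true shift of 0.8 SDs
    _, p = ttest_ind(a, b)
    if p <= alpha:
        rejections += 1
print('Estimated power: %.3f' % (rejections / trials))
```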

While interpreting statistical power, we seek setups that have high statistical power.

  • Low Statistical Power: Large risk of committing Type II errors, e.g. a false negative.
  • High Statistical Power: Small risk of committing Type II errors.

Experimental results obtained with low statistical power can lead to invalid conclusions about the meaning of the results.

Power Analysis

Generally, statistical power is one of four related quantities.

  • Effect Size. The quantified magnitude of a result present in the population. Effect size is calculated using a specific statistical measure, such as Pearson’s correlation coefficient for the relationship between variables.
  • Sample Size. The number of observations in the sample.
  • Significance. The significance level used in the statistical test, e.g. alpha. Often set to 5% or 0.05.
  • Statistical Power. The probability of rejecting the null hypothesis when the alternative hypothesis is true.

A power analysis involves estimating one of these four parameters given values for three other parameters. This is a powerful tool in both the design and in the analysis of experiments that we wish to interpret using statistical hypothesis tests.

As a beginner, we can start with sensible defaults for some parameters, such as a significance level of 0.05 and a power level of 0.80. We can then estimate a desirable minimum effect size, specific to the experiment being performed. A power analysis can then be used to estimate the minimum sample size required.

In addition, multiple power analyses can be performed to provide a curve of one parameter against another, such as the change in the size of an effect in an experiment given changes to the sample size. More elaborate plots can be created varying three of the parameters. This is a useful tool for experimental design.

Application with Python

In this section, we will look at the Student’s t-test, which is a statistical hypothesis test for comparing the means from two samples of Gaussian variables.

The test will calculate a p-value that can be interpreted as to whether the samples are the same (fail to reject the null hypothesis), or there is a statistically significant difference between the samples (reject the null hypothesis). The common significance level for interpreting the p-value is 5% or 0.05.

Significance level (alpha): 5% or 0.05.

The size of the effect of comparing two groups can be quantified with an effect size measure. A common measure for comparing the difference in means between two groups is Cohen's d.

Effect Size: Cohen’s d of at least 0.80.
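For reference, Cohen's d for two independent samples is the difference in means divided by the pooled standard deviation. A minimal sketch (the function name cohens_d is our own, not from the post):

```python
# Sketch: Cohen's d for two independent samples, using the pooled
# sample standard deviation.
from math import sqrt
from numpy import mean, var

def cohens_d(x, y):
    nx, ny = len(x), len(y)
    # pooled standard deviation (ddof=1 gives the sample variance)
    pooled_sd = sqrt(((nx - 1) * var(x, ddof=1) + (ny - 1) * var(y, ddof=1))
                     / (nx + ny - 2))
    return (mean(x) - mean(y)) / pooled_sd

# example: group means differ by exactly one pooled standard deviation
print(cohens_d([2, 3, 4], [1, 2, 3]))  # prints 1.0
```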

We can use the default and assume a minimum statistical power of 80% or 0.8.

Statistical Power: 80% or 0.80.

For a given experiment with these defaults, we may be interested in estimating a suitable sample size. That is, how many observations are required from each sample in order to detect an effect of at least 0.80, with an 80% chance of detecting the effect if it exists (a 20% chance of a Type II error) and a 5% chance of detecting an effect when there is none (a Type I error)?

We can solve this using a power analysis.

The statsmodels library provides the TTestIndPower class for calculating a power analysis for the Student’s t-test with independent samples. Of note is the TTestPower class that can perform the same analysis for the paired Student’s t-test.

The TTestIndPower instance must be created, then we can call the solve_power() with our arguments to estimate the sample size for the experiment.

# perform power analysis
analysis = TTestIndPower()
result = analysis.solve_power(effect, power=power, nobs1=None, ratio=1.0, alpha=alpha)

The complete example is listed below.

# estimate sample size via power analysis
from statsmodels.stats.power import TTestIndPower
# parameters for power analysis
effect = 0.8
alpha = 0.05
power = 0.8
# perform power analysis
analysis = TTestIndPower()
result = analysis.solve_power(effect, power=power, nobs1=None, ratio=1.0, alpha=alpha)
print('Sample Size: %.3f' % result)

Running the example calculates and prints the estimated number of samples for the experiment as approximately 25.5. Rounded up, at least 26 observations per group would be required to see an effect of the desired size.
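As a sanity check (our own addition, not in the original post), the same solve_power() call can be turned around: fix the sample size at 26 per group, pass power=None, and solve for the achieved power instead, which should come out near 0.80.

```python
# Sketch: solving for power instead of sample size, as a cross-check
# on the sample-size estimate above.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
achieved = analysis.solve_power(effect_size=0.8, nobs1=26, ratio=1.0,
                                alpha=0.05, power=None)
print('Power with 26 per group: %.3f' % achieved)
```

Exactly one argument to solve_power() is left as None; statsmodels solves for that parameter given the other three.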

Power curves are line plots that show how the change in variables, such as effect size and sample size, impact the power of the statistical test.

The plot_power() function can be used to create power curves. The dependent variable (x-axis) must be specified by name in the 'dep_var' argument. Arrays of values can then be specified for the sample size (nobs), effect size (effect_size), and significance (alpha) parameters.

# calculate power curves from multiple power analyses
analysis = TTestIndPower()
analysis.plot_power(dep_var='nobs', nobs=arange(5, 100), effect_size=array([0.2, 0.5, 0.8]))

The complete example is listed below.

# calculate power curves for varying sample and effect size
from numpy import array
from matplotlib import pyplot
from statsmodels.stats.power import TTestIndPower
# parameters for power analysis
effect_sizes = array([0.2, 0.5, 0.8])
sample_sizes = array(range(5, 100))
# calculate power curves from multiple power analyses
analysis = TTestIndPower()
analysis.plot_power(dep_var='nobs', nobs=sample_sizes, effect_size=effect_sizes)
pyplot.show()

Running the example creates the plot showing the impact on statistical power (y-axis) for three different effect sizes (es) as the sample size (x-axis) is increased.

We can see that if we are interested in a large effect, a point of diminishing returns in terms of statistical power occurs at around 40-to-50 observations.
