Permutation testing explained with an example

Moving beyond hypothesis testing

Mehul Gupta
Data Science in your pocket



Statistical testing is a fundamental tool for drawing meaningful conclusions from data. For example, you must have read about hypothesis testing, which includes

  • t-tests: compare the means of two groups using sample statistics, typically when the population variance is unknown or samples are small
  • z-tests: similar to t-tests, but used when the population variance is known or samples are large
  • ANOVA: compares the means of more than two groups


But a limitation of the above-mentioned tests is that they make assumptions about the data: the data should be normally distributed, variances should be equal across groups, etc.

What is the assumption of equal variances? It states that the spread of the dependent variable is the same across the different groups or levels of the independent variable(s) being compared.

Permutation testing can be a better tool for statistical testing if you are not sure about the distribution of the data, or if the equal-variance assumption is violated. It lets you compare multiple groups with a null hypothesis based on any statistic (mean, median, variance, etc., not just the mean). It finds a lot of use cases:

  • Conducting hypothesis testing
  • A/B testing with multiple categories, i.e. A/B/C/D testing
  • Finding collinearity between variables in a dataset

And that too, without any assumptions about the data distribution.

How does permutation testing work?

Assume we wish to test whether a particular drink, say Bournvita, affects the body weight of people from different age groups, namely:

A (age < 15): 100 samples

B (age > 15 but < 30): 50 samples

C (age > 30 but < 45): 60 samples

D (age > 45): 80 samples

The statistic we will calculate is the difference in means, i.e. the average of the pairwise differences of group means: Avg((meanA-meanB) + (meanA-meanC) + ... + (meanC-meanD)). Though we could calculate other statistics as well, such as the ratio of means.
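As a quick illustration, the average pairwise difference of means can be computed like this (toy weight data, purely illustrative; note that signed differences can partially cancel, which is why some practitioners use absolute differences instead):

```python
import itertools
import numpy as np

def avg_pairwise_mean_diff(groups):
    # Average of (mean_i - mean_j) over all group pairs with i < j
    means = [np.mean(g) for g in groups]
    diffs = [m1 - m2 for m1, m2 in itertools.combinations(means, 2)]
    return np.mean(diffs)

# Toy weights for four hypothetical groups (not real data)
groups = [[50.0, 52.0], [60.0, 62.0], [55.0, 57.0], [70.0, 72.0]]
stat = avg_pairwise_mean_diff(groups)  # averages the 6 pairwise differences
```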

Let’s list the steps then:

1. State the null hypothesis. In this case: Bournvita causes no weight change in any of the groups.

2. Set a significance level, as we do in hypothesis testing. Let it be 0.05.

3. Calculate the mean weight for each of the groups and then the observed difference in means as defined above.

4. Merge all the samples to form a single pooled dataset from all groups. This leads to a dataset with 100+50+60+80 = 290 samples.

5. Now, for X iterations:

  • Recreate new groups A1, B1, C1, and D1 by sampling from this pooled dataset so that each group keeps its original size, i.e. A1=100 samples, B1=50 samples, C1=60 samples, and D1=80 samples. By default, this resampling is done without replacement.

  • Calculate the difference in means for these new groups and record it.

6. As you repeated the above two sub-steps for X iterations, you now have ‘X’ samples of the difference in means. The next step is to calculate the p-value, as we do in hypothesis testing but in a different way:

  • Count the permuted ‘difference in means’ values that are equal to or greater than the observed difference in means (step 3). Let this count be ‘y’.

Assume X=100, and the permuted difference in means exceeded the observed difference 10 times. Hence y=10.

  • Calculate the p-value = y/n, where n=X (the number of iterations we ran). In this case, p-value = 10/100 = 0.1.
  • If the p-value < significance level, the null hypothesis can be rejected. As 0.1 > 0.05, the null hypothesis won’t be rejected in this case.
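The steps above can be sketched end-to-end in Python. This is a minimal sketch with toy, synthetic data (all function names are my own, not from any library):

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pairwise_mean_diff(groups):
    # Test statistic: average of (mean_i - mean_j) over all pairs i < j
    means = [np.mean(g) for g in groups]
    return np.mean([m1 - m2 for i, m1 in enumerate(means)
                    for m2 in means[i + 1:]])

def permutation_test_groups(groups, statistic, n_iter=1000):
    sizes = [len(g) for g in groups]
    observed = statistic(groups)              # step 3: observed statistic
    pooled = np.concatenate(groups)           # step 4: merge all samples
    count = 0
    for _ in range(n_iter):                   # step 5: X iterations
        shuffled = rng.permutation(pooled)    # reshuffle without replacement
        new_groups, start = [], 0
        for s in sizes:                       # keep the original group sizes
            new_groups.append(shuffled[start:start + s])
            start += s
        if statistic(new_groups) >= observed: # step 6: count extreme values
            count += 1
    return count / n_iter                     # p-value = y / X

# Toy data: four groups drawn from the same distribution (null is true)
groups = [rng.normal(60, 5, n) for n in (100, 50, 60, 80)]
p_value = permutation_test_groups(groups, avg_pairwise_mean_diff, n_iter=500)
```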

Done and dusted !!

Depending on how we generate the resampled datasets during our X iterations, we get two types of permutation tests:

1. Exhaustive permutation tests: Instead of resampling for some ‘X’ iterations without replacement, we consider every possible way of splitting the merged dataset into groups of the original sizes. In our case, this leads to

C(290, 100) * C(190, 50) * C(140, 60) * C(80, 80) datasets (see Permutation & Combination to calculate the total possible combinations).

As you might have guessed, this is computationally heavy and hence not feasible for bigger datasets.
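For a sense of scale, that count of distinct repartitions can be computed directly with Python’s standard-library math.comb:

```python
from math import comb

# Number of distinct ways to repartition the 290 pooled samples
# into groups of sizes 100, 50, 60 and 80
n_partitions = comb(290, 100) * comb(190, 50) * comb(140, 60) * comb(80, 80)
# Astronomically large, which is why the exhaustive test is infeasible here
```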

2. Bootstrapped permutation tests: By default we resample without replacement, but in the bootstrapped version, we resample with replacement.
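The two resampling schemes differ only in whether a drawn sample can be drawn again. A minimal contrast with NumPy, using a small stand-in array for the pooled data:

```python
import numpy as np

rng = np.random.default_rng(1)
pooled = np.arange(10)  # stand-in for the 290 merged samples

# Default permutation test: reshuffle, i.e. draw without replacement
shuffled = rng.permutation(pooled)

# Bootstrapped variant: draw with replacement (duplicates allowed)
bootstrapped = rng.choice(pooled, size=pooled.size, replace=True)
```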

Finally, there are implementations of permutation testing in Python. I found two libraries:

permutation_test

mlxtend

Though, to the best of my knowledge, both of these libraries can perform permutation tests on just 2 groups and can’t be extended to multiple groups. For now, let’s see the implementation using permutation_test. You can try mlxtend from here.

The permutation_test implementation is easy

#pip install permutation_test

import permutation_test as p
data = [4,1,9,9.2,0,1,2,5]
ref_data = [3,4,4,5,5,5,6,6,7]
p_value = p.permutationtest(data, ref_data)

Where data=Group 1 and ref_data=Group 2.

Let’s understand the output

The p-value comes out to be 0.21, and hence the null hypothesis that the two groups aren’t different can’t be rejected.

Testing with another set

import permutation_test as p
data = [1,2,2,3,3,3,4,4,5]
ref_data = [3,4,4,5,5,5,6,6,7]
p_value = p.permutationtest(data, ref_data)

And here the null hypothesis can be rejected, as the p-value comes out lower than the significance level.
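SciPy also ships a two-sample permutation test, scipy.stats.permutation_test (available since SciPy 1.8). A sketch using the same two groups as above, with the difference in means as the statistic:

```python
import numpy as np
from scipy.stats import permutation_test

def mean_diff(x, y):
    # Test statistic: difference in group means
    return np.mean(x) - np.mean(y)

data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
ref_data = [3, 4, 4, 5, 5, 5, 6, 6, 7]

res = permutation_test((data, ref_data), mean_diff,
                       n_resamples=9999, alternative='two-sided',
                       random_state=0)
p_value = res.pvalue  # small here, so the null hypothesis is rejected
```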

Just one last thing: when should we prefer permutation testing over classical hypothesis testing?

  • When the assumptions (normal distribution, equal variances) are not valid
  • When the dataset is small, as permutation testing is computationally heavy
  • When dealing with datasets having complex forms, like ranks or categories

With this, it’s a wrap!
