# Beautiful Boxplots With Statistical Significance Annotation

## Super short tutorial for Boxplots With Significance Annotation in Python

# Introduction & Motivation

I remember reading scientific publications in which the authors presented some nice boxplots. In most of these cases, a statistical test had been used to determine whether there was a statistically significant difference in the mean value of a specific feature between different groups.

I have now written some custom Python code to do exactly this: produce beautiful boxplots with integrated statistical annotations. In this short article, I show how to create such boxplots in Python.

# The dataset

We will use the Iris dataset, as in all my previous posts. The dataset contains **four features** (length and width of sepals and petals) for **50** samples from each of **three species** of Iris (Iris **setosa**, Iris **virginica** and Iris **versicolor**). It is often used in data mining, classification and clustering examples and to test algorithms.

For reference, here are pictures of the **three flower species**:

For this short tutorial, we will use only **2** out of the 3 classes, i.e. the **setosa** and **versicolor** classes. *This is done only for the sake of simplicity.*

# Working example in Python

**Step 1**: Let’s **load the data** and sub-select the desired **2 flower classes**:

    from sklearn.datasets import load_iris
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np

    # Load the Iris dataset (once, then reuse the returned object)
    iris = load_iris()
    X = iris.data
    y = iris.target
    feature_names = iris.feature_names
    classes_names = iris.target_names

    # Use only 2 classes for this example: drop class 2 (virginica)
    mask = y != 2
    X, y = X[mask, :], y[mask]

    # Get the remaining class names
    classes_names[[0, 1]]
    # array(['setosa', 'versicolor'], dtype='<U10')

**Step 2**: We have now selected all the samples for the 2 flower classes, **setosa** and **versicolor**. We will put the data into a `pandas` dataframe to make our lives easier:

    df = pd.DataFrame(X, columns=feature_names)
    df['Group'] = y

    # Long format (one row per measurement); this is needed for the boxplots later on
    df_long = pd.melt(df, 'Group', var_name='Feature', value_name='Value')

    df.head()

**Step 3**: Let’s inspect the dataframe:

As we can see, we have 4 features, and the last column denotes the group membership of the corresponding sample.
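To see why the long format helps, here is a minimal sketch of what `pd.melt` does on a tiny made-up frame (the two rows and their values are illustrative, not taken from the Iris data): each feature column is stacked into `Feature`/`Value` pairs, with `Group` repeated for every row.

```python
import pandas as pd

# Tiny illustrative frame: 2 samples, 2 features, plus the group label.
# The numbers are made up for the demo, not real Iris measurements.
toy = pd.DataFrame({
    "sepal length (cm)": [5.0, 6.0],
    "petal length (cm)": [1.5, 4.5],
    "Group": [0, 1],
})

# Wide -> long: one row per (sample, feature) measurement.
toy_long = pd.melt(toy, "Group", var_name="Feature", value_name="Value")

print(toy_long)
```

With one row per measurement, a single `Feature` filter selects everything needed for one boxplot panel.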

## The statistical tests

**Step 4**: Now it’s time to do the statistical tests. We will use a **two-sample t-test** (since our groups are independent) to test whether the **mean value** of each of the 4 features (sepal length, sepal width, petal length, petal width) is **statistically different** between the 2 groups of flowers (**setosa** and **versicolor**).

    # Statistical tests for differences in the features across groups
    from scipy import stats

    all_t = list()
    all_p = list()
    for case in range(len(feature_names)):
        sub_df = df_long[df_long.Feature == feature_names[case]]
        g1 = sub_df[sub_df['Group'] == 0]['Value'].values
        g2 = sub_df[sub_df['Group'] == 1]['Value'].values
        t, p = stats.ttest_ind(g1, g2)  # independent two-sample t-test
        all_t.append(t)
        all_p.append(p)

To do the statistical test we just used:

`t, p = stats.ttest_ind(g1, g2)`

Here we compare the mean of g1 (group 1: setosa) to the mean of g2 (group 2: versicolor) and **we do that for all 4 features** (using the for loop).
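For intuition, the sketch below recomputes the t-value by hand and compares it with `stats.ttest_ind`. It assumes SciPy's default setting (`equal_var=True`, i.e. the classic Student's t-test with a pooled variance); the two small arrays are made-up numbers, and `pooled_t` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance (hypothetical helper)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled sample variance (ddof=1 gives the unbiased per-group variances)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))


# Made-up measurements for two independent groups
g1 = [5.1, 4.9, 5.0, 5.4, 4.6]
g2 = [6.4, 6.9, 5.5, 6.5, 5.7]

t_manual = pooled_t(g1, g2)
t_scipy, p_scipy = stats.ttest_ind(g1, g2)  # equal_var=True by default

print(t_manual, t_scipy)
```

If the two groups may have unequal variances, `stats.ttest_ind(g1, g2, equal_var=False)` runs Welch's t-test instead.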

But how can we know whether the mean of g1 (group 1: setosa) was greater or smaller than the mean of g2 (group 2: versicolor)?

For this we need to look at the **t-values**.

    print(all_t)
    # [-10.52098626754911, 9.454975848128596, -39.492719391538095, -34.08034154357719]

    print(feature_names)
    # ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

**Interpretation:**

- If the t-value is *positive* (>0), the mean of g1 (group 1: setosa) was *greater* than the mean of g2 (group 2: versicolor).
- If the t-value is *negative* (<0), the mean of g1 (group 1: setosa) was *smaller* than the mean of g2 (group 2: versicolor).

The sign only gives the direction of the difference; whether the difference is statistically significant is decided by the p-value, which we check in Step 5.

**Reminder**: *feature_names = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'].*

- We can **conclude** that **only the mean value of** *sepal width* of g1 (setosa) was greater than the mean value of *sepal width* of g2 (versicolor), since it has the only positive t-value.
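A quick way to convince yourself of this sign convention is to swap the two groups: the t-value flips sign while the p-value stays the same. A minimal sketch with made-up numbers:

```python
from scipy import stats

# Made-up values: group "a" clearly has the smaller mean
a = [1.0, 1.2, 0.9, 1.1, 1.3]
b = [2.0, 2.3, 1.9, 2.1, 2.2]

t_ab, p_ab = stats.ttest_ind(a, b)  # mean(a) - mean(b) < 0  -> negative t
t_ba, p_ba = stats.ttest_ind(b, a)  # mean(b) - mean(a) > 0  -> positive t

print(t_ab, t_ba)
```

So the order of the arguments in `ttest_ind` decides which group the sign refers to.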

**Step 5**: **Check the t-test results**

    # Count the features whose p-value is below the 0.05 significance level
    print(np.count_nonzero(np.array(all_p) < 0.05))
    # 4

**Interpretation:** We can see that there is a statistically significant difference in **all 4 features** between the **setosa** and **versicolor** classes.
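Instead of a fixed label, the annotation text could be derived from each p-value. The helper below is a hypothetical sketch: the function name `p_to_label` and the star thresholds (0.05, 0.01, 0.001) follow a common convention and are assumptions, not something the code in this article uses.

```python
def p_to_label(p, alpha=0.05):
    """Map a p-value to a short annotation label (hypothetical helper)."""
    if p >= alpha:
        return "ns"  # not significant at the chosen alpha
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    return "*"


# Example with illustrative p-values (not results from the Iris tests)
for p in (0.20, 0.03, 0.004, 0.0002):
    print(p, p_to_label(p))
```

Such a label could be passed to the plotting code in place of a hard-coded string.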

**Step 6**: Here is the magic. Let’s create some **beautiful** **boxplots** and **annotate** them with the **estimated** **statistical** **significance**.

    # Rename so that class 0 appears as setosa and class 1 as versicolor
    df_long['Group'] = df_long['Group'].map({0: classes_names[0], 1: classes_names[1]})

    # Boxplots
    fig, axes = plt.subplots(2, 2, figsize=(14, 10), dpi=100)
    axes = axes.flatten()

    for idx, feature in enumerate(feature_names):
        ax = sns.boxplot(x="Feature", hue="Group", y="Value",
                         data=df_long[df_long.Feature == feature],
                         linewidth=2, showmeans=True,
                         meanprops={"marker": "*", "markerfacecolor": "white",
                                    "markeredgecolor": "black"},
                         ax=axes[idx])

        # Tick params
        axes[idx].set_xticklabels([str(feature)], rotation=0)
        axes[idx].set(xlabel=None)
        axes[idx].set(ylabel=None)
        axes[idx].grid(alpha=0.5)
        axes[idx].legend(loc="lower right", prop={'size': 11})

        # Set edge color = black (depending on the matplotlib version,
        # the boxes are stored in ax.artists or in ax.patches)
        for box in [*ax.artists, *ax.patches]:
            box.set_edgecolor('black')
            box.set_alpha(0.8)

        # Statistical annotation: a bracket above the two boxes
        x1, x2 = -0.20, 0.20
        y, h, col = df_long[df_long.Feature == feature]["Value"].max() + 1, 2, 'k'
        axes[idx].plot([x1, x1, x2, x2], [y, y + h, y + h, y], lw=1.5, c=col)
        axes[idx].text((x1 + x2) * .5, y + h, "statistically significant",
                       ha='center', va='bottom', color=col)

    fig.suptitle("Significant feature differences between setosa and versicolor classes/groups", size=14, y=0.93)
    plt.show()
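The bracket drawing in Step 6 can be wrapped into a small function so it can be reused with any label and position. This is a sketch under assumptions: `add_bracket` and its parameters are hypothetical names, the two groups are made-up numbers drawn with plain matplotlib boxplots, and the non-interactive `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt


def add_bracket(ax, x1, x2, y, h, label, color="k"):
    """Draw a bracket from x1 to x2 at height y (tips of length h) with a centered label."""
    ax.plot([x1, x1, x2, x2], [y, y + h, y + h, y], lw=1.5, c=color)
    return ax.text((x1 + x2) * 0.5, y + h, label,
                   ha="center", va="bottom", color=color)


fig, ax = plt.subplots()
# Two made-up groups drawn as boxplots centered at x = -0.2 and x = 0.2
ax.boxplot([[1.0, 1.2, 1.4, 1.1], [2.0, 2.4, 2.2, 2.1]],
           positions=[-0.2, 0.2], widths=0.3)
text = add_bracket(ax, -0.2, 0.2, y=2.5, h=0.25, label="statistically significant")
fig.canvas.draw()  # render once to make sure everything can be drawn
```

The `x1`/`x2` values of ±0.20 mirror the ones used in Step 6, where they line up with the two hue boxes.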

# Conclusions

As we can see from the statistical tests, we can **conclude** that **only the mean value of sepal width of group 1 (setosa) was statistically greater than the mean value of sepal width of group 2 (versicolor).**

On the other hand, the mean values of **sepal length, petal length and petal width** of the *setosa* group were **statistically smaller** than the corresponding mean values of the **versicolor** group.

*These observations can also be verified by looking at the boxplots.*
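These directions can also be checked numerically by comparing the group means per feature. A minimal sketch on a tiny long-format frame (made-up numbers standing in for `df_long`; the column names match the ones used in this article):

```python
import pandas as pd

# Tiny stand-in for df_long (values are illustrative, not the Iris data)
toy_long = pd.DataFrame({
    "Group": ["setosa", "setosa", "versicolor", "versicolor"] * 2,
    "Feature": ["sepal width (cm)"] * 4 + ["petal length (cm)"] * 4,
    "Value": [3.4, 3.6, 2.7, 2.9, 1.4, 1.6, 4.2, 4.4],
})

# Mean of every feature per group, one column per group
means = toy_long.groupby(["Feature", "Group"])["Value"].mean().unstack("Group")

# Positive difference -> setosa mean is larger, negative -> smaller
means["setosa - versicolor"] = means["setosa"] - means["versicolor"]
print(means)
```

The sign of the last column should agree with the sign of the corresponding t-value.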

That’s all, folks! I hope you liked this article!

# Stay tuned & support this effort

If you liked this article and found it useful, **follow** me to see all my new posts.

Questions? Post them as a comment and I will reply as soon as possible.

# Get in touch with me

- **LinkedIn**: https://www.linkedin.com/in/serafeim-loukas/
- **ResearchGate**: https://www.researchgate.net/profile/Serafeim_Loukas
- **EPFL profile**: https://people.epfl.ch/serafeim.loukas
- **Stack Overflow**: https://stackoverflow.com/users/5025009/seralouk