Beautiful Boxplots With Statistical Significance Annotation

Super short tutorial for Boxplots With Significance Annotation in Python

Serafeim Loukas
Jun 21, 2020 · 6 min read
Image for post
Figure produced by the author.

Introduction & Motivation

The dataset

Image for post
Figure made by the author.

Working example in Python

from sklearn.datasets import load_iris
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Load the Iris dataset (once) and unpack the pieces we need
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
classes_names = iris.target_names
# Use only 2 classes for this example
mask = y!=2
X,y = X[mask,:], y[mask]
# Get the remaining class names
classes_names[[0,1]]
# array(['setosa', 'versicolor'], dtype='<U10')
df = pd.DataFrame(X,columns=feature_names)
df['Group'] = y
df_long = pd.melt(df, 'Group', var_name='Feature', value_name='Value') # this is needed for the boxplots later on
df.head()
Image for post
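
To make the reshaping step concrete, here is a minimal sketch of what pd.melt does on a toy two-row frame. The frame and the names wide and long_df are illustrative assumptions, not part of the original code; only the pd.melt call mirrors the one used above.

```python
import pandas as pd

# Toy wide-format frame: one row per flower, one column per feature
# (hypothetical example data, used only to illustrate the reshape)
wide = pd.DataFrame({
    "sepal length (cm)": [5.1, 7.0],
    "sepal width (cm)": [3.5, 3.2],
    "Group": [0, 1],
})

# Long format: one row per (flower, feature) pair, as seaborn expects
long_df = pd.melt(wide, "Group", var_name="Feature", value_name="Value")
print(long_df)
```

Each of the two feature columns becomes two rows in the long frame, so the result has four rows with the columns Group, Feature and Value. This is the layout that lets a single boxplot call split the values by feature and by group.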

The statistical tests

#* Statistical tests for differences in the features across groups
from scipy import stats
all_t = list()
all_p = list()
for case in range(len(feature_names)):
    # select the rows of the current feature, then split them by group
    sub_df = df_long[df_long.Feature == feature_names[case]]
    g1 = sub_df[sub_df['Group'] == 0]['Value'].values
    g2 = sub_df[sub_df['Group'] == 1]['Value'].values
    # independent two-sample t-test between the two groups
    t, p = stats.ttest_ind(g1, g2)
    all_t.append(t)
    all_p.append(p)
t, p = stats.ttest_ind(g1, g2)

But how can we know whether the mean of g1 (group 1: setosa) was significantly greater or smaller than the mean of g2 (group 2: versicolor)?

print(all_t)
# [-10.52098626754911, 9.454975848128596, -39.492719391538095, -34.08034154357719]
print(feature_names)
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
# count how many features differ significantly at the 0.05 level
print(np.count_nonzero(np.array(all_p) < 0.05))
# 4
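
The signs of the t-statistics printed above answer that question: stats.ttest_ind(g1, g2) builds the statistic from mean(g1) minus mean(g2), so a negative value means the first group (setosa) has the smaller mean and a positive value means it has the larger one. The sketch below spells this out for the values printed above. The direction helper is hypothetical and not part of the original code.

```python
# t-statistics and feature names exactly as printed in the article
all_t = [-10.52098626754911, 9.454975848128596,
         -39.492719391538095, -34.08034154357719]
feature_names = ['sepal length (cm)', 'sepal width (cm)',
                 'petal length (cm)', 'petal width (cm)']


def direction(t, first="setosa", second="versicolor"):
    """Hypothetical helper: read which group mean is larger from the sign of t."""
    if t > 0:
        return f"{first} > {second}"
    if t < 0:
        return f"{first} < {second}"
    return "equal means"


directions = {name: direction(t) for name, t in zip(feature_names, all_t)}
for name, result in directions.items():
    print(f"{name}: {result}")
```

Only sepal width has a positive statistic, so it is the one feature where setosa has the larger mean. The sign only gives the direction, and whether the difference is significant still comes from the p-values.
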
# renaming so that class 0 will appear as setosa and class 1 as versicolor
# (map replaces the whole column at once, which avoids writing strings into an integer column)
df_long['Group'] = df_long['Group'].map({0: classes_names[0], 1: classes_names[1]})
# Boxplots
fig, axes = plt.subplots(2, 2, figsize=(14, 10), dpi=100)
axes = axes.flatten()
for idx, feature in enumerate(feature_names):
    ax = sns.boxplot(x="Feature", hue="Group", y="Value",
                     data=df_long[df_long.Feature == feature],
                     linewidth=2, showmeans=True,
                     meanprops={"marker": "*", "markerfacecolor": "white",
                                "markeredgecolor": "black"},
                     ax=axes[idx])
    #* tick params
    axes[idx].set_xticklabels([str(feature)], rotation=0)
    axes[idx].set(xlabel=None)
    axes[idx].set(ylabel=None)
    axes[idx].grid(alpha=0.5)
    axes[idx].legend(loc="lower right", prop={'size': 11})

    #* set edge color = black (the boxes are stored in ax.artists on older
    #  Matplotlib versions and in ax.patches on Matplotlib >= 3.5)
    for box in list(ax.artists) + list(ax.patches):
        box.set_edgecolor('black')
        box.set_alpha(0.8)

    #* statistical tests: draw a bracket above the two boxes and label it
    x1, x2 = -0.20, 0.20
    y_top, h, col = df_long[df_long.Feature == feature]["Value"].max() + 1, 2, 'k'
    axes[idx].plot([x1, x1, x2, x2], [y_top, y_top+h, y_top+h, y_top], lw=1.5, c=col)
    axes[idx].text((x1+x2)*.5, y_top+h, "statistically significant", ha='center', va='bottom', color=col)
fig.suptitle("Significant feature differences between setosa and versicolor classes/groups", size=14, y=0.93)
plt.show()
Image for post
Figure produced by the author.
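
The plotting code above writes the same "statistically significant" label on every panel and places the bracket with fixed offsets (+1 and 2). As a possible refinement, the sketch below derives the label from the p-value and scales the bracket to the data range. Both helpers (p_to_stars and bracket_coords) and the star thresholds are assumptions for illustration, not part of the original code. The thresholds follow a common convention rather than a fixed standard.

```python
def p_to_stars(p, thresholds=((0.001, "***"), (0.01, "**"), (0.05, "*"))):
    """Hypothetical helper: map a p-value to a conventional star label."""
    for cutoff, label in thresholds:
        if p < cutoff:
            return label
    return "ns"  # not significant at the largest cutoff


def bracket_coords(values, x1=-0.20, x2=0.20, pad_frac=0.05, height_frac=0.05):
    """Hypothetical helper: bracket corner points scaled to the data range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0   # fall back to 1.0 if all values are identical
    y = hi + pad_frac * span  # leave a small gap above the highest point
    h = height_frac * span    # bracket height
    return [x1, x1, x2, x2], [y, y + h, y + h, y]


xs, ys = bracket_coords([1.0, 2.0, 3.0])
print(xs, ys)
print(p_to_stars(0.0004), p_to_stars(0.03), p_to_stars(0.2))
```

Inside the plotting loop, xs and ys could be passed to axes[idx].plot(xs, ys, lw=1.5, c=col), and p_to_stars(all_p[idx]) could replace the fixed text label, so that each panel reflects its own test result.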

Conclusions

Stay tuned & support this effort

The Startup

Medium's largest active publication, followed by +771K people. Follow to join our community.

Serafeim Loukas

Written by

Diploma of Electrical & Computer Engineering (NTUA). Master of Science in Neuroscience (UNIGE). Currently, I am a PhD student at EPFL.
