Box Plot with Seaborn

Arsalan Zafar
4 min read · Nov 2, 2022



What are we going to learn today?

In this article, we will learn how to create a box plot using Seaborn. A box plot is a type of chart that is often used in exploratory data analysis.

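It is a standard way of visualizing the distribution of data, based on a five-number summary: the “minimum”, first quartile (Q1), median (Q2), third quartile (Q3), and “maximum”. The quotes matter. By default, Seaborn’s whiskers reach only as far as the most extreme data points within 1.5 times the interquartile range (IQR = Q3 - Q1) of the box, and any points beyond that are drawn individually as outliers.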
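You can compute the five numbers directly. The sketch below uses NumPy on a small made-up sample. The helper name five_number_summary is our own. By default, np.percentile interpolates linearly between sorted values, and other quartile conventions can give slightly different Q1 and Q3 on small samples.

```python
import numpy as np

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of a 1-D sample as floats."""
    arr = np.asarray(values, dtype=float)
    # np.percentile interpolates linearly between sorted values by default
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    return float(arr.min()), float(q1), float(median), float(q3), float(arr.max())

sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]  # made-up values
print(five_number_summary(sample))  # (3.0, 7.0, 12.0, 14.0, 21.0)
```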

A box plot also reveals skewness and outliers in the data.

Why is it important to learn?

A few reasons to learn the box plot:

  • Box plot can show several statistical measures in a compact form.
  • It can help detect outliers in data.
  • It can help determine the symmetry and skewness of the data.
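Here is a sketch of how a box plot flags outliers. It assumes the common 1.5 × IQR convention, which is also the default whisker length (whis=1.5) in Matplotlib and Seaborn box plots. Points more than 1.5 IQRs beyond the box are flagged, and each whisker stops at the most extreme point still inside that range. The function iqr_outliers and its sample data are hypothetical.

```python
import numpy as np

def iqr_outliers(values, whis=1.5):
    """Return (lower whisker end, upper whisker end, sorted outliers)."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers end at the most extreme observations still inside the fences
    inside = arr[(arr >= lower_fence) & (arr <= upper_fence)]
    outliers = arr[(arr < lower_fence) | (arr > upper_fence)]
    return float(inside.min()), float(inside.max()), sorted(outliers.tolist())

data = [2, 4, 4, 5, 5, 6, 7, 8, 30]  # made-up sample with one extreme value
print(iqr_outliers(data))  # (2.0, 8.0, [30.0])
```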

How can we achieve today’s goal?

The plan for today is:

  • Create a box plot using Python Seaborn
  • Fix the values on the x-axis according to the data
  • Box plot with a categorical variable
  • Using the hue parameter
  • Box plot of each numerical column in the data set
  • Adding jitter
  • Conclusion

Let’s import the required libraries.

Input

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

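We will use the tips and iris datasets in this article. Both are available from Seaborn’s example data collection, so you can load them directly.

Let’s load them with Seaborn’s load_dataset function. It downloads the data from the online seaborn-data repository, so it needs an internet connection the first time, and then caches a local copy.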

Input

tips = sns.load_dataset('tips')
iris = sns.load_dataset('iris')

Create a box plot using Python Seaborn

To draw a box plot, we use Seaborn’s boxplot function with two parameters: x (the column to plot) and data (the DataFrame).

Input

sns.boxplot(x='total_bill', data=tips);

Output

The plot above is a box plot of total_bill. It shows the column’s five-number summary in a single compact figure.
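load_dataset needs network access the first time. The following self-contained sketch draws the same horizontal box plot from synthetic data instead, which is useful in a script or on a machine without internet. The fake_tips DataFrame and its gamma-distributed bills are made up for illustration. The Agg backend is chosen only so the figure can be saved to a file without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw to files, no display needed
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Hypothetical stand-in for tips: 200 right-skewed "bill" amounts
rng = np.random.default_rng(seed=0)
fake_tips = pd.DataFrame({"total_bill": rng.gamma(shape=4.0, scale=5.0, size=200)})

# Same call as in the article: a horizontal box plot of one numeric column
ax = sns.boxplot(x="total_bill", data=fake_tips)
ax.set_title("Box plot of a synthetic total_bill column")
plt.savefig("total_bill_box.png")
plt.close(ax.figure)
```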

Fix the values on the x-axis according to the data

However, the ticks on the x-axis are spaced 10 units apart. We can space them more densely with Matplotlib’s xticks function.

Input

sns.boxplot(x='total_bill', data=tips);
plt.xticks(np.arange(1, 55, 3));  # np.arange(start, stop, step): ticks at 1, 4, ..., 52

Output
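Instead of listing tick positions with np.arange, you can use Matplotlib’s MultipleLocator. It keeps a fixed tick step whatever the data range turns out to be. Note that it places ticks at multiples of the step (0, 3, 6, ...) rather than starting at 1 as the code above does. The sketch uses a hypothetical bills DataFrame in place of tips.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator

# Hypothetical bills spread roughly over the range of tips' total_bill
rng = np.random.default_rng(seed=1)
bills = pd.DataFrame({"total_bill": rng.uniform(3, 51, size=100)})

ax = sns.boxplot(x="total_bill", data=bills)
# One major tick every 3 units, recomputed for whatever range the axis shows
ax.xaxis.set_major_locator(MultipleLocator(3))
ticks = ax.get_xticks()
plt.close(ax.figure)
```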

Box plot with a categorical variable

We can also split a box plot by a categorical variable, which draws one box for each day in the data set. Let’s pass the day column to the x parameter and the total_bill column to y.

Input

sns.boxplot(x='day', y='total_bill', data=tips);

Output
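Each box in the categorical plot summarizes one group, so you can read off the same numbers with a pandas groupby. The mini_tips frame below is a hypothetical stand-in for tips. With the real data, you would group tips the same way.

```python
import pandas as pd

# Hypothetical miniature version of the tips data
mini_tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sat", "Sat"],
    "total_bill": [10.0, 14.0, 18.0, 12.0, 20.0, 8.0, 16.0, 24.0, 40.0],
})

# One row per box: the quartiles each day's box is drawn from
per_day = mini_tips.groupby("day")["total_bill"].quantile([0.25, 0.5, 0.75]).unstack()
per_day.columns = ["Q1", "median", "Q3"]
print(per_day)
```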

Using the hue parameter

To get more detail out of the data set, you can use the hue parameter, which we discussed in the bar charts article.

If we set hue to the sex column in the plot above, each day is split into two boxes: one for males and one for females.

Input

sns.boxplot(x='day', y='total_bill', hue='sex', data=tips);

Output
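When the grouping columns hold plain strings, Seaborn generally orders the boxes by first appearance in the data. In tips, day is stored as a pandas Categorical, which is why the days come out in calendar order. The boxplot parameters order and hue_order make the layout explicit. This sketch assumes a synthetic fake_tips DataFrame in place of the real data.

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Hypothetical data shaped like tips: a day, a sex and a bill for each row
rng = np.random.default_rng(seed=2)
n = 120
fake_tips = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n),
    "sex": rng.choice(["Male", "Female"], size=n),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=n),
})

ax = sns.boxplot(
    x="day", y="total_bill", hue="sex", data=fake_tips,
    order=["Thur", "Fri", "Sat", "Sun"],  # fix the left-to-right order of the days
    hue_order=["Female", "Male"],         # fix which color and position each sex gets
)
plt.close(ax.figure)
```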

Box plot of each numerical column in the data set

If we pass the whole data set to boxplot, it draws one box for each numerical column in it.

The iris data set has four numerical columns: sepal_length, sepal_width, petal_length, and petal_width.

Input

sns.boxplot(data=iris);

Output
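Passing the whole DataFrame uses Seaborn’s wide-form mode, which keeps only the numeric columns, so species is not plotted. You can also reshape the data to long form with pandas melt. This gives one row per (flower, measurement) pair and lets you add hue='species'. The rows below are a small hypothetical sample in the iris layout, and the column names measurement and cm are our own choices.

```python
import pandas as pd

# Hypothetical rows in the same wide layout as iris: four measurements plus a label
wide = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "petal_length": [1.4, 1.4, 6.0],
    "petal_width": [0.2, 0.2, 2.5],
    "species": ["setosa", "setosa", "virginica"],
})

# Long form: species stays as an identifier, the four measurements are stacked
long = wide.melt(id_vars="species", var_name="measurement", value_name="cm")
print(long.head())
```

You can then plot the long table with sns.boxplot(x='measurement', y='cm', hue='species', data=long), which draws one box per measurement and species.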

Adding jitter

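In the box plot by day, Friday’s box appears to sit at or above Thursday’s, suggesting that Friday’s values are higher or equal. That is not necessarily the case. A box plot shows only a summary of each group and hides how many observations that summary is based on.

To see how much data lies behind each box, we can overlay the individual points with jitter. Jitter is a small random offset along the categorical axis that keeps the points from piling on top of each other.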
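Conceptually, jitter just adds a small random horizontal offset to each point of a category. The sketch below is a simplified illustration, not Seaborn’s internal code. The helper jitter_positions is hypothetical and assumes categories are drawn at x = 0, 1, 2, ... with offsets drawn uniformly from [-jitter, jitter).

```python
import numpy as np

def jitter_positions(category_index, n_points, jitter=0.2, seed=None):
    """Spread n_points horizontally around one category's x position.

    Simplified illustration only: every point gets a uniform random
    offset in [-jitter, jitter) added to the category's position.
    """
    rng = np.random.default_rng(seed)
    return category_index + rng.uniform(-jitter, jitter, size=n_points)

# Points of the second category (x = 1) spread between roughly 0.8 and 1.2
x_positions = jitter_positions(1, n_points=25, jitter=0.2, seed=0)
print(x_positions.round(2))
```

Seaborn’s stripplot applies this kind of offset for you through its jitter parameter. When you overlay points on a box plot, passing showfliers=False to boxplot keeps the outliers from being drawn twice.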

Input

sns.boxplot(x='day', y='total_bill', data=tips);

Output

We will use Seaborn’s stripplot function to draw the jittered points on top of the box plot.

Input

sns.boxplot(x='day', y='total_bill', data=tips);
sns.stripplot(x='day', y='total_bill', data=tips, color='black', jitter=0.2);

Output

Now new patterns become visible. Before assuming that Friday has higher or lower values than the other days, notice that Friday has a much smaller sample size than the others.
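You can also check sample sizes numerically. In the real tips data, tips['day'].value_counts() shows that Friday accounts for only 19 of the 244 rows. The sketch below uses a hypothetical mini_tips table and builds tick labels that print the count under each day name. The label format is our own choice.

```python
import pandas as pd

# Hypothetical miniature tips table in which Friday is under-represented
mini_tips = pd.DataFrame({
    "day": ["Thur"] * 6 + ["Fri"] * 2 + ["Sat"] * 7 + ["Sun"] * 5,
    "total_bill": [float(v) for v in range(10, 30)],
})

# Number of observations behind each box
counts = mini_tips["day"].value_counts()

# Tick labels that show the sample size under each day name
order = ["Thur", "Fri", "Sat", "Sun"]
tick_labels = [f"{day}\n(n={counts[day]})" for day in order]
print(tick_labels)
```

After drawing the box plot with order=order, calling plt.xticks(range(len(order)), tick_labels) places these labels under the boxes, since categorical boxes sit at positions 0, 1, 2, ...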

Conclusion

This article covered box plots in Seaborn using real-world datasets. Thank you for reading; I hope you found it helpful. Check out the rest of my articles here.
