Python’s best visualization library — Seaborn

Andrew Schleiss
Published in Geek Culture
5 min read · Sep 20, 2021

Python has many ways to create beautiful visuals, but the simplest and most effective library to explore data with beautiful graphs has to be Seaborn.

pairplot on the titanic dataset

If you want to get going quickly, here’s a Seaborn cheat sheet from DataCamp:

Source: DataCamp

As we can see in the cheat sheet, there are “categories” of graphs that we can create, depending on our data types and what we are trying to analyze:

  • Regression and line: regplot, lineplot, lmplot
  • Distribution: displot, histplot
  • Categorical: barplot, boxplot, violinplot, scatterplot, countplot
  • Matrix: heatmap, clustermap

Let’s select a few graphs from each section and see if we find any interesting analyses from our 2 datasets:

  • Titanic: passenger information from the Titanic, focusing on survival
  • Tips: customer information from a restaurant, focusing on the tip amount

Let’s install Seaborn and set up our Jupyter notebook with the required libraries to get started.

Setup

  1. Install Seaborn from the Windows command line (pip install seaborn) or the Anaconda shell (conda install seaborn)
Command line or Anaconda Shell

  2. Open a Jupyter notebook and import the pandas and Seaborn libraries and our two example datasets

import seaborn as sns
import pandas as pd
titanic = sns.load_dataset("titanic")
tips = sns.load_dataset("tips")

Check the first 5 rows of our data
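The original showed this step as a screenshot. As a minimal sketch, assuming the usual pandas head() call was used, it could look like the snippet below. The small sample frame is only a stand-in with a few illustrative rows so the snippet runs without downloading anything; in the notebook you would call head() on the tips and titanic frames loaded above.

```python
import pandas as pd

# Stand-in frame with a few illustrative rows shaped like the tips data;
# in the notebook, use the `tips` and `titanic` frames loaded above instead.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "smoker": ["No", "No", "No", "No", "No", "No"],
})

# head() returns the first 5 rows by default; pass a number for more or fewer
first_rows = sample.head()
print(first_rows)
```

In the notebook itself, tips.head() and titanic.head() as the last line of a cell render the same preview as a table.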

Seaborn plots

With most of our plots, the parameters required are generally in the format:

sns.plottype(data=DataFrame, x=column_name, y=column_name)
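To make that pattern concrete, here is a small sketch that passes the same three keywords to two different plot functions. The toy DataFrame and its column names (group, value) are made up for illustration, and the ax argument is only there to place both plots side by side.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up toy data; any DataFrame with these column names would do
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# The same data / x / y keywords work for both plot functions
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
sns.boxplot(data=df, x="group", y="value", ax=axes[0])
sns.barplot(data=df, x="group", y="value", ax=axes[1])

# Seaborn labels each axis with the column name it was given
labels = [(ax.get_xlabel(), ax.get_ylabel()) for ax in axes]
print(labels)
```

Swapping the function name is usually all it takes to try a different view of the same columns.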

Regression and line plots

regplot

We start with a regression plot (regplot) and analyze the tips data to see whether there is any correlation between the tip amount and the total bill

sns.regplot(data=tips, x="total_bill", y="tip")
Regression plot (regplot)

From the regplot we can see that there is indeed a positive correlation: the greater the total bill, the greater the tip. However, regplot can’t break this down any further if we want to include a third variable.
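If we want a number to go with the picture, one option is the Pearson correlation coefficient from pandas. This is a minimal sketch on made-up rows shaped like the tips data (the values are stand-ins, not taken from the dataset); in the notebook you would use the tips frame directly.

```python
import pandas as pd

# Made-up rows shaped like the tips data; in the notebook, use `tips` instead
tips_sample = pd.DataFrame({
    "total_bill": [10.0, 15.0, 20.0, 25.0, 30.0, 40.0],
    "tip": [1.5, 2.0, 3.0, 3.5, 4.0, 6.0],
})

# Pearson r: +1 is a perfect positive linear relationship, 0 is none,
# and -1 is a perfect negative one
r = tips_sample["total_bill"].corr(tips_sample["tip"])
print(f"Pearson r = {r:.2f}")
```

A value close to 1 matches a regression line that slopes upward with points sitting close to it.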

For this, let’s try lmplot() and add “smoker” as the hue, which draws two regression lines on the same axes

sns.lmplot(data=tips, x="total_bill", y="tip", hue="smoker")
Regression plot (lmplot)

Fantastic! We can see that the regression line for non-smokers is steeper: their tips increase more as the bill increases than the tips of smokers do.
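One way to check this kind of reading numerically is to fit a straight line per group and compare the slopes. The sketch below uses NumPy’s polyfit on made-up values that were chosen to mirror the pattern described above, so the numbers are illustrative only; with the real data you would group the tips frame by “smoker” in the same way.

```python
import numpy as np
import pandas as pd

# Made-up values that mimic the pattern in the plot (not the real dataset)
tips_sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.5, 6.0, 2.0, 2.5, 3.0, 3.5],
    "smoker": ["No"] * 4 + ["Yes"] * 4,
})

slopes = {}
for smoker, group in tips_sample.groupby("smoker"):
    # A degree-1 fit returns (slope, intercept) of the least-squares line
    slope, intercept = np.polyfit(group["total_bill"], group["tip"], deg=1)
    slopes[smoker] = slope
    print(f"smoker={smoker}: tip rises by about {slope:.2f} per unit of bill")
```

A larger slope means the tip grows faster as the bill grows, which is what a steeper line in lmplot shows.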

Distribution

For distribution (histogram) plots, we only need an x-value, which is bucketed (or binned) into set ranges and counted
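To make the binning step concrete, here is a minimal sketch using NumPy’s histogram function on a handful of made-up ages. The bin count and range are assumptions picked to keep the arithmetic easy to follow; Seaborn chooses the bin edges for you by default.

```python
import numpy as np

# Made-up ages, purely for illustration
ages = np.array([2, 19, 22, 24, 27, 31, 33, 38, 45, 71])

# Split 0-80 into 4 equal-width bins and count the values that fall in each
counts, edges = np.histogram(ages, bins=4, range=(0, 80))

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {n}")
```

Each bar in a histogram is one of these counts drawn over its bin range.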

displot

A good distribution plot is displot, which we can use to analyze the “age” distribution on the Titanic. We can then include “sex” as the hue to see the difference between male and female ages

sns.displot(data=titanic, x="age", hue="sex", hue_order=["female", "male"])
Distribution plot (displot)

The displot shows that most of the people on board were aged around 18–35. With the hue we can also see that there were more males than females on board, including a higher number of elderly males

We also changed the hue_order here to bring the female color to the front and males second (try it without the hue_order and see the default order)

Categorical

For categorical plots we have quite a few options, so let’s look at some key ones that I use regularly

boxplot

sns.boxplot(data=titanic, x="sex", y="age")
Categorical (boxplot)

The boxplot shows similar information to the displot; however, we can also see some outliers for male ages (around 68+)
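For context on where those outlier points come from: by default, Seaborn’s boxplot extends the whiskers to 1.5 times the interquartile range (the whis parameter) and draws anything beyond them as individual points. A minimal sketch of that rule on made-up ages, not the Titanic data:

```python
import pandas as pd

# Made-up ages, purely for illustration
ages = pd.Series([20, 22, 25, 28, 30, 32, 35, 40, 74], name="age")

# The box spans the first to third quartile
q1, q3 = ages.quantile(0.25), ages.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]

print(f"box: {q1}-{q3}, limits: {lower}-{upper}")
print("outliers:", outliers.tolist())
```

The same calculation on titanic["age"], grouped by “sex”, would give the cut-offs behind each box in the plot.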

violinplot

Let’s see if we can enhance the boxplot with one of my favorite plots, the violin plot

sns.violinplot(data=titanic, x="sex", y="age")
Categorical (violinplot)

Alright! Now we can see the density of ages, but what if we add another column, “survived”, as the hue?

sns.violinplot(data=titanic, x="sex", y="age", hue="survived", split=True)
Categorical (violinplot with hue and split)

OK, this is quite a lot in one graph, so let’s break it down: we added “survived” as the hue and split each violin down the middle (split=True) so the two groups are easier to compare.

There is a lot to analyze, but a quick glance shows that younger males look to have a higher chance of surviving

Matrix

heatmap

My most useful plot in the matrix list is the heatmap, specifically when looking at correlation matrices.

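Note: We can quickly create a correlation matrix using corr(). On pandas 2.0 and later, pass numeric_only=True (available since pandas 1.5) so that text columns such as “sex” are skipped instead of raising an error

titanic.corr(numeric_only=True).head()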
Correlation matrix of titanic dataset

We can now create a heatmap of these correlations

sns.heatmap(data=titanic.corr(numeric_only=True))
Matrix (heatmap)

This looks pretty good but we can make this better:

  • Set the color scale to run from -1 to 1, the full range of a correlation, so we can better see the min and max (vmin/vmax)
  • Annotate (annot) the values inside the blocks
  • Change the color palette to see the color changes better (cmap); you can find the different color palettes on the Seaborn website

sns.heatmap(data=titanic.corr(numeric_only=True), vmin=-1, vmax=1, annot=True, cmap="Spectral")
Matrix (heatmap with annot and cmap)

From this heatmap we see a few small positive and negative correlations, but some key ones are the positive correlation between alone and adult_male and the negative correlation between adult_male and survived

Conclusion

We only highlighted a few of the plots available in Seaborn, but hopefully the examples in this article give you a good base to get started.

Once you feel more comfortable with Seaborn, have a look at its gallery and tutorial section to better customize your graphs.

All code and data are available on our GitHub
