Python’s best visualization library — Seaborn

Published in

Geek Culture

5 min read · Sep 20, 2021

Python has many ways to create beautiful visuals, but the simplest and most effective library to explore data with beautiful graphs has to be Seaborn.

If you want to get going quickly, here’s a Seaborn cheat sheet from DataCamp:

As we can see in the cheat sheet, there are “categories” of graphs that we can create, depending on our data types and what we are trying to analyze:

Regression & linear: regplot, lineplot, lmplot
Distribution: displot, histplot (the older distplot is deprecated)
Categorical: barplot, boxplot, violinplot, scatterplot, countplot
Matrix: heatmap, clustermap

Let’s pick a few plots from each category and see what we can learn from our two datasets:

Titanic: passenger information from the Titanic, focusing on survival
Tips: customer information from a restaurant, focusing on the tip amount

Let’s install Seaborn and set up our Jupyter notebook with the required libraries to get started.

Setup

1. Install Seaborn via the Windows command line or Anaconda shell (for example, pip install seaborn or conda install seaborn)

2. Load our Jupyter notebook: import the pandas and seaborn libraries and our two example datasets

import seaborn as sns
import pandas as pd

titanic = sns.load_dataset("titanic")
tips = sns.load_dataset("tips")

Check the first 5 rows of our data with titanic.head() and tips.head()

Seaborn plots

With most of our plots, the parameters required are generally in the format:

sns.plottype(data = DataFrame, x = column_name, y = column_name)

Regression and line plots

regplot

We start with a regression plot (regplot) and analyze the tips data to see whether the tip amount correlates with the total bill

sns.regplot(data = tips, x = "total_bill", y = "tip")

From the regplot we can see that there is indeed a positive correlation: the greater the total bill, the greater the tip. However, regplot can’t break this down any further if we want to include a third variable.

For this, let’s try lmplot() and add “smoker” as a hue, which plots two regression lines on one set of axes

sns.lmplot(data = tips, x = "total_bill", y = "tip", hue = "smoker")

Fantastic! We can see that non-smokers (vs smokers) tip more as the bill increases
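If you want numbers to go with the picture, a rough way to get them is to compute the correlation with pandas and fit a straight line per hue level with NumPy. This is only a sketch: the rows below are invented for illustration (not the real tips data), and np.polyfit is my stand-in for the line fitting, not a call into Seaborn itself.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; not the real tips data.
df = pd.DataFrame({
    "total_bill": [10, 20, 30, 40, 10, 20, 30, 40],
    "tip": [2.0, 4.0, 6.0, 8.0, 2.0, 3.0, 4.0, 5.0],
    "smoker": ["No"] * 4 + ["Yes"] * 4,
})

# Pearson correlation between bill and tip across all rows.
r = df["total_bill"].corr(df["tip"])

# Fit one straight line per smoker level, mirroring what hue does visually.
slopes = {}
for level, group in df.groupby("smoker"):
    slope, intercept = np.polyfit(
        group["total_bill"].to_numpy(), group["tip"].to_numpy(), deg=1
    )
    slopes[level] = slope

print(f"correlation: {r:.2f}")
print(slopes)
```

A larger slope for one group means its tips grow faster per extra unit of bill, which is what a steeper line in the lmplot shows.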

Distribution

For distribution (histogram) plots, we only need an x-value, which will be bucketed (or binned) into set ranges and counted
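The bucketing step can be sketched with NumPy alone. The ages and bin edges below are made up for illustration.

```python
import numpy as np

# Made-up ages for illustration only.
ages = np.array([2, 18, 22, 25, 31, 34, 40, 58, 71])

# Bin edges: 0-20, 20-40, 40-60, 60-80 (the last bin includes its right edge).
edges = [0, 20, 40, 60, 80]

# Count how many values fall into each bin.
counts, _ = np.histogram(ages, bins=edges)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>2}-{right:<2}: {count}")
```

histplot and displot accept a bins argument too, either a number of bins or a list of edges like the one above, for example sns.displot(data = titanic, x = "age", bins = [0, 20, 40, 60, 80]).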

displot

A good distribution plot is displot, which we can use to analyze the “age” distribution on the Titanic. We can then include “sex” as a hue to see the difference between male and female ages

sns.displot(data = titanic , x = "age" , hue = "sex", hue_order = ['female', 'male'])

The displot shows that most of the people on board are aged around 18–35, and with the hue we can see that there are more males on board (vs females), including a higher number of elderly males

We also changed the hue_order here to bring the female color to the front and males second (try it without the hue_order and see the default order)

Categorical

For categorical plots we have quite a few options, so let’s show some key ones that I use regularly

boxplot

sns.boxplot(data = titanic, x = "sex", y= "age")

The boxplot shows similar information to the displot, but we can also see some outliers for male ages (around 68+)
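Which points get drawn as outliers is governed by the whis parameter of boxplot, which defaults to 1.5 times the interquartile range (IQR). A minimal sketch of that rule with pandas, using made-up ages and not the real titanic column:

```python
import pandas as pd

# Made-up ages for illustration only.
ages = pd.Series([20, 22, 25, 28, 30, 32, 35, 38, 40, 80])

# The box spans the first to third quartile.
q1, q3 = ages.quantile(0.25), ages.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]

print(f"fences: {lower} to {upper}")
print(outliers.tolist())
```

Here only the value 80 falls outside the fences, so it is the only point that would be drawn as an individual marker.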

violinplot

Let’s see if we can enhance the boxplot with one of my favorite plots, the violin plot

sns.violinplot(data = titanic, x = "sex", y="age")

Alright! Now we can see the density of ages, but what if we add another column, “survived”, as the hue?

sns.violinplot(data = titanic, x = "sex", y = "age", hue = "survived", split = True)

Categorical (violinplot with hue and split)

OK, this is quite a lot in one graph, so let’s break it down: we added “survived” as the hue and split the plot down the middle (split = True) to compare the two groups more easily.

There is a lot to analyze, but a quick glance shows that younger males look to have a higher chance of surviving
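A visual impression like this can be cross-checked with a quick groupby. The sketch below cuts age into bands and computes the share of survivors per sex and band. The rows, the band edges and the band labels are all invented for illustration, so the numbers say nothing about the real passengers.

```python
import pandas as pd

# Made-up rows for illustration only; not the real titanic data.
df = pd.DataFrame({
    "sex": ["male"] * 5 + ["female"] * 4,
    "age": [5, 10, 30, 35, 60, 8, 25, 30, 50],
    "survived": [1, 1, 0, 0, 0, 1, 1, 1, 0],
})

# Hypothetical age bands: (0, 16], (16, 40], (40, 80].
df["age_band"] = pd.cut(
    df["age"], bins=[0, 16, 40, 80], labels=["child", "adult", "older"]
)

# The mean of a 0/1 column is the survival rate within each group.
rates = df.groupby(["sex", "age_band"], observed=True)["survived"].mean()
print(rates)
```

Running the same groupby on the titanic DataFrame loaded earlier would show whether the pattern in the violin plot holds up.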

Matrix

heatmap

My most useful plot in the matrix list is the heatmap, specifically when looking at correlation matrices.

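Note: We can quickly create a correlation matrix using corr(). On pandas 2.0 and later, pass numeric_only=True so that text columns such as “sex” are skipped instead of raising an error

titanic.corr(numeric_only = True).head()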

We can now create a heatmap of these correlations

sns.heatmap(data = titanic.corr(numeric_only = True))

This looks pretty good but we can make this better:

Set the color scale to span -1 to 1, the full range of a correlation coefficient, so we can better see the min and max (vmin/vmax)
Annotate (annot) the values to be inside the blocks
Change the color palette to see the color changes better (cmap); you can find the different color palettes on the Seaborn website

sns.heatmap(data = titanic.corr(numeric_only = True), vmin = -1, vmax = 1, annot = True, cmap = "Spectral")

From this heatmap we see a few small positive and negative correlations, but some key ones are the positive correlation between alone and adult_male and the negative correlation between adult_male and survived
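Boolean columns such as adult_male show up in the matrix because corr() treats them as numeric, while text columns are dropped. Here is a minimal sketch of the whole step on a tiny made-up DataFrame. The column names mirror the titanic data, but the values are invented so the snippet runs without downloading anything.

```python
import pandas as pd
import seaborn as sns

# Made-up values for illustration only; not the real titanic data.
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "age": [22.0, 38.0, 26.0, 35.0, 4.0, 54.0],
    "adult_male": [True, False, False, True, False, True],
    "sex": ["male", "female", "female", "male", "female", "male"],
})

# numeric_only=True keeps numeric and boolean columns and drops "sex".
corr = df.corr(numeric_only=True)

# Correlation coefficients always lie between -1 and 1, so fixing
# vmin/vmax to that range keeps the color scale comparable across plots.
ax = sns.heatmap(data=corr, vmin=-1, vmax=1, annot=True, cmap="Spectral")
```

With annot=True, each cell of the matrix gets its value written inside the block.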

Conclusion

We only highlighted a few of the plots available in Seaborn, but hopefully the examples in this article will give you a good base to get started.

Once you feel more comfortable with Seaborn, have a look at the gallery and the tutorial section to better customize your graphs

All code and data are available on our GitHub