Getting Started with Seaborn

Published in

Beer&Diapers.ai

9 min read · May 21, 2019

Distribution Plots

distplot
jointplot
pairplot
rugplot
kdeplot

import seaborn as sns
# To show the graphs within the notebook
%matplotlib inline

tips = sns.load_dataset('tips')
tips.head()

DistPlot

The distplot shows the distribution of a univariate set of observations.

sns.distplot(tips['total_bill'])

# To remove the KDE, set the kde parameter to False; use bins to choose the number of bins
sns.distplot(tips['total_bill'], kde=False, bins=20)
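To make the bins parameter concrete, here is a minimal sketch of what a histogram computes: the value range is cut into equal-width intervals and the observations in each are counted. The helper name histogram_counts and the sample values are made up for illustration, and the sketch assumes the values are not all identical.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes hi > lo
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin instead of opening a new one
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Made-up bill amounts, not taken from the tips dataset
bills = [3.0, 8.5, 10.0, 12.5, 15.0, 18.0, 21.5, 24.0, 30.0, 43.0]
counts = histogram_counts(bills, bins=4)
print(counts)
```

Raising bins gives narrower intervals and a more detailed but noisier picture of the same data.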

Jointplot

jointplot() combines two distplots, one per variable, with a bivariate plot of their relationship.

# The kind parameter accepts scatter, reg, resid, kde, and hex
sns.jointplot(x='total_bill',y='tip',data=tips, kind='scatter')
sns.jointplot(x='total_bill',y='tip',data=tips, kind='kde')
sns.jointplot(x='total_bill',y='tip',data=tips, kind='hex')
sns.jointplot(x='total_bill',y='tip',data=tips, kind='reg')

pairplot

pairplot plots pairwise relationships across an entire DataFrame (for all the numerical columns) and supports a hue argument that colors the points by a categorical column.

sns.pairplot(tips)

sns.pairplot(tips, hue='sex', palette='coolwarm')

rugplot

rugplots just draw a dash mark for every point on a univariate distribution.

sns.rugplot(tips['total_bill'])

kdeplot

kdeplots are kernel density estimation (KDE) plots. A KDE plot replaces every single observation with a Gaussian (normal) distribution centered on that value and sums these curves to estimate the overall density.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Create dataset
dataset = np.random.randn(25)

# Create another rugplot
sns.rugplot(dataset)

# Set up the x-axis for the plot
x_min = dataset.min() - 2
x_max = dataset.max() + 2

# 100 equally spaced points from x_min to x_max
x_axis = np.linspace(x_min, x_max, 100)

# Set up the bandwidth, for info on this:
url = 'http://en.wikipedia.org/wiki/Kernel_density_estimation#Practical_estimation_of_the_bandwidth'
bandwidth = ((4*dataset.std()**5)/(3*len(dataset)))**.2

# Create an empty kernel list
kernel_list = []

# Plot each basis function
for data_point in dataset:
    # Create a kernel for each point and append to list
    kernel = stats.norm(data_point, bandwidth).pdf(x_axis)
    kernel_list.append(kernel)

    # Scale for plotting
    kernel = kernel / kernel.max()
    kernel = kernel * .4
    plt.plot(x_axis, kernel, color='grey', alpha=0.5)

plt.ylim(0, 1)

# To get the kde plot we can sum these basis functions.

# Sum the basis functions
sum_of_kde = np.sum(kernel_list, axis=0)

# Plot figure
fig = plt.plot(x_axis, sum_of_kde, color='indianred')

# Add the initial rugplot
sns.rugplot(dataset, color='indianred')

# Get rid of y-tick marks
plt.yticks([])

# Set title
plt.suptitle("Sum of the Basis Functions")

sns.kdeplot(tips['total_bill'])
sns.rugplot(tips['total_bill'])
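The hand-built plot above sums the kernels without rescaling, so it shows the shape of the estimate but not a true density. As a rough sketch using only the standard library, dividing the sum by the number of observations gives a curve whose area is approximately 1. The function names, the sample observations, and the hand-picked bandwidth are all assumptions for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def kde(x, observations, bandwidth):
    """Average of one Gaussian per observation, evaluated at x."""
    total = sum(gaussian_pdf(x, obs, bandwidth) for obs in observations)
    return total / len(observations)


observations = [1.0, 1.5, 2.0, 4.0, 4.2]  # made-up data
bandwidth = 0.5  # chosen by hand, not estimated from the data

# Riemann sum over a grid from -5 to 10, wide enough to cover both tails
step = 0.01
grid = [-5 + i * step for i in range(1501)]
area = sum(kde(x, observations, bandwidth) for x in grid) * step
print(round(area, 3))
```

The estimate is highest where observations cluster (around 1.5 here) and low in the gap between the two groups.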

Categorical Data Plots

factorplot
boxplot
violinplot
stripplot
swarmplot
barplot
countplot

barplot and countplot

These plots let you aggregate data by a categorical feature. barplot is a general plot that aggregates the values in each category with some function, by default the mean. countplot simply counts the observations in each category.

sns.barplot(x='sex',y='total_bill',data=tips)

# Use estimator to override the default aggregation function
import numpy as np
sns.barplot(x='sex',y='total_bill',data=tips,estimator=np.sum)

sns.countplot(x='sex',data=tips)
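The bar heights in the three plots above come from a plain group-and-aggregate step. A minimal sketch with the standard library, using made-up records in place of the sex and total_bill columns:

```python
from collections import defaultdict
from statistics import mean

# Made-up (category, value) records, not taken from the tips dataset
records = [
    ("Male", 20.0),
    ("Female", 15.0),
    ("Male", 30.0),
    ("Female", 25.0),
    ("Male", 10.0),
]

# Collect the values belonging to each category
groups = defaultdict(list)
for category, value in records:
    groups[category].append(value)

mean_heights = {c: mean(v) for c, v in groups.items()}  # barplot default
sum_heights = {c: sum(v) for c, v in groups.items()}    # estimator=np.sum
count_heights = {c: len(v) for c, v in groups.items()}  # countplot

print(mean_heights, sum_heights, count_heights)
```

In this sample both categories have the same mean but different sums and counts, which is why the choice of estimator can change the story a bar chart tells.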

boxplot and violinplot

boxplots and violinplots are used to show the distribution of quantitative data across the levels of a categorical variable.

A box plot shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.

A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.
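As a sketch of the numbers behind a box plot, the snippet below computes the quartiles, the whisker ends, and the outliers for a small made-up sample. It assumes the 1.5 × IQR rule (seaborn's default whis value) and linearly interpolated quartiles; treat the exact quartile method as an assumption, since libraries differ on it.

```python
from statistics import quantiles

# Made-up data with one extreme value
values = sorted([3, 5, 7, 8, 9, 11, 13, 40])

# Quartiles by linear interpolation between the sorted data points
q1, median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond these fences are drawn individually as outliers
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points that are still inside the fences
inside = [v for v in values if low_fence <= v <= high_fence]
whisker_low, whisker_high = inside[0], inside[-1]
outliers = [v for v in values if v < low_fence or v > high_fence]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```

The box spans q1 to q3 with a line at the median, so a single extreme value such as 40 shows up as a separate point without stretching the whisker.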

sns.boxplot(x='day',y='total_bill',data=tips, palette='rainbow')

# To draw a boxplot for every numeric column of the DataFrame
sns.boxplot(data=tips, palette='rainbow',orient='h')

# To add another category, use hue
sns.boxplot(x='day',y='total_bill',data=tips, palette='rainbow',hue='sex')

sns.violinplot(x='day',y='total_bill',data=tips, palette='rainbow')

sns.violinplot(x='day',y='total_bill',data=tips, palette='rainbow',hue='sex')

# Use split to merge the two hue levels into one violin
sns.violinplot(x="day", y="total_bill", data=tips,hue='sex',split=True,palette='Set1')

stripplot and swarmplot

The stripplot will draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.

The swarmplot is similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap. This gives a better representation of the distribution of values, although it does not scale as well to large numbers of observations (both in terms of the ability to show all the points and in terms of the computation needed to arrange them).

sns.stripplot(x="day", y="total_bill", data=tips)

sns.stripplot(x="day", y="total_bill", data=tips,jitter=True)

sns.stripplot(x="day", y="total_bill", data=tips,jitter=True,hue='sex',palette='Set1')

# dodge separates the hue levels (older seaborn versions called this parameter split)
sns.stripplot(x="day", y="total_bill", data=tips,jitter=True,hue='sex',palette='Set1',dodge=True)

sns.swarmplot(x="day", y="total_bill", data=tips)

sns.swarmplot(x="day", y="total_bill",hue='sex',data=tips, palette="Set1", dodge=True)

#Combining Categorical Plots
sns.violinplot(x="tip", y="day", data=tips,palette='rainbow')
sns.swarmplot(x="tip", y="day", data=tips,color='black',size=3)

factorplot

factorplot is the most general form of a categorical plot. It takes a kind parameter to select the plot type. Since seaborn 0.9 the function is named catplot, so the calls below use that name:

# factorplot was renamed to catplot in seaborn 0.9
sns.catplot(x='sex',y='total_bill',data=tips,kind='bar')
sns.catplot(x='sex',y='total_bill',data=tips,kind='box')
sns.catplot(x='sex',y='total_bill',data=tips,kind='violin')

Matrix Plots

Matrix plots allow you to plot data as color-encoded matrices and can also be used to indicate clusters within the data

flights = sns.load_dataset('flights')
flights.head()

Heatmap

# For heatmap to work, the data must be in matrix form, obtained with corr() or by pivoting
# Matrix form for correlation data
# numeric_only=True (pandas 1.5+) skips the categorical columns; recent pandas raises an error without it
tp = tips.corr(numeric_only=True)
sns.heatmap(tp)

# Use annot to show the value in each cell
sns.heatmap(tips.corr(numeric_only=True),cmap='coolwarm',annot=True)

flights.pivot_table(values='passengers',index='month',columns='year')

pvflights = flights.pivot_table(values='passengers',index='month',columns='year')
sns.heatmap(pvflights)

# Use linecolor and linewidths to improve the look and feel
sns.heatmap(pvflights,cmap='magma',linecolor='white',linewidths=1)
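To show what "matrix form" means for the pivoted data, here is a small standard-library sketch that turns long-form records (one row per observation) into a month-by-year grid, which is the shape heatmap expects. The records are a tiny illustrative sample, and the variable names are hypothetical. The sketch assumes one value per cell, whereas pivot_table would average duplicates.

```python
# Illustrative long-form records: (year, month, passengers)
records = [
    (1949, "Jan", 112),
    (1949, "Feb", 118),
    (1950, "Jan", 115),
    (1950, "Feb", 126),
]

years = sorted({year for year, _, _ in records})  # column labels
months = ["Jan", "Feb"]                           # row labels, in calendar order

# Index each value by its (row label, column label) pair
lookup = {(month, year): passengers for year, month, passengers in records}

# One inner list per month, one entry per year
matrix = [[lookup[(month, year)] for year in years] for month in months]

for month, row in zip(months, matrix):
    print(month, row)
```

Each cell of the heatmap is then colored by the corresponding entry of this grid.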

clustermap

The clustermap uses hierarchical clustering to produce a clustered version of the heatmap.

sns.clustermap(pvflights)

sns.clustermap(pvflights,cmap='coolwarm',standard_scale=1)

Regression Plots

lmplot displays linear model fits, and it also conveniently lets you split those plots into separate panels and color them by hue based on features.

sns.lmplot(x='total_bill',y='tip',data=tips)

# Adding another category
sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex')

sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex',palette='coolwarm')

# Specify markers to distinguish the hue levels
sns.lmplot(x='total_bill',y='tip',data=tips,hue='sex',palette='coolwarm',markers=['o','v'],scatter_kws={'s':100})

# Using a grid
sns.lmplot(x='total_bill',y='tip',data=tips,col='sex')

# Provide row and column to lmplot
sns.lmplot(x="total_bill", y="tip", row="sex", col="time",data=tips)

# Plot for different days
sns.lmplot(x='total_bill',y='tip',data=tips,col='day',hue='sex',palette='coolwarm')

# Adding aspect and height (the size parameter was renamed to height in seaborn 0.9)
sns.lmplot(x='total_bill',y='tip',data=tips,col='day',hue='sex',palette='coolwarm',
          aspect=0.6,height=8)
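The line drawn in each lmplot panel is a least-squares fit. As a minimal sketch of that fit for one predictor, using only the standard library and made-up numbers (the helper name fit_line is hypothetical):

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums (the 1/n factors cancel in the ratio)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up bills and tips, not taken from the tips dataset
bills = [10.0, 20.0, 30.0, 40.0]
tips_paid = [2.0, 3.0, 5.0, 6.0]

slope, intercept = fit_line(bills, tips_paid)
print(round(slope, 3), round(intercept, 3))
```

With hue or col set, the same fit is simply repeated on each subset of the data, giving one line per group or panel.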

Grids

Grids are general types of plots that allow you to map plot types to rows and columns of a grid, which helps you create similar plots separated by features.

iris = sns.load_dataset('iris')
iris.head()

PairGrid

PairGrid is a subplot grid for plotting pairwise relationships in a dataset.

# Just the Grid
sns.PairGrid(iris)

# Then you map to the grid
g = sns.PairGrid(iris)
g.map(plt.scatter)

# Map to upper, lower, and diagonal
g = sns.PairGrid(iris)
g.map_diag(plt.hist)
g.map_upper(plt.scatter)
g.map_lower(sns.kdeplot)

pairplot

pairplot is a simpler version of PairGrid

sns.pairplot(iris)

sns.pairplot(iris,hue='species',palette='rainbow')

Facet Grid

FacetGrid is the general way to create grids of plots based off of a feature:

# Just the Grid
g = sns.FacetGrid(tips, col="time", row="smoker")

g = sns.FacetGrid(tips, col="time",  row="smoker")
g = g.map(plt.hist, "total_bill")

g = sns.FacetGrid(tips, col="time",  row="smoker",hue='sex')
# Notice how the column names are passed as arguments after plt.scatter
g = g.map(plt.scatter, "total_bill", "tip").add_legend()
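Conceptually, a facet grid splits the data into one subset per combination of row and column values and draws the same plot on each subset. A rough standard-library sketch of that partitioning step, with made-up rows standing in for the tips DataFrame:

```python
from collections import defaultdict

# Made-up rows, not taken from the tips dataset
rows = [
    {"time": "Lunch", "smoker": "No", "total_bill": 12.0},
    {"time": "Dinner", "smoker": "No", "total_bill": 25.0},
    {"time": "Dinner", "smoker": "Yes", "total_bill": 31.0},
    {"time": "Lunch", "smoker": "No", "total_bill": 9.5},
]

# One subset per (row value, column value) pair, as with row="smoker", col="time"
facets = defaultdict(list)
for r in rows:
    facets[(r["smoker"], r["time"])].append(r["total_bill"])

for (smoker, time), bills in sorted(facets.items()):
    print(f"smoker={smoker}, time={time}: {bills}")
```

Each list here corresponds to the data that one panel of the grid would receive.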

JointGrid

JointGrid is the general version for jointplot() type grids, for a quick example:

g = sns.JointGrid(x="total_bill", y="tip", data=tips)

g = sns.JointGrid(x="total_bill", y="tip", data=tips)
g = g.plot(sns.regplot, sns.distplot)

# Style and color
sns.set_style('white')
sns.countplot(x='sex',data=tips,palette='deep')
sns.despine()
sns.despine(left=True)