Data Visualisation Using Seaborn


Created on Jupyter Notebook Using Tips Dataset

Seaborn is a Python data visualisation library, built on top of Matplotlib, that makes it easy to create attractive statistical plots. Much of data analysis involves identifying trends and building models, and good plots help with both. This article will help you get started with creating visualisations using the Seaborn library.

Seaborn lets us build attractive plots with very little code. To begin, run the following imports in your Jupyter notebook.

import pandas as pd # Pandas
import numpy as np # Numpy
import matplotlib.pyplot as plt # Matplotlib
import seaborn as sns # Seaborn Library
%matplotlib inline

The lines above import the required libraries, and %matplotlib inline makes the plots render inside the notebook.

Distplot

sns.distplot(data["variablename"])

sns.distplot() combines a histogram with a plot of the estimated probability density function of the data. The bin size is calculated automatically. The syntax is shown above. Note that Seaborn 0.11 and later deprecate distplot in favour of histplot and displot.

It applies only to numerical columns, because it draws a histogram together with a kernel density estimate.

We will use the following code to load the dataset into the Jupyter notebook.

# Load the Dataset in Python
tips = sns.load_dataset("tips")
tips.head()

Now that the dataset is loaded, let’s create the first plot: a distplot of the “total_bill” variable from the tips data.

sns.distplot(tips["total_bill"], bins=16, color="purple")
# bins=16 comes from the square root of the row count (244 rows, sqrt is about 15.6, rounded up)

Let’s understand the code written above.

sns.distplot: this call creates the distplot.

tips[“total_bill”]: this pulls the total_bill column from the tips dataset (a DataFrame). Note that a column is selected with square brackets, and the column name must be in quotes (either single or double quotes are accepted).
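The bins=16 value in the call above follows the square-root rule mentioned in the code comment. As a small sketch (the helper name sqrt_bins is hypothetical and not part of Seaborn), the rule can be written as:

```python
import math

def sqrt_bins(n_rows: int) -> int:
    """Square-root rule: use roughly sqrt(n_rows) bins, rounded up."""
    return math.ceil(math.sqrt(n_rows))

# The tips dataset has 244 rows, and sqrt(244) is about 15.6
print(sqrt_bins(244))  # 16
```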

Inference: the “total_bill” variable is right-skewed, and most of the bill values fall in the range of $10 to $20.

Jointplot

A jointplot takes two variables and draws a scatterplot of the pair together with a histogram of each variable. Let’s look at the syntax of the jointplot.

sns.jointplot(x = , y =, data=)

Let’s create a jointplot of the total_bill and tip variables from the tips dataset. Normally the tip amount in a restaurant depends on the total bill. Let’s see what this plot looks like. Here is the code.

sns.jointplot(x = "total_bill", y = "tip", data = tips, color="purple")

As expected, the scatterplot shows a positive correlation between the total bill and the tip amount. Along the top and right margins, we can see the histograms of the respective variables.
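A scatterplot only suggests a relationship, while a correlation coefficient puts a number on it. The sketch below uses the pandas Series.corr method, which computes the Pearson correlation by default. The rows are a few made-up values for illustration, not rows from the real tips data.

```python
import pandas as pd

# Hypothetical bills and tips, for illustration only
sample = pd.DataFrame({
    "total_bill": [10.0, 15.0, 20.0, 25.0, 30.0, 40.0],
    "tip":        [1.5,  2.0,  3.5,  3.0,  5.0,  6.0],
})

# Pearson correlation: +1 is a perfect increasing line, 0 is no linear relation
r = sample["total_bill"].corr(sample["tip"])
print(round(r, 2))
```

In the notebook, the same check on the real data would be tips["total_bill"].corr(tips["tip"]).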

Jointplot :: kind = "hex"

The bivariate analogue of a histogram is known as a “hexbin” plot, because it shows the counts of observations that fall within hexagonal bins. This plot works best with relatively large datasets.

sns.jointplot(x = , y =, data=, kind="hex")
# Jointplot - Hexbin plot with marginal histograms
sns.jointplot(x = "total_bill", y = "tip", data = tips, kind = "hex",
              color="lightcoral")

The kind parameter of sns.jointplot accepts several values that produce different plots. By default the jointplot shows a scatterplot, whereas with kind = “hex” it shows hexagons. Darker hexagons indicate a higher density of data points, while lighter hexagons indicate fewer points.
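To see what the hexagons encode, the sketch below bins points with Matplotlib's hexbin directly and reads back the count for each hexagon. It uses synthetic points rather than the tips data, and the gridsize value is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Synthetic points, for illustration only (not the tips data)
rng = np.random.default_rng(1)
x = rng.normal(loc=20.0, scale=8.0, size=500)
y = 0.15 * x + rng.normal(scale=1.0, size=500)

fig, ax = plt.subplots()
hexes = ax.hexbin(x, y, gridsize=10)  # one polygon per hexagonal bin
counts = hexes.get_array()            # number of points in each hexagon

# Every point lands in exactly one hexagon, so the counts add up to len(x)
print(int(counts.sum()), len(x))
```

The colour of each hexagon is simply a mapping of these counts, which is why denser regions look darker.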

The values that can be passed to the kind parameter of the jointplot are as follows:

# kind : { "scatter" | "reg" | "resid" | "kde" | "hex" }

So far with sns.jointplot we have seen the scatterplot and the hexbin plot. Next, we will use “kde” as the kind.

Jointplot :: kind = "kde"

# Jointplot - KDE contours with marginal density curves
sns.jointplot(x = tips["total_bill"], y = tips["tip"], kind = "kde",
              color="purple") # contour plot

The plot above is called a contour plot. A contour plot (sometimes called a level plot) is a way to show a three-dimensional surface on a two-dimensional plane. It places two variables, X and Y, on the x-axis and y-axis and draws a third variable, Z, as contours. Here Z is the estimated joint density of total_bill and tip.

Pairplot

A pairplot plots the pairwise relationships between the numerical variables of a DataFrame. It supports a “hue” parameter that colours the plot by a categorical variable.

sns.pairplot(dataframe)
# Pairplot of Tips
sns.pairplot(tips, hue = "sex", palette="Set2")
# this will color the plot gender wise

Let’s understand the pairplot. The diagonal shows the distribution of each variable (a histogram, or a kernel density estimate when hue is set). The panels above and below the diagonal show scatterplots of each pair of variables. The “hue” parameter colours the plot by a categorical column.

hue = “sex”: this colours the plot by gender.

palette = “Set2”: this is the colour palette used to colour the plot. More details about palettes can be found in the Seaborn documentation on colour palettes.

Barplot

Barplots are meant for plotting a categorical column against a numerical column. Each bar shows the mean of the numerical column for one category. Let’s create a barplot of “total_bill” by “sex” and see who pays more on average, males or females.

sns.barplot(x = , y =, data=)
# Barplot
sns.barplot(x = "sex", y = "total_bill", data = tips)
# Inference - The average total bill is higher for males than for females.
# Let's plot Smoker vs Total Bill :: the purpose is to find out if
# smokers have a higher bill than non-smokers
sns.barplot(x = "smoker", y = "total_bill", data = tips)
# Inference - The average bill is higher for smokers
# Let's find out whether the bill is higher on weekends or weekdays
sns.barplot(x = "day", y = "total_bill", data = tips)
# Inference - The average bill is higher on weekends (Sat, Sun)
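By default, barplot shows the mean of the y variable for each category, with an error bar around it. The bar heights can therefore be reproduced with a plain pandas groupby, as in this small sketch. The rows are hypothetical values chosen so the means are easy to check by hand; they are not taken from the tips data.

```python
import pandas as pd

# Hypothetical rows for illustration only (not the real tips data)
sample = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Female"],
    "total_bill": [20.0, 30.0, 10.0, 20.0],
})

# These group means are what sns.barplot(x="sex", y="total_bill", ...) draws as bars
mean_bill = sample.groupby("sex")["total_bill"].mean()
print(mean_bill)
```

Running the same groupby on the tips DataFrame is a quick way to read off the exact numbers behind the inferences above.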

Boxplot

A box plot is a visual representation of the five-number summary of a dataset. The five-number summary includes:

  • Minimum
  • First Quartile
  • Median (Second Quartile)
  • Third Quartile
  • Maximum
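The five numbers listed above can be computed directly, which helps when reading a boxplot. Below is a minimal sketch using NumPy percentiles; the helper name five_number_summary is hypothetical. Note that Seaborn's boxplot by default ends the whiskers at the furthest points within 1.5 times the interquartile range and draws anything beyond as individual points, so the whisker ends are not always the true minimum and maximum.

```python
import numpy as np

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) of a sequence of numbers."""
    q = np.percentile(values, [0, 25, 50, 75, 100])
    return tuple(float(v) for v in q)

# Small made-up sample so the result is easy to check by hand
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# (1.0, 3.0, 5.0, 7.0, 9.0)
```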

Also worth noting: a boxplot is used for a categorical variable against a continuous variable. If the x-axis is categorical and the y-axis is continuous, a boxplot or a violin plot is a good choice.

Let’s create a boxplot of “day” and “total_bill” from the tips dataset. Here is the syntax.

sns.boxplot(x = , y =, data=)
# Boxplot
sns.boxplot(x = "day", y = "total_bill", data=tips)
# Add hue to split each box by a second category
sns.boxplot(x = "day", y = "total_bill", data=tips, hue = "smoker")
# On Friday, non-smokers appear to have higher bills than smokers

hue = “smoker”: this draws separate boxes for smokers and non-smokers. For example, on Friday the food bill appears higher for non-smokers than for smokers.

# Violin Plots
sns.violinplot(x = "day", y = "total_bill", data = tips)

Violin plots are similar to boxplots, but they also show a kernel density estimate of the distribution for each category.

LM Plot

sns.lmplot draws a scatterplot of the data and fits a regression line to it. It uses the ordinary least squares method, so the line is the best-fit line. Some background reading on linear regression helps in understanding this better.

Here is the code for the lmplot:

# LM Plot
sns.lmplot(x = "total_bill", y = "tip", data = tips, hue="day")

This shows a separate linear regression fit of tip against total_bill for each day, as listed in the plot legend. This is obtained by passing hue = “day” to sns.lmplot.
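To make the idea of a best-fit line concrete, the sketch below fits a straight line by ordinary least squares with NumPy's polyfit. This is a swapped-in way to compute the same kind of fit, not the routine Seaborn calls internally, and the helper name ols_line and the data points are made up for illustration. With hue, lmplot would fit one such line per group.

```python
import numpy as np

def ols_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)

# Made-up points that lie exactly on y = 2x + 1
slope, intercept = ols_line([1, 2, 3, 4], [3, 5, 7, 9])
print(round(slope, 6), round(intercept, 6))
```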

Congratulations! You have finished the Seaborn tutorial for beginners. I hope this article has given you a basic understanding of Seaborn and helped you create all these plots.

Please share your views and suggestions to make it better.