An Ultimate Cheat Sheet for Data Visualization Techniques in Seaborn

Vildansarikaya
Published in Clarusway
5 min read · Dec 26, 2020

Data visualization refers to the techniques used to communicate information by encoding it as visual objects such as bars, points, and lines in graphics. The goal is to communicate data and information to users clearly and efficiently. Data visualization helps you understand the structure of the data, detect outliers, see trends and patterns, evaluate results, and communicate these findings effectively to an audience. It is a key part of the data analysis step known as Exploratory Data Analysis (EDA).

Python provides many visualization packages for EDA. One of them is the Seaborn library, which produces statistical graphics in Python and helps users explore and understand their data.

This article explains the Seaborn plot types, shows how to plot them with real data and when to use each one, and finally provides a brand-new, thoroughly explained cheat sheet.

Let's start by importing the required libraries and loading the dataset. Seaborn ships with several built-in datasets; this article uses one of them, the "tips" dataset.
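A minimal sketch of that step, assuming seaborn's `load_dataset` function and network access on the first call. The `load_tips` wrapper is a hypothetical helper introduced here for illustration; it is not part of seaborn.

```python
import seaborn as sns


def load_tips():
    """Return the "tips" example dataset as a pandas DataFrame.

    seaborn downloads the file on first use and caches it locally,
    so the first call needs network access.
    """
    return sns.load_dataset("tips")


# Typical use in a notebook:
# tips = load_tips()
# tips.head()
```

The columns referred to in the rest of the article are `total_bill`, `tip`, `sex`, `smoker`, `day`, `time`, and `size`.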

1.DISTRIBUTION PLOT

A)Distplot: A distplot shows the univariate distribution of observations, by default as a histogram with a kernel density estimate (KDE) curve drawn over it.

tips['total_bill']: A numeric column in the tips dataset.

hist_kws: A dictionary of keyword arguments that controls the formatting of the histogram.

bins: Sets the number of histogram bins; a suitable value depends on your dataset.

kde: Whether to draw the Kernel Density Estimate curve.

2.CATEGORICAL PLOT

A categorical variable takes one of two or more categories, usually non-numeric values. For instance, the "sex" column of the "tips" dataset is categorical with two categories (Male and Female), and the "day" column is categorical with four categories (Thur, Fri, Sat, Sun). Categorical plots are used to visualize such variables.

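A)Bar Plot: A bar plot displays the relationship between a categorical variable and a numerical variable; by default, each bar shows the mean of the numerical variable for that category. There are two ways to call it: pass the dataset's columns themselves as x and y, or pass the column names as x and y together with the dataset through the data parameter.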

Method 1

Method 2

hue: determines which column in the data frame should be used for colour encoding.
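The two calling styles and the hue parameter can be sketched as follows. The "Style A" and "Style B" labels are my own and are not meant to map onto Method 1 and Method 2 above; the data frame is a synthetic stand-in for tips with made-up values.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "tips" data: same column names, made-up values.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 60).round(2),
    "sex": rng.choice(["Male", "Female"], 60),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], 60),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Style A: pass the columns themselves as x and y.
sns.barplot(x=tips["day"], y=tips["total_bill"], ax=ax1)

# Style B: pass column names plus the frame through data=.
# hue splits each bar by a second categorical column.
sns.barplot(x="day", y="total_bill", hue="sex", data=tips, ax=ax2)
```

Both styles label the axes from the column names; the second is usually easier to read once hue or other column-based parameters are involved.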

B)Count Plot: A count plot shows the number of observations in each category of a categorical variable.
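A short sketch, assuming a synthetic stand-in for the "day" column of tips (made-up values). The bar heights should add up to the number of rows, which is a quick way to check what the plot is counting.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "day" column of the "tips" data.
rng = np.random.default_rng(0)
tips = pd.DataFrame({"day": rng.choice(["Thur", "Fri", "Sat", "Sun"], 60)})

fig, ax = plt.subplots()

# One bar per category; the bar height is the number of rows in it.
sns.countplot(x="day", data=tips, ax=ax)

# The same counts as a table, for comparison with the bars.
print(tips["day"].value_counts())
```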

C)Box Plot: A box plot summarizes numeric data across a set of categories. The box marks the first quartile (25th percentile), the median (second quartile, 50th percentile), and the third quartile (75th percentile), while the whiskers extend toward the minimum and maximum values. By default, points lying more than 1.5 times the interquartile range beyond the box are drawn individually as outliers.
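To connect the picture with the numbers, the sketch below draws the box plot and also computes the per-category quartiles with pandas. It assumes a synthetic stand-in for tips with made-up values.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "tips" data: same column names, made-up values.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 60).round(2),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], 60),
})

fig, ax = plt.subplots()
sns.boxplot(x="day", y="total_bill", data=tips, ax=ax)

# Quartiles per category: one row per day, one column per quantile.
quartiles = (
    tips.groupby("day")["total_bill"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(quartiles)
```

The bottom edge, middle line, and top edge of each box should correspond to the three columns of `quartiles` for that day.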

D)Violin Plot: A violin plot plays a similar role to a box plot, but it also shows a kernel density estimate of the distribution within each category.
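A minimal sketch with default settings, assuming a synthetic stand-in for tips (made-up values):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "tips" data: same column names, made-up values.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 60).round(2),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], 60),
})

fig, ax = plt.subplots()

# Each violin is a mirrored density curve for one category; by default
# a miniature box plot is drawn inside it.
sns.violinplot(x="day", y="total_bill", data=tips, ax=ax)
```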

E)Strip Plot: A strip plot is a scatter plot in which one of the variables is categorical. It draws every observation, so it shows how a numerical variable is spread within each category.
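A small sketch, again on a synthetic stand-in for tips with made-up values. The `jitter` parameter spreads the points sideways so that overlapping observations stay visible.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "tips" data: same column names, made-up values.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 60).round(2),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], 60),
})

fig, ax = plt.subplots()

# One point per row; jitter adds random horizontal noise within a category.
sns.stripplot(x="day", y="total_bill", data=tips, jitter=True, ax=ax)
```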

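F)Swarm Plot: A swarm plot is a strip plot whose points are adjusted along the categorical axis so that they do not overlap, which gives a clearer picture of the distribution. It is often combined with a violin plot by drawing the swarm on top of the violin.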
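One common way to pair a swarm plot with a violin plot is to draw both on the same axes. The sketch below does that on a synthetic stand-in for tips (made-up values); the colors and point size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "tips" data: same column names, made-up values.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 60).round(2),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], 60),
})

fig, ax = plt.subplots()

# Background: the density shape, without the inner box.
sns.violinplot(x="day", y="total_bill", data=tips,
               inner=None, color="lightgray", ax=ax)

# Foreground: the individual observations, arranged so they do not overlap.
sns.swarmplot(x="day", y="total_bill", data=tips,
              color="black", size=3, ax=ax)
```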

G)Cat Plot (Former Name: Factor Plot): Catplot shows the relationship between a numerical variable and one or more categorical variables. Its kind parameter selects the visual representation, for example strip, swarm, box, violin, point, bar, or count.
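A sketch of `catplot` with `kind="box"` and one panel per value of a second categorical column, assuming a synthetic stand-in for tips (made-up values):

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "tips" data: same column names, made-up values.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 60).round(2),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], 60),
    "time": rng.choice(["Lunch", "Dinner"], 60),
})

# catplot is figure-level: it creates its own figure and returns a FacetGrid.
# kind picks the plot type; col draws one panel per value of "time".
g = sns.catplot(x="day", y="total_bill", col="time", kind="box", data=tips)
```

Changing `kind` to `"violin"` or `"strip"` redraws the same layout with a different plot type.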

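H)Point Plot: A point plot shows an estimate of central tendency (the center of the data distribution; the mean by default) for a numerical variable as a point, with error bars indicating the uncertainty around that estimate.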
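A minimal sketch on a synthetic stand-in for tips (made-up values), using hue to compare two groups across the days:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "tips" data: same column names, made-up values.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, 60).round(2),
    "sex": rng.choice(["Male", "Female"], 60),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], 60),
})

fig, ax = plt.subplots()

# Each point is the mean total_bill for one (day, sex) group;
# the lines connect points that share the same hue level.
sns.pointplot(x="day", y="total_bill", hue="sex", data=tips, ax=ax)
```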

3.MATRIX AND GRID PLOT

A)Heat Map: A heat map encodes the values of a matrix as colors. Applied to a correlation matrix, it shows at a glance which variables are correlated with each other and how strongly.
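A sketch of the usual two-step recipe: compute the correlation matrix with pandas, then pass it to `heatmap`. The data frame is a synthetic stand-in for the numeric columns of tips; the values, including the relationship between `tip` and `total_bill`, are made up.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the numeric columns of the "tips" data.
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, 60).round(2)
tips = pd.DataFrame({
    "total_bill": total_bill,
    "tip": (0.15 * total_bill + rng.normal(0, 1, 60)).round(2),
    "size": rng.integers(1, 7, 60),
})

# Step 1: pairwise Pearson correlations between the numeric columns.
corr = tips[["total_bill", "tip", "size"]].corr()

# Step 2: draw the matrix; annot=True writes each value into its cell,
# and vmin/vmax pin the color scale to the full correlation range.
fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
```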

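B)Pair Plot: A pair plot shows pairwise relationships in a dataset: a grid of scatter plots for every pair of numeric variables, with the distribution of each variable on the diagonal.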

C)FacetGrid: FacetGrid helps you understand the distribution of one variable, as well as the relationship between multiple variables, by drawing the same plot on separate subsets of the data arranged in a grid of rows and columns.

4.JOINTPLOT

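Jointplot visualizes two variables together: a bivariate plot in the center shows their relationship, and univariate plots on the margins show the distribution of each variable.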

5.LMPLOT

Lmplot draws a scatter plot with a fitted regression line onto a FacetGrid, so the fit can be compared across subsets of the data.
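A sketch with one regression line per hue level and one panel per value of a second column, assuming a synthetic stand-in for tips. The values are made up, so the fitted slopes say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the "tips" data: same column names, made-up values.
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, 60).round(2)
tips = pd.DataFrame({
    "total_bill": total_bill,
    "tip": (0.15 * total_bill + rng.normal(0, 1, 60)).round(2),
    "sex": rng.choice(["Male", "Female"], 60),
    "time": rng.choice(["Lunch", "Dinner"], 60),
})

# lmplot is figure-level and returns a FacetGrid. Each panel (col="time")
# gets a scatter plot plus a linear fit for each hue level ("sex").
g = sns.lmplot(x="total_bill", y="tip", hue="sex", col="time", data=tips)
```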

To sum up:

  • Seaborn is a Python data visualization library built on Matplotlib.
  • It provides a high-level interface for drawing attractive and informative statistical graphics.
  • Seaborn helps you understand the structure of the data, detect outliers, and spot trends and patterns through visualization.
  • Seaborn has powerful data visualization capabilities for EDA. All charts mentioned in the article can be used for EDA.
  • This cheat sheet is a useful reference where you can find all the examples in the article and easily access them whenever you want.

You can download the cheat sheet containing all the code from here.
