Ten Must-Know Seaborn Plots
For anyone who wants to ace visualisation using Python, here are some commonly used plots, with explanations of their use cases and code examples.
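Before we begin, let's import the Python packages required to process tabular data and create visualisations. We will also load a few example datasets with seaborn's load_dataset() function, which downloads them from the online seaborn-data repository (an internet connection is needed the first time; the files are cached afterwards):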
#Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
#Import Datasets
exercise = sns.load_dataset('exercise')
iris = sns.load_dataset('iris')
penguins = sns.load_dataset('penguins')
mpg = sns.load_dataset('mpg')
titanic = sns.load_dataset('titanic')
tips = sns.load_dataset('tips')
The ten visualisations discussed in this article are indexed below for your convenience:
1. Bar Plots
2. Count Plots
3. Histograms
4. Cat Plots (Box, Violin, Swarm, Boxen)
5. Multiple Plots using FacetGrid
6. Joint Plots
7. KDE Plots
8. Pairplots
9. Heatmaps
10. Scatter Plots
1. Bar Plots
Bar plots can be used to visualize various types of data, such as counts, frequencies, percentages, or averages. They are particularly useful for displaying and comparing data from different categories.
Use Cases:
1. Categorical Comparison: Each bar represents a distinct category, and the height of the bar shows the aggregated value for that category (such as a count, sum or mean).
For example, the average age of Titanic passengers by person type (man, woman or child):
# Simple barplot
sns.barplot(data=titanic, x="who", y="age", estimator='mean',
errorbar=None, palette='viridis')
plt.title('Simple Barplot')
plt.xlabel('Person')
plt.ylabel('Average Age')
plt.show();
2. Proportional Representation through Stacked Bar Charts: Bar plots can also show how a total splits into parts. Overlaying a bar for a subgroup on the bar for the whole category, as in the example below, or scaling each bar to a percentage, makes it possible to compare the relative composition of different categories.
For example, the proportion of male passengers among those who embarked at each town:
#Prepare data for next plot
data = titanic.groupby('embark_town').agg({'who':'count','sex': lambda x: (x=='male').sum()}).reset_index()
data.rename(columns={'who':'total', 'sex':'male'}, inplace=True)
data.sort_values('total', inplace=True)
# Barplot Showing Part of Total
sns.set_color_codes("pastel")
sns.barplot(x="total", y="embark_town", data=data,
label="Female", color="b")
sns.set_color_codes("muted")
sns.barplot(x="male", y="embark_town", data=data,
label="Male", color="b")
plt.title('Barplot Showing Part of Total')
plt.xlabel('Number of Persons')
plt.legend(loc='upper right')
plt.show()
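The overlaid bars above show counts, so each proportion still has to be judged by eye. A minimal alternative, sketched here on hypothetical toy data (the town and sex columns only mimic the Titanic ones), computes the share per category with pandas first and plots that value directly. The Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical passenger records (not the real Titanic data)
df = pd.DataFrame({
    "town": ["X", "X", "X", "X", "Y", "Y", "Z", "Z", "Z", "Z", "Z"],
    "sex":  ["male", "male", "male", "female", "male", "female",
             "male", "female", "female", "female", "female"],
})

# The mean of a boolean column is the proportion of True values per group
share = (df["sex"].eq("male")
           .groupby(df["town"]).mean()
           .rename("male_share")
           .reset_index())

# Numeric x and categorical y give horizontal bars, one per town
ax = sns.barplot(data=share, x="male_share", y="town", color="b")
ax.set_xlim(0, 1)
ax.set_xlabel("Proportion of male passengers")
print(share)
plt.close(ax.figure)
```

Because every bar is already on a 0 to 1 scale, towns with very different passenger numbers can be compared directly.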
3. Comparison of Subcategories within each category through Clustered Bar Plots: Multiple bars can be grouped within each category to represent different subcategories, allowing for comparison and analysis.
For example, what is the average age of males and females within each class?
# Clustered barplot
sns.barplot(data=titanic, x='class', y='age', hue='sex',
estimator='mean', errorbar=None, palette='viridis')
plt.title('Clustered Barplot')
plt.xlabel('Class')
plt.ylabel('Average Age')
plt.show();
For more on bar plots, see the Seaborn documentation: https://seaborn.pydata.org/generated/seaborn.barplot.html
2. Count Plots
A count plot displays the number of occurrences of each category in a categorical variable. The x-axis represents the categories of the variable, while the y-axis represents the count or frequency of each category.
Use Cases:
- Frequency Distribution of categorical variables: Each bar represents a category, and the height of the bar represents the frequency or count of observations in that category. This helps to identify the most common or least common categories.
For example, the survival status of passengers on the Titanic:
# Simple Countplot
sns.countplot(data=titanic, x='alive', palette='viridis')
plt.title('Simple Countplot')
plt.show();
- Relationship between different categorical variables
For example, the survival status of Titanic passengers by person type (man, woman or child):
# Clustered Countplot
sns.countplot(data=titanic, y="who",
hue="alive", palette='viridis')
plt.title('Clustered Countplot')
plt.show();
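A count plot is essentially a bar chart of category frequencies. As a quick sanity check on hypothetical toy data (the status column is made up), the bar heights drawn by sns.countplot match the counts from pandas value_counts(). The Agg backend is an assumption so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical categorical column
df = pd.DataFrame({"status": ["no", "yes", "no", "no", "yes", "no"]})
order = ["no", "yes"]

# countplot tallies rows per category; no y variable is needed
ax = sns.countplot(data=df, x="status", order=order)
heights = [int(p.get_height()) for p in ax.patches]

# The same frequencies computed with pandas, in the same order
counts = df["status"].value_counts().reindex(order).to_list()
print(heights, counts)
plt.close(ax.figure)
```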
Seaborn Documentation: https://seaborn.pydata.org/generated/seaborn.countplot.html
3. Histograms
Histograms are graphical representations of the distribution of a dataset. They can reveal important characteristics of the data, such as whether it follows a normal distribution, is skewed to one side, or has multiple peaks. They display the frequency or count of observations within different intervals or “bins” of the data.
The x-axis of a histogram represents the range of values in the dataset, divided into equally spaced intervals or bins. The y-axis represents the frequency or count of observations falling within each bin. The height of each bar in the histogram corresponds to the number of observations in that bin.
Use Cases:
1. Visualise the shape, centre, range and spread of a continuous (numeric) variable, and identify any patterns or outliers.
For example, the distribution of the sepal width of iris flowers:
# Histogram with KDE
sns.histplot(data=iris, x='sepal_width', kde=True)
plt.title('Histogram with KDE')
plt.show();
2. Compare the distributions of several continuous variables
For example, the distributions of sepal length and sepal width for iris flowers:
# Histogram with multiple features
sns.histplot(data=iris[['sepal_length','sepal_width']])
plt.title('Multi-Column Histogram')
plt.show();
3. Compare the distribution of a continuous variable for different categories
For example, the distribution of sepal length for different species of iris:
#Stacked Histogram
sns.histplot(iris, x='sepal_length', hue='species', multiple='stack',
linewidth=0.5)
plt.title('Stacked Histogram')
plt.show()
Seaborn Documentation: https://seaborn.pydata.org/generated/seaborn.histplot.html
4. Cat Plots (Box, Violin, Swarm, Boxen)
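Catplot is a higher-level (figure-level) function that provides a single interface to several categorical seaborn plots, including box, violin, boxen, swarm, strip, point, bar and count plots; its kind parameter selects which one to draw. The examples below call the corresponding axes-level functions (boxplot, violinplot, swarmplot and stripplot) directly.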
Use Cases:
- Explore the relationship between a categorical and a continuous variable
- Get a statistical summary of a continuous variable (for example its median, quartiles and spread)
Examples:
# Boxplot
sns.boxplot(data=tips, x='time', y='total_bill', hue='sex', palette='viridis')
plt.title('Boxplot')
plt.show()
# Violinplot
sns.violinplot(data=tips, x='day', y='total_bill', palette='viridis')
plt.title('Violinplot')
plt.show()
#Swarmplot
sns.swarmplot(data=tips, x='time', y='tip', dodge=True, palette='viridis', hue='sex', s=6)
plt.title('SwarmPlot')
plt.show()
#StripPlot
sns.stripplot(data=tips, x='tip', hue='size', y='day', s=25, alpha=0.2,
jitter=False, marker='D',palette='viridis')
plt.title('StripPlot')
plt.show()
Seaborn Documentation: https://seaborn.pydata.org/generated/seaborn.catplot.html
5. Multiple Plots using FacetGrid
FacetGrid is a class in the seaborn library that splits your data into subsets and draws one plot per subset in a grid-like arrangement, so that each plot represents a category. The subsets are determined by the column names passed to the col and row parameters of FacetGrid() (hue can add a further split within each plot). The individual plots can be drawn with most seaborn or matplotlib plotting functions, such as scatter plots, line plots, bar plots or histograms.
Use Cases:
- Compare and analyse different groups or categories within your dataset
- Create subplots seamlessly
For example, Boxplots for pulse rate during different activities:
# Creating subplots using FacetGrid
g = sns.FacetGrid(exercise, col='kind', palette='Paired')
# Drawing a plot on every facet
g.map(sns.boxplot, 'pulse')
g.set_titles(col_template="Pulse rate for {col_name}")
g.add_legend();
For example, scatter plots of flipper length against body mass for penguins from different islands:
# Creating subplots using FacetGrid
g = sns.FacetGrid(penguins, col='island',hue='sex', palette='Paired')
# Drawing a plot on every facet
g.map(sns.scatterplot, 'flipper_length_mm', 'body_mass_g')
g.set_titles(template="Penguins of {col_name} Island")
g.add_legend();
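Both examples above facet only by col. As a sketch on hypothetical toy data (the site, season, length and mass columns are made up), passing row as well produces a two-dimensional grid with one panel per combination. map_dataframe hands each subset to the plotting function as data=, so keyword arguments such as x= and y= can be used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical measurements with two grouping variables
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "site": rng.choice(["North", "South"], size=n),
    "season": rng.choice(["Summer", "Winter", "Spring"], size=n),
    "length": rng.normal(200, 15, size=n),
})
df["mass"] = 20 * df["length"] + rng.normal(0, 300, size=n)

# Rows come from 'site', columns from 'season'
g = sns.FacetGrid(df, row="site", col="season", height=2.5)
g.map_dataframe(sns.scatterplot, x="length", y="mass")
g.set_titles(row_template="{row_name}", col_template="{col_name}")
print(g.axes.shape)  # (number of rows, number of columns)
plt.close("all")
```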
Seaborn Documentation: https://seaborn.pydata.org/generated/seaborn.FacetGrid.html
6. Joint Plots
A joint plot combines multiple univariate and bivariate plots in a single figure. The central plot typically displays a scatter plot or a hexbin plot, representing the joint distribution of the two variables. This main plot is accompanied by additional plots along the axes (histograms or KDEs) that show the distributions of each variable individually.
Use Cases:
- Finding the relationship between 2 variables
- Comparing the individual distributions of 2 different variables
For example, comparing the displacement and fuel efficiency (mpg) of cars:
# Hex Plot with Histogram margins
sns.jointplot(x="mpg", y="displacement", data=mpg,
height=5, kind='hex', ratio=2, marginal_ticks=True);
For example, comparing the acceleration and horsepower of cars from different countries of origin:
# Scatter Plot with KDE Margins
sns.jointplot(x="horsepower", y="acceleration", data=mpg,
hue="origin", height=5, ratio=2, marginal_ticks=True);
Seaborn Documentation: https://seaborn.pydata.org/generated/seaborn.jointplot.html
7. KDE Plots
A KDE (Kernel Density Estimate) plot is a smoothed alternative to a histogram that estimates the probability density function of a continuous variable. The x-axis represents the values of the variable, and the y-axis represents the estimated density: higher values mean that observations are more concentrated around that point, and the total area under the curve is 1, so the y-axis is a density rather than a probability.
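To make the "smoothed histogram" idea concrete, here is a minimal NumPy sketch of a Gaussian KDE: every observation contributes a normal bump of width h (the bandwidth), and the bumps are averaged. The sample values and h = 0.5 are arbitrary assumptions; seaborn chooses the bandwidth automatically (Scott's rule by default, scaled by bw_adjust). The final check shows that the curve integrates to roughly 1.

```python
import numpy as np


def kde_gaussian(samples, grid, bandwidth):
    """Average of normal densities centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth


samples = np.array([1.0, 1.5, 2.0, 4.0, 4.2])
grid = np.linspace(-3, 8, 2001)
density = kde_gaussian(samples, grid, bandwidth=0.5)

# Area under the curve with the trapezoid rule
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(round(area, 3), grid[np.argmax(density)])
```

A larger bandwidth merges the two clusters into one smooth hump; a smaller one makes the curve spikier, which is the trade-off bw_adjust controls in sns.kdeplot.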
Use cases:
- Visualisation of the distribution of a single variable (univariate analysis)
- Insights into the shape, peaks, and skewness of the distribution.
For example, comparing the distribution of horsepower for cars with different numbers of cylinders:
#Overlapping KDE Plots
sns.kdeplot(data=mpg, x='horsepower', hue='cylinders', fill=True,
palette='viridis', alpha=.5, linewidth=0)
plt.title('Overlapping KDE Plot')
plt.show();
For example, comparing the weight of cars from different countries of origin:
#Stacked KDE Plots
sns.kdeplot(data=mpg, x="weight", hue="origin", multiple="stack")
plt.title('Stacked KDE Plot')
plt.show();
Seaborn Documentation: https://seaborn.pydata.org/generated/seaborn.kdeplot.html
8. Pairplots
A pair plot lets you explore the relationships between multiple variables in a dataset. It is a grid of scatter plots in which each numeric variable is plotted against every other numeric variable, while the diagonal entries show a histogram or density plot of each variable's own distribution.
Use Cases:
Identification of correlations or patterns between variables, such as linear or non-linear relationships, clusters, or outliers.
For example, visualisation of relationship between different features of penguins
#Simple Pairplot
sns.pairplot(data=penguins, corner=True);
# Pairplot with hues
sns.pairplot(data=penguins, hue='species');
Adding hue='species' makes the key differences between the three penguin species much easier to see.
Seaborn Documentation: https://seaborn.pydata.org/generated/seaborn.pairplot.html
9. Heatmaps
Heatmaps are a type of visual representation that use color-coded cells to display the values of a matrix or a table of data. In a heatmap, the rows and columns of the matrix represent two different variables, and the color intensity of each cell represents the value or magnitude of the data point at the intersection of those variables.
Use Cases:
Correlation analysis, and visualisation of pivot tables that aggregate data by rows and columns
For example, visualisation of the correlation between all the numerical columns of the mpg dataset:
#Selection of numeric columns from the dataset
num_cols = list(mpg.select_dtypes(include='number'))
fig = plt.figure(figsize=(12,7))
#Correlation Heatmap
sns.heatmap(data=mpg[num_cols].corr(),
annot=True, cmap=sns.cubehelix_palette(as_cmap=True))
plt.title('Heatmap of Correlation matrix')
plt.show()
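The use cases above also mention pivot tables. In this minimal sketch the long-format sales data is hypothetical: pandas pivot_table() aggregates it into a region-by-month matrix of mean sales, and sns.heatmap() colours each cell by its value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical long-format sales records
df = pd.DataFrame({
    "region": ["East", "East", "East", "West", "West", "West", "East"],
    "month":  ["Jan", "Feb", "Jan", "Jan", "Feb", "Feb", "Feb"],
    "sales":  [10, 20, 30, 5, 15, 25, 40],
})

# Rows = region, columns = month, cells = mean sales for that pair
table = df.pivot_table(index="region", columns="month",
                       values="sales", aggfunc="mean")

ax = sns.heatmap(table, annot=True, fmt=".1f", cmap="viridis")
ax.set_title("Mean sales by region and month")
print(table)
plt.close(ax.figure)
```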
Seaborn Documentation: https://seaborn.pydata.org/generated/seaborn.heatmap.html
10. Scatter Plots
A scatterplot displays the relationship between two continuous variables. It is constructed by plotting individual data points on a graph with one variable represented on the x-axis and the other variable represented on the y-axis. The resulting plot consists of multiple points scattered across the graph, hence the name “scatterplot.”
Use Cases:
1. Relationship Assessment: Scatterplots help determine the nature of the relationship between two continuous variables. They can reveal a positive correlation (both variables increase or decrease together), a negative correlation (one variable increases while the other decreases), or no correlation (no apparent relationship).
For example, the scatterplot below shows that the horsepower and weight of cars are positively correlated:
# Simple Scatterplot
sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7)
plt.title('Simple Scatterplot')
plt.show();
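The visual impression of a positive correlation can be quantified with Pearson's correlation coefficient r, which ranges from -1 to 1. In this sketch the data is synthetic (a made-up linear link between weight and horsepower plus random noise), and pandas Series.corr(), which uses the Pearson method by default, computes r.

```python
import numpy as np
import pandas as pd

# Synthetic cars: horsepower rises with weight, plus noise (assumed relation)
rng = np.random.default_rng(4)
weight = rng.uniform(1500, 5000, size=200)
horsepower = 0.03 * weight + rng.normal(0, 15, size=200)
df = pd.DataFrame({"weight": weight, "horsepower": horsepower})

# Pearson correlation between the two columns
r = df["weight"].corr(df["horsepower"])
print(round(r, 2))
```

On the article's own data the equivalent call would be mpg['weight'].corr(mpg['horsepower']); pandas ignores the rows where horsepower is missing.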
2. Outlier Identification: Scatterplots can highlight outliers, which are data points that deviate significantly from the overall pattern.
3. Clustering and Grouping: By visually examining the distribution of points, you can identify if there are natural groupings or patterns among the variables.
For example, comparing the horsepower and weight of cars from different countries:
# Scatterplot with Hue
sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7,
hue='origin', palette='viridis')
plt.title('Scatterplot with Hue')
plt.show()
# Scatterplot with Hue and Markers
sns.scatterplot(data=mpg, x='weight', y='horsepower', s=150, alpha=0.7,
style='origin',palette='viridis', hue='origin')
plt.title('Scatterplot with Hue and Markers')
plt.show()
# Scatterplot with Hue & Size
sns.scatterplot(data=mpg, x='weight', y='horsepower', sizes=(40, 400), alpha=.5,
palette='viridis', hue='origin', size='cylinders')
plt.title('Scatterplot with Hue & Size')
plt.show()
4. Trend Analysis: By plotting data points chronologically, scatterplots can depict the evolution or progression of variables, helping to identify trends or changes in behaviour.
5. Model Validation: By comparing the predicted values of a model to the actual values, scatterplots can visualize the accuracy or deviation of the model’s predictions.
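For model validation, a common pattern is to plot actual values against predicted values and add the line y = x, on which a perfect prediction would fall. In this minimal sketch both the data and the model are assumptions: a straight line fitted with np.polyfit to synthetic points stands in for whatever model you are evaluating.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Synthetic data: y = 3x + 2 plus noise
rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=100)
y_true = 3.0 * x + 2.0 + rng.normal(0, 2.0, size=100)

# Hypothetical model: ordinary least-squares line fitted with NumPy
slope, intercept = np.polyfit(x, y_true, deg=1)
y_pred = slope * x + intercept

ax = sns.scatterplot(x=y_true, y=y_pred, alpha=0.6)
# Points on the dashed y = x line are perfect predictions
ax.axline((0, 0), slope=1, color="grey", linestyle="--")
ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
plt.close(ax.figure)
```

Points far from the dashed line are the observations the model predicts worst, and a systematic tilt away from it suggests bias.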
To learn more about scatter plots, refer to the Seaborn documentation: https://seaborn.pydata.org/generated/seaborn.scatterplot.html
Conclusion
- This article summarises the most commonly used visualisations in Python's Seaborn library.
- Refer to the Seaborn Gallery for more plot types: https://seaborn.pydata.org/examples/index.html