Seaborn: statistical data visualization

Seaborn is a Python data visualization library based on Matplotlib that provides a high-level interface for drawing attractive statistical graphics. It is particularly useful for exploring and visualizing relationships in datasets.

Deepak Chaudhary
Feb 7, 2024

Seaborn is a Python data visualization library built on top of Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics, and it is designed to work well with pandas DataFrame objects, which makes it convenient for data analysis tasks. Its main features and capabilities are:

  1. Statistical Visualization: Seaborn is primarily used for creating statistical visualizations that help explore and understand datasets. It offers a variety of plots for visualizing univariate, bivariate, and multivariate distributions, as well as relationships between variables.
  2. Integration with Pandas: Seaborn seamlessly integrates with pandas DataFrames, allowing users to directly pass DataFrame objects to its plotting functions. This makes it easy to work with data stored in tabular format and perform visual analysis.
  3. Attractive Aesthetics: Seaborn comes with built-in themes and color palettes that enhance the aesthetics of plots. It provides a visually appealing default style, but users can also customize the appearance of plots to suit their preferences or match the requirements of their analysis or presentation.
  4. High-Level Abstractions: Seaborn offers high-level abstractions for creating complex plots with minimal code. For example, functions like lmplot, pairplot, and jointplot automatically handle the creation of multi-panel plots, allowing users to focus on interpreting the results rather than worrying about the technical details of plot creation.
  5. Flexibility: While Seaborn provides high-level abstractions for common plot types, it also offers fine-grained control over plot elements through its extensive set of parameters. Users can customize aspects such as colors, styles, annotations, and axis limits to create plots that convey their intended message effectively.
  6. Statistical Estimation: Many Seaborn plots include built-in statistical estimation and aggregation that summarize the underlying distribution or relationship. For example, regplot and lmplot overlay a linear regression fit on a scatterplot, and categorical plots such as barplot and pointplot display confidence intervals around point estimates.
  7. Ease of Use: Seaborn is designed to be user-friendly and intuitive, with a consistent and well-documented API. It simplifies the process of creating complex visualizations by handling tasks such as data aggregation, binning, and normalization internally, allowing users to focus on the analysis rather than the mechanics of plotting.
  8. Integration with Matplotlib: Seaborn is built on top of Matplotlib and seamlessly integrates with it. Users can leverage the power and flexibility of Matplotlib for low-level customization while benefiting from Seaborn’s higher-level plotting functions and aesthetics.
  9. Extensibility: Seaborn is extensible, allowing users to create custom plot types or modify existing ones to suit their specific needs. Advanced users can also combine Seaborn plots with Matplotlib or other visualization libraries to create highly customized visualizations.
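Several of the points above (pandas integration, built-in themes, and Matplotlib interoperability) can be seen in a few lines. The following is a minimal sketch; the DataFrame `df` and its values are made up for illustration and merely mimic the columns of the tips dataset used later.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data, not the real tips dataset
df = pd.DataFrame({
    "total_bill": [10.5, 20.0, 15.2, 30.1, 25.4, 18.3],
    "tip": [1.5, 3.0, 2.0, 5.0, 4.1, 2.5],
    "sex": ["Female", "Male", "Female", "Male", "Male", "Female"],
})

# Built-in theme and colour palette
sns.set_theme(style="whitegrid", palette="deep")

# Column names are passed as strings and looked up in the DataFrame
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="total_bill", y="tip", hue="sex", ax=ax)

# ax is a regular Matplotlib Axes, so Matplotlib methods work on it
ax.set_title("Tip vs. total bill")
ax.set_xlim(0, 35)
```

In an interactive session, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` called at the end.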

Graphs:

1. Barplot:

  • A barplot is used to show the relationship between a categorical variable and a numerical variable.
  • Seaborn’s barplot function is used to create barplots.
  • It displays the mean (or other estimator) of the numerical variable for each category of the categorical variable along with confidence intervals.
  • Example:
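import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed for the plt.show() calls in the examples
# Fetch the example 'tips' dataset from Seaborn's online data repository (cached after the first download)
tips = sns.load_dataset('tips')
print(tips.head())  # preview the first rows
# The default estimator is the mean; here the bars show the standard deviation instead
sns.barplot(x='sex', y='total_bill', data=tips, estimator=np.std)
plt.show()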
Barplot
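To make the aggregation step concrete, the sketch below compares the bar heights drawn by barplot (with its default estimator, the mean) against a plain pandas groupby. The small DataFrame is hypothetical and only stands in for the tips data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "total_bill": [20.0, 10.0, 30.0, 14.0, 25.0, 18.0],
})

# What barplot computes by default: the mean of y within each x category
group_means = df.groupby("sex")["total_bill"].mean()

fig, ax = plt.subplots()
sns.barplot(data=df, x="sex", y="total_bill", ax=ax)

# Each bar is a Matplotlib Rectangle; its height is the estimated value
bar_heights = sorted(bar.get_height() for bar in ax.patches)
print(group_means)
print(bar_heights)
```

The error bars are bootstrapped confidence intervals, so their extent can vary slightly between runs, while the bar heights do not.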

2. Countplot:

  • A countplot is used to show the count of observations in each category of a categorical variable.
  • Seaborn’s countplot function is used to create countplots.
  • It displays the frequency of each category.
  • Example:
sns.countplot(x='sex', data=tips)
plt.show()
Countplot
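A countplot is essentially a bar chart of `value_counts()`. The sketch below checks that on a small, made-up column (the values are an assumption for illustration).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data
df = pd.DataFrame({"sex": ["Male", "Female", "Male", "Male", "Female", "Male"]})

# Frequency of each category, computed directly with pandas
counts = df["sex"].value_counts()

fig, ax = plt.subplots()
sns.countplot(data=df, x="sex", ax=ax)

# Bar heights should match the pandas counts
bar_heights = sorted(bar.get_height() for bar in ax.patches)
print(counts)
print(bar_heights)
```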

3. Boxplot:

  • A boxplot is used to visualize the distribution of a numerical variable and detect outliers.
  • Seaborn’s boxplot function is used to create boxplots.
  • It displays the median, quartiles, and potential outliers in the data.
  • Example:
sns.boxplot(x='day', y='total_bill', data=tips, hue='sex')
plt.show()
Boxplot
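The quantities a boxplot draws can be computed by hand. The sketch below uses made-up bill values and the common 1.5 × IQR rule for flagging outliers, which is the default whisker setting in Matplotlib and Seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical bill amounts with one unusually large value
bills = np.array([10, 12, 13, 15, 16, 18, 20, 48], dtype=float)

# Box: first quartile, median, third quartile
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]
print(q1, median, q3, outliers)

fig, ax = plt.subplots()
sns.boxplot(x=bills, ax=ax)
```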

4. Violinplot:

  • A violinplot is similar to a boxplot but also shows the shape of the distribution of the numerical variable.
  • It combines a boxplot with a kernel density estimation (KDE) plot.
  • Seaborn’s violinplot function is used to create violinplots.
  • Example:
sns.violinplot(x='day', y='total_bill', data=tips, hue='sex', split=True)
plt.show()
Violinplot

5. Distplot (histplot):

  • A distribution plot is used to visualize the distribution of a single numerical variable.
  • It combines a histogram with a KDE curve.
  • Seaborn’s distplot function has been deprecated since version 0.11; histplot (axes-level) and displot (figure-level) replace it.
  • Example:
sns.histplot(data=tips, x='total_bill', kde=True, bins=20)
plt.show()
Distplot

6. Kdeplot:

  • A kdeplot is used to visualize the Kernel Density Estimate of a numerical variable.
  • It draws a smoothed estimate of the probability density function of the variable.
  • Seaborn’s kdeplot function is used to create kdeplots.
  • Example:
sns.kdeplot(data=tips, x='total_bill', fill=True)  # fill replaces the deprecated shade parameter
plt.show()
Kdeplot
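The idea behind a kernel density estimate can be sketched in plain NumPy: place a Gaussian bump on every observation and average them. This is a simplified illustration, not Seaborn's implementation; the helper `gaussian_kde_1d`, the sample values, and the use of Scott's rule for the bandwidth are assumptions made for the example.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Hypothetical observations
samples = np.array([10.0, 12.0, 15.0, 15.5, 20.0, 30.0])

# Scott's rule of thumb in one dimension: n^(-1/5) times the sample std
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(samples.min() - 5 * bandwidth,
                   samples.max() + 5 * bandwidth, 2000)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A density integrates to 1; check with a simple Riemann sum
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve.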

7. Jointplot:

  • A jointplot is used to visualize the relationship between two numerical variables along with their individual distributions.
  • It combines scatterplots with histograms or KDE plots.
  • Seaborn’s jointplot function is used to create jointplots.
  • Example:
sns.jointplot(x='total_bill', y='tip', data=tips, kind='scatter')
sns.jointplot(x='total_bill', y='tip', data=tips, kind='kde')
sns.jointplot(x='total_bill', y='tip', data=tips, kind='reg')
sns.jointplot(x='total_bill', y='tip', data=tips, kind='hex')
plt.show()
Jointplot(kind=scatter)
Jointplot(kind=kde)
Jointplot(kind=reg)
Jointplot(kind=hex)

8. Pairplot:

  • A pairplot is used to visualize pairwise relationships between multiple numerical variables in a dataset.
  • It creates a scatterplot for each pair of variables and shows each variable’s own distribution along the diagonal (histograms by default, KDE curves when hue is set).
  • Seaborn’s pairplot function is used to create pairplots.
  • Example:
sns.pairplot(tips, hue='sex')
plt.show()
Pairplot

9. Heatmap:

  • A heatmap displays the values of a matrix as a colour-encoded grid; a common use is visualizing the correlations between variables in a dataset.
  • In that case, each cell represents the correlation coefficient for one pair of variables.
  • Seaborn’s heatmap function is used to create heatmaps.
  • Example:
# Keep only the numeric columns, since corr() works on numeric data
data = tips[['total_bill', 'tip', 'size']]
tips_corr = data.corr()
sns.heatmap(tips_corr, annot=True, cmap='Greens')
plt.show()
Heatmap
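A correlation matrix is symmetric with ones on the diagonal, and `annot=True` writes one label per cell. The sketch below checks both on a small made-up DataFrame; fixing `vmin=-1` and `vmax=1` is an optional choice that pins the colour scale to the full correlation range.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 15.0, 30.0, 25.0],
    "tip": [1.5, 3.0, 2.0, 5.0, 4.0],
    "size": [1, 2, 2, 4, 3],
})

# Pairwise Pearson correlations between the numeric columns
corr = df.corr()

fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, cmap="Greens", vmin=-1, vmax=1, ax=ax)

# annot=True adds one text label per cell (3 x 3 = 9 here)
print(corr.round(2))
print(len(ax.texts))
```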

Overall, Seaborn is a flexible and capable tool for producing attractive, informative statistical graphics in Python. Whether you are exploring data, sharing ideas, or presenting findings, it covers the most common statistical plot types in a few lines of code.

Deepak Chaudhary

Computer Science And Engineering(AI and ML) Student @GNIOT College Greater Noida India | Exploring Jobs in Tech | Queries 👉 deepakchaudhary8303@gmail.com