PAIRPLOT VISUALIZATION

Sarath SL
Published in Analytics Vidhya, Sep 29, 2019

Pairplot visualization comes in handy when you are doing exploratory data analysis (EDA).

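A pairplot plots the pairwise relationships in a dataset: it visualizes each pair of variables so you can look for relationships between them. The variables can be continuous or categorical. By default the grid is built from the numeric columns, and a categorical column can be brought in through color with the hue parameter.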

pairplot is a function in the seaborn library, which provides a high-level interface for drawing attractive and informative statistical graphics.

Let’s see how it is done.

  1. Get your data

As the data shows, the dataset has both continuous and categorical columns.
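The article does not show how df is built. The sketch below assumes a small, invented DataFrame shaped like seaborn's tips dataset so the later examples can run offline; the smoker column comes from the article, but the other column names (total_bill, tip, size) and all values are hypothetical. It separates the numeric columns, which pairplot uses for the grid by default, from the categorical one.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's df: invented values, tips-like columns.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n).round(2),  # continuous
    "tip": rng.uniform(1, 10, n).round(2),         # continuous
    "size": rng.integers(1, 7, n),                 # discrete numeric
    "smoker": np.tile(["No", "Yes"], n // 2),      # categorical
})

# pairplot builds its grid from the numeric columns by default.
numeric_cols = list(df.select_dtypes(include="number").columns)
categorical_cols = [c for c in df.columns if c not in numeric_cols]
print(numeric_cols)      # ['total_bill', 'tip', 'size']
print(categorical_cols)  # ['smoker']
```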

Let’s plot the data using pairplot:
From the picture below, we can observe how each pair of variables varies together. The plots are arranged in a matrix in which each row shares a y-axis variable (named at the left of the row) and each column shares an x-axis variable (named at the bottom of the column). The subplots on the main diagonal show the univariate distribution of each attribute, drawn as histograms by default.

Pairplot Parameters:
seaborn.pairplot(data, hue=None, hue_order=None, palette=None, vars=None, x_vars=None, y_vars=None, kind='scatter', diag_kind='auto', markers=None, height=2.5, aspect=1, dropna=True, plot_kws=None, diag_kws=None, grid_kws=None, size=None)

Apart from data, every parameter is optional (size is the older, deprecated name for height). Newer seaborn releases accept a few more parameters, such as corner.

Here are some commonly used parameters:

hue: show different levels of a categorical variable by the color of the plot elements.

The hue parameter maps a column of the data to different colors, so you can compare how the levels of that variable differ in every subplot.
sns.pairplot(df, hue="smoker")

palette: use a different color palette.
A palette is the set of colors used to map the levels of the hue variable.
g = sns.pairplot(df, hue="smoker", palette="husl")

markers: use a different marker for each level of the hue variable.
sns.pairplot(df, hue="smoker", markers=["o", "s"])

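vars: plot only a subset of the variables by listing them in vars. x_vars and y_vars choose the column and row variables separately, which gives a non-square grid.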

A commonly used combination:
sns.pairplot(df, hue="smoker", diag_kind="kde", kind="scatter", palette="husl")

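where:
kind: kind of plot for the off-diagonal (non-identity) subplots, {'scatter', 'reg'}. Seaborn 0.11 and later also accept 'kde' and 'hist'.
diag_kind: kind of plot for the diagonal subplots, {'auto', 'hist', 'kde'}. With the default 'auto', a scatter pairplot gets histograms when hue is not set and KDE curves when it is.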

A pairplot helps you find the set of features that best explains the relationship between two variables, or that forms the most clearly separated clusters. It can also suggest simple classification models: if the classes (colored with hue) can be split by a straight line in some subplot, a linear separation on those two features may work well.

There are other plots you can use for EDA, each of which helps you better understand the relationships among the variables in your data.

All of these techniques help you choose the best-suited variables for building your machine learning model.

Thank you!
