Data Visualization using Python Part-II

Published in Analytics Vidhya · 5 min read · Jan 1, 2021

This is Part II of Data Visualization using Python. If you haven't gone through Part I yet, start there! In this part, we reproduce the same kinds of plots using the Seaborn library in Python.

Image Source: Airtame — Data Visualization

Matplotlib has proven to be an incredibly useful and popular visualization tool, but even avid users will admit it often leaves much to be desired. A few complaints come up regularly: its classic defaults look dated, its API is relatively low-level, and it predates Pandas, so it was not designed around DataFrames.

Seaborn is an answer to these problems. It provides an API on top of Matplotlib that offers sane defaults for plot style and colour, defines simple high-level functions for common statistical plot types, and integrates with Pandas DataFrames.

Getting Started with Seaborn

Seaborn Installation

pip install seaborn

Refer to the seaborn page on PyPI for troubleshooting.

Importing Seaborn

import seaborn as sns
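
Seaborn's function names and arguments have changed across releases, so it can help to confirm which version is installed before running the examples. The snippet below is a minimal sketch; version_tuple is a hypothetical helper written for this illustration, not part of Seaborn.

```python
import matplotlib
import seaborn as sns


def version_tuple(version):
    """Return (major, minor) as integers from a version string like '0.13.2'."""
    parts = []
    for piece in version.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break          # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


print("seaborn", sns.__version__, version_tuple(sns.__version__))
print("matplotlib", matplotlib.__version__, version_tuple(matplotlib.__version__))
```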

Seaborn vs Matplotlib

Here is an example of a simple random-walk plot in Matplotlib, using its classic plot formatting and colours. We start with the typical imports —

import matplotlib.pyplot as plt
plt.style.use('classic')
# Jupyter/IPython magic; remove the next line when running as a plain script
%matplotlib inline
import numpy as np
import pandas as pd

Now we create some random walk data —

# Create some data: six random walks of 500 steps each
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
y = np.cumsum(rng.randn(500, 6), 0)
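
To see why this produces random walks: rng.randn(500, 6) draws 500 random steps for each of six series, and np.cumsum along axis 0 turns each column of steps into a running total. A small check, using the same seed as above (the name steps is introduced here only for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
steps = rng.randn(500, 6)        # one column of random steps per walk
y = np.cumsum(steps, axis=0)     # running total down each column

print(y.shape)                   # (500, 6): 500 positions for each of 6 walks

# The first position equals the first step, and each later position
# differs from the previous one by exactly one step.
print(np.allclose(y[0], steps[0]))
print(np.allclose(np.diff(y, axis=0), steps[1:]))
```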

And do a simple plot —

#Plot the data with Matplotlib defaults
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');

Although the result contains all the information we’d like it to convey, it does so in a way that is not all that aesthetically pleasing, and even looks a bit old-fashioned.

Using Seaborn

import seaborn as sns
sns.set()  # apply Seaborn's default theme to all subsequent Matplotlib plots

Now let’s rerun the same two lines as before —

#same plotting code as above!
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');

This is much better than the previous one!
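
The plotting code is unchanged because sns.set() works by updating Matplotlib's global settings (rcParams). The sketch below inspects one such setting, the axes background colour, before and after the call, and then restores the original settings with sns.reset_orig(). The Agg backend is selected only so the block runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

before = plt.rcParams["axes.facecolor"]
sns.set()                                  # switch to Seaborn's default theme
after = plt.rcParams["axes.facecolor"]
print("axes background:", before, "->", after)

sns.reset_orig()                           # restore the settings Matplotlib started with
restored = plt.rcParams["axes.facecolor"]
print("after reset:", restored)
```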

Histograms, KDE, and densities

Often in statistical data visualization, all you want is to plot histograms and joint distributions of variables. We have seen that this is relatively straightforward in Matplotlib —

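data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])

for col in 'xy':
    # density=True replaces the old normed argument, which recent Matplotlib releases no longer accept
    plt.hist(data[col], density=True, alpha=0.5)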
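
With density=True, the bar heights are rescaled so that the total area of the histogram is 1, which puts histograms of differently spread variables on a comparable scale. A quick check with np.histogram, which applies the same normalization (the sample here is synthetic):

```python
import numpy as np

rng = np.random.RandomState(0)
values = rng.randn(2000)

# heights are densities, edges are the bin boundaries
heights, edges = np.histogram(values, bins=10, density=True)

# Area of each bar is height * width; the areas sum to 1
area = np.sum(heights * np.diff(edges))
print(round(float(area), 6))
```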

Rather than a histogram, we can get a smooth estimate of the distribution using kernel density estimation, which Seaborn does with sns.kdeplot:

for col in 'xy':
    # fill=True shades the area under the curve (older Seaborn releases called this shade=True)
    sns.kdeplot(data[col], fill=True)
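
Conceptually, a kernel density estimate places a small Gaussian bump on every sample and averages them. The following is a minimal pure-Python sketch of that idea, not Seaborn's implementation; make_kde is a hypothetical helper, and the bandwidth of 0.3 is picked by hand, whereas Seaborn chooses one automatically.

```python
import math
import random


def make_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each sample
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(500)]
density = make_kde(samples, bandwidth=0.3)   # bandwidth is an assumption

# A density should integrate to about 1; check with a simple Riemann sum
step = 0.05
grid = [-8.0 + step * i for i in range(321)]  # covers -8 to 8
area = sum(density(x) for x in grid) * step
print(round(area, 3))
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows the individual samples more closely.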

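Histograms and KDE can be combined using histplot with kde=True, which replaces the deprecated distplot:

sns.histplot(data['x'], kde=True, stat='density')
sns.histplot(data['y'], kde=True, stat='density');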

If we pass both columns to kdeplot as x and y, we get a two-dimensional visualization of the data:

sns.kdeplot(data=data, x='x', y='y');

We can see the joint distribution and the marginal distributions together using sns.jointplot. For this plot, we'll set the style to a white background —

with sns.axes_style('white'):
    sns.jointplot(data=data, x='x', y='y', kind='kde');

Other values of kind can be passed to jointplot. For example, we can use a hexagonally based histogram instead:

with sns.axes_style('white'):
    sns.jointplot(data=data, x='x', y='y', kind='hex')

Pair Plots

When you generalize joint plots to datasets of larger dimensions, you end up with pair plots.

We'll demo this with the well-known Iris dataset, which lists measurements of petals and sepals of three iris species. Visualizing the multidimensional relationships among the samples is as easy as calling sns.pairplot:

iris = sns.load_dataset("iris")
# height sets the size of each panel in inches (called size in older Seaborn releases)
sns.pairplot(iris, hue='species', height=2.5);

Factor Plots

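Factor plots (called categorical plots in current Seaborn, where sns.factorplot has been replaced by sns.catplot) can be useful for this kind of visualization as well. They let you view the distribution of a parameter within bins defined by any other parameter. Here we use Seaborn's example Tips dataset:

tips = sns.load_dataset('tips')
with sns.axes_style(style='ticks'):
    g = sns.catplot(data=tips, x="day", y="total_bill", hue="sex", kind="box")
    g.set_axis_labels("Day", "Total Bill");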
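
A box plot of this kind summarizes, for every combination of the grouping variables, the quartiles of the measured value. The same numbers can be computed directly with Pandas. The table below is a tiny made-up stand-in for the Tips data, not the real dataset:

```python
import pandas as pd

toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Thur", "Fri", "Fri", "Fri", "Fri"],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"],
    "total_bill": [10.0, 12.0, 20.0, 18.0, 15.0, 9.0, 25.0, 11.0],
})

# Quartiles of total_bill within each (day, sex) bin:
# one row per bin, one column per quantile
summary = (
    toy.groupby(["day", "sex"])["total_bill"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(summary)
```

Each row of summary corresponds to one box in the plot: the middle column is the line inside the box, and the outer two columns are its edges.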

Joint Distributions

As with the pair plot we saw earlier, we can use sns.jointplot to show the joint distribution of two variables, along with the associated marginal distributions:

with sns.axes_style('white'):
    sns.jointplot(data=tips, x="total_bill", y="tip", kind='hex')

The joint plot can even do some automatic kernel density estimation and regression:

sns.jointplot(data=tips, x="total_bill", y="tip", kind='reg');
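
With kind='reg', the joint panel overlays a fitted straight line on the scatter. As a rough illustration of what such a fit computes, here is an ordinary least-squares line fitted with np.polyfit to made-up bills and tips (the 15% rate plus a flat 1.00 is invented for the example and is not taken from the Tips data):

```python
import numpy as np

total_bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tip = 0.15 * total_bill + 1.0          # invented, perfectly linear data

# Degree-1 polynomial fit returns (slope, intercept)
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(f"{slope:.2f} {intercept:.2f}")  # 0.15 1.00
```

On real, noisy data the fitted slope and intercept would only approximate the underlying relationship.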

Bar Plots

We have already seen bar plots with Matplotlib; now let us do the same with Seaborn.

Counts over time can be plotted as bars using sns.catplot with kind='count'. In the following example, we'll use the Planets dataset to count the planets discovered each year.

planets = sns.load_dataset('planets')
with sns.axes_style('white'):
    g = sns.catplot(data=planets, x="year", aspect=2,
                    kind="count", color='steelblue')
    g.set_xticklabels(step=5)

We can learn more by looking at the method of discovery of each of these planets:

with sns.axes_style('white'):
    g = sns.catplot(data=planets, x="year", aspect=4.0, kind='count',
                    hue='method', order=range(2001, 2015))
    g.set_ylabels('Number of Planets Discovered')
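
A count plot with hue is a picture of a cross-tabulation: one bar per (year, method) pair, with height equal to the number of rows in that pair. The equivalent table can be built with pd.crosstab. The rows below are invented for illustration and are not the Planets data:

```python
import pandas as pd

toy = pd.DataFrame({
    "year":   [2009, 2009, 2010, 2010, 2010, 2011],
    "method": ["Radial Velocity", "Transit", "Transit",
               "Transit", "Radial Velocity", "Imaging"],
})

# Rows are years, columns are methods, cells are row counts
counts = pd.crosstab(toy["year"], toy["method"])
print(counts)
```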

For more information on plotting with Seaborn, see the Seaborn documentation, a tutorial, and the Seaborn gallery.
References:
https://jakevdp.github.io/PythonDataScienceHandbook/04.14-visualization-with-seaborn.html
https://elitedatascience.com/python-seaborn-tutorial

For the complete code, see the repository tanvipenumudy/Winter-Internship-Internity on github.com.