Data Visualization using Matplotlib and Seaborn

Published in Analytics Vidhya, Oct 9, 2020

What is Data Visualization?

Today, data is generated everywhere, and its volume grows every day. We can see this in real time on our own phones: social media activity, emails, and bank transactions keep increasing day by day. Is it possible to view such massive data in a structured representation? Yes, through data visualization. Data visualization is the graphical representation of data. In the big data world, several data visualization tools can analyse massive datasets to support decision making.

Today, we are going to implement data visualization using a dataset from UCI.

Let’s start…

Download the Bank Marketing dataset from the UCI Machine Learning Repository.

Load the data into a DataFrame

import pandas as pd

# The file is semicolon-delimited, so pass sep=';'
df = pd.read_csv(r'\Dataset\bank-full.csv', sep=';')
df

Bank marketing dataset

Matplotlib

Histogram

A histogram is a chart that groups numeric data into bins, displaying the bins as segmented columns. They’re used to depict the distribution of a dataset: how often values fall into ranges.

The histogram shows that roughly 39,000 people fall in the bin around a balance of 0, while about 5,000 people hold a balance of around 15,000.

df['balance'].plot(kind='hist')

Histogram of the balance column
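To make the binning step concrete, here is a minimal sketch that counts values per bin with NumPy. The balances are made up for illustration and are not taken from the bank dataset; the choice of 3 bins is also arbitrary (pandas uses 10 bins by default for kind='hist').

```python
import numpy as np

# Hypothetical balances, made up for illustration (not from the bank dataset).
balances = np.array([0, 50, 120, 300, 450, 800, 950, 1500, 2900, 3000])

# Split the value range into 3 equal-width bins and count the values in each.
counts, edges = np.histogram(balances, bins=3)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:7.1f} to {right:7.1f}: {n} values")
```

Each bar in a histogram is one of these counts, drawn over the range given by its pair of bin edges.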

PieChart

Determine which job category holds the largest share of the total balance.

First, group the data by job and sum the balance, as below

df_group = df.groupby(['job'])['balance'].sum()

Plotting the data using a pie chart

df_group.plot.pie(figsize=(10,20), autopct="%.2f")

Management holds the largest share of the total balance, followed by blue-collar jobs.
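The percentages printed on the pie wedges are each group's share of the summed balance. A minimal sketch of that calculation, using a small made-up table (the rows and the name toy are hypothetical, not from the bank dataset):

```python
import pandas as pd

# Hypothetical rows, made up for illustration (not from the bank dataset).
toy = pd.DataFrame({
    "job": ["management", "management", "blue-collar", "technician"],
    "balance": [3000, 1000, 2500, 1500],
})

# Total balance per job, the same grouping used for the pie chart.
totals = toy.groupby("job")["balance"].sum()

# Each wedge corresponds to a group's percentage of the overall total.
share = (totals / totals.sum() * 100).round(2)
print(share)
```

Note that a pie chart can only draw non-negative values, so this approach assumes every group's total balance is zero or above.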

Another interesting question: what proportion of people have taken a loan?

df['loan'].value_counts(normalize=True).plot(kind='pie', autopct="%.1f")

About 16% of people have taken a bank loan.
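As a small illustration of what value_counts(normalize=True) returns before it is plotted, here is a sketch on a made-up column (the values are hypothetical, not from the bank dataset):

```python
import pandas as pd

# Hypothetical loan flags, made up for illustration.
loan = pd.Series(["no", "no", "yes", "no", "no"], name="loan")

# normalize=True returns fractions of the total instead of raw counts.
proportions = loan.value_counts(normalize=True)
print(proportions)
```

The fractions sum to 1, which is why they map directly onto pie wedges.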

Count

import matplotlib.pyplot as plt

plt.figure(figsize=(20,10))
plt.xticks(rotation=90)
df.job.value_counts().plot(kind='barh')

Which job is the most common? Clearly, blue-collar is the most frequent.

BarChart

Determine which education level holds the highest total balance.

df_loan = df.groupby(['education'])['balance'].sum().reset_index()
df_loan.plot.bar(x='education', y='balance')

From the plot, secondary and tertiary education account for the highest total balance.
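Because the bar chart uses the sum, a group with more people can show a higher bar even if its typical balance is lower. The sketch below, on made-up rows (hypothetical values, not from the bank dataset), puts the count, sum, and mean side by side to show how they can disagree:

```python
import pandas as pd

# Hypothetical rows, made up for illustration (not from the bank dataset).
toy = pd.DataFrame({
    "education": ["secondary"] * 4 + ["tertiary"] * 2,
    "balance": [500, 700, 600, 600, 1500, 700],
})

# Compare group size, total balance, and average balance per education level.
summary = toy.groupby("education")["balance"].agg(["count", "sum", "mean"])
print(summary)
```

In this toy table, secondary has the larger sum while tertiary has the larger mean, so it may be worth checking both before drawing conclusions from the real data.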

Seaborn

Catplot

This function provides access to several axes-level functions that show the relationship between a numerical and one or more categorical variables using one of several visual representations.

First, filter the dataset to the people whose balance is greater than 100

greater_100_balance = df[df['balance'] > 100]
greater_100_balance

Having balance > 100

Plot the mean balance for each month, with one panel per marital status, as below.

import seaborn as sns

sns.catplot(x='month', y='balance', col='marital', data=greater_100_balance, kind='bar')
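Month names are plain strings, so the bars appear in the order the months occur in the data rather than in calendar order. A minimal sketch, assuming a small made-up table (hypothetical values, not from the bank dataset), that passes an explicit order to catplot and inspects the grid it returns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical rows, made up for illustration (not from the bank dataset).
toy = pd.DataFrame({
    "month": ["jan", "feb", "mar"] * 4,
    "marital": ["married"] * 6 + ["single"] * 6,
    "balance": [200, 350, 500, 250, 300, 450, 150, 400, 600, 100, 500, 550],
})

# kind='bar' draws the mean of y per category; order fixes the x-axis sequence.
g = sns.catplot(data=toy, x="month", y="balance", col="marital",
                kind="bar", order=["jan", "feb", "mar"])

# catplot returns a FacetGrid: one Axes per value of the col variable.
print(g.axes.shape)
```

In a notebook the matplotlib.use line can be dropped.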

Pairplot

Plot pairwise relationships in a dataset.

By default, this function will create a grid of Axes such that each numeric variable in data will be shared across the y-axes across a single row and the x-axes across a single column. The diagonal plots are treated differently: a univariate distribution plot is drawn to show the marginal distribution of the data in each column.

It is also possible to show a subset of variables or plot different variables on the rows and columns.

sns.pairplot(df)

Countplot

Show the counts of observations in each categorical bin using bars.

sns.countplot(x='housing', data=df)

The plot shows how many people do and do not have a housing loan.

Scatterplot

The relationship between x and y can be shown for different subsets of the data using the hue, size, and style parameters. These parameters control what visual semantics are used to identify the different subsets. It is possible to show up to three dimensions independently by using all three semantic types, but this style of plot can be hard to interpret and is often ineffective. Using redundant semantics (i.e. both hue and style for the same variable) can be helpful for making graphics more accessible.

sns.scatterplot(x='age', y='balance', data=df)

People around age 50 tend to have a higher balance. The balance peaks between the ages of 50 and 60.
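The hue and style parameters described above can be mapped to the same column to get the redundant encoding the description mentions. A minimal sketch on made-up rows (hypothetical values, not from the bank dataset):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical rows, made up for illustration (not from the bank dataset).
toy = pd.DataFrame({
    "age": [25, 32, 41, 47, 53, 58],
    "balance": [120, 900, 450, 2300, 3100, 1800],
    "housing": ["yes", "no", "yes", "no", "yes", "no"],
})

# Colour and marker shape both encode 'housing', so the groups stay
# distinguishable even when colour alone is hard to tell apart.
ax = sns.scatterplot(data=toy, x="age", y="balance",
                     hue="housing", style="housing")
print(ax.get_xlabel(), ax.get_ylabel())
```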

Relplot

relplot is the figure-level interface for relational plots. It draws a scatterplot (the default) or a lineplot onto a FacetGrid and accepts the same hue, size, and style parameters to distinguish subsets of the data.

sns.relplot(x='day', y='balance', data=df, hue='month')

The balance on the 3rd of June is higher than the balances seen in November.

Jointplot

Draw a plot of two variables with bivariate and univariate graphs.

sns.jointplot(x=df['day'], y=df['balance'])

The above illustration was produced using jointplot.



Written by Antony Christopher