Deep Dive in Machine Learning with Python

Part — X: Data Visualization using Pandas & Matplotlib

Rajesh Sharma
Analytics Vidhya
Jan 5, 2020

Welcome to another blog of Deep Dive in Machine Learning with Python. In the last blog, we worked with advanced Pandas functions using the Heart Disease dataset. In today’s blog, we will focus on visualizing data using Pandas and the Matplotlib data visualization library.

For this blog, we will use the popular Gapminder dataset and create various interactive and non-interactive graphs.

Import the necessary Python libraries

Required libraries
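The import cell itself is not reproduced in the text, so the following is only a sketch of a typical set of imports that would cover the plots in this blog. The aliases are the usual community conventions and are an assumption, not necessarily the ones used in the notebook.

```python
import numpy as np                 # numeric arrays (inputs for the colormap)
import pandas as pd                # DataFrames and their .plot() wrappers
import matplotlib.pyplot as plt    # pyplot functions such as grid() and xlabel()
from matplotlib import cm          # built-in colormaps such as viridis
```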

Import the dataset

We will import the dataset from a CSV file (i.e. gapminder.csv) and create a Pandas DataFrame.

Data read from CSV file
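As a rough sketch of this step, the snippet below reads CSV text into a DataFrame with pd.read_csv. The column names and the few rows are invented placeholders (not real Gapminder values) so that the example runs without the file; with the real file the call would simply take the path "gapminder.csv".

```python
import io
import pandas as pd

# Invented placeholder rows standing in for gapminder.csv.
# The column names are an assumption about the file's layout.
sample_csv = io.StringIO(
    "country,year,region,population,babies_per_woman,gdp_per_capita\n"
    "India,1950,Asia,100,5.9,900\n"
    "India,2015,Asia,300,2.4,5900\n"
    "Brazil,1950,America,50,6.1,2400\n"
    "Brazil,2015,America,200,1.8,15400\n"
)

# With the real file this would be: gapminder = pd.read_csv("gapminder.csv")
gapminder = pd.read_csv(sample_csv)

print(gapminder.shape)    # (number of rows, number of columns)
print(gapminder.dtypes)   # confirm numeric columns were parsed as numbers
```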

Problem-1: How to plot the Bar graph displaying the Total Population of some of the countries?

CASE-1: Assigning colors manually

DataFrame-COUNTRY_POP

So, we created a new DataFrame COUNTRY_POP which contains the Total POPULATION of each country.
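One way such a per-country total could be built is with a groupby followed by a sum. This is a sketch under assumptions: the lowercase names country_pop, country and population are illustrative, and the rows are placeholders rather than real data.

```python
import pandas as pd

# Placeholder rows; real values would come from gapminder.csv.
gapminder = pd.DataFrame({
    "country": ["India", "India", "Brazil", "Brazil", "Kenya"],
    "year": [1950, 2015, 1950, 2015, 2015],
    "population": [100, 300, 50, 200, 40],
})

# Sum population over all rows of each country, keep 'country' as a column,
# and order the result from largest to smallest total.
country_pop = (
    gapminder.groupby("country", as_index=False)["population"]
    .sum()
    .sort_values("population", ascending=False)
    .reset_index(drop=True)
)
print(country_pop)
```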

Manually assigning the colors against Countries

In the above step, we created a new column COLORS in the COUNTRY_POP DataFrame in which colors are mapped to some of the countries.

COUNTRY_POP with newly added column
Horizontal Bar Graph
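The manual colour assignment and the horizontal bar graph can be sketched as follows. The colour choices, the grey fallback for unmapped countries, and the lowercase column names are all assumptions made for illustration. The plot is drawn from a Series because pandas applies a list of colours bar by bar for a Series, whereas for a DataFrame it assigns one colour per column.

```python
import pandas as pd
import matplotlib.pyplot as plt

country_pop = pd.DataFrame({
    "country": ["India", "Brazil", "Kenya"],
    "population": [400, 250, 40],          # placeholder totals
})

# Hand-picked colours for some of the countries (hypothetical choices).
color_by_country = {"India": "orange", "Brazil": "green"}

# Countries missing from the mapping fall back to grey instead of NaN.
country_pop["colors"] = country_pop["country"].map(color_by_country).fillna("grey")

# Plot the population Series indexed by country, one colour per bar.
ax = country_pop.set_index("country")["population"].plot(
    kind="barh", color=country_pop["colors"].tolist()
)
ax.set_xlabel("Total population")
```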

CASE-2: Using color map

Array with random values required for assigning a color

In this step, we created a NumPy array of 182 distinct values, one for each country in the dataset.

Colormap object

In this step, we created the Colormap object (i.e. colors) using Matplotlib's cm module and specified the ‘viridis’ colormap.
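A minimal sketch of these two steps is shown below. It assumes evenly spaced values from np.linspace; the notebook may well have used random values instead, and any array of distinct numbers between 0 and 1 would serve the same purpose.

```python
import numpy as np
from matplotlib import cm

n_countries = 182                      # number of countries stated above

# 182 distinct values in [0, 1]; colormaps map floats in this range to colours.
values = np.linspace(0.0, 1.0, n_countries)

# Calling a colormap on an array returns one RGBA row per input value.
colors = cm.viridis(values)

print(colors.shape)   # (182, 4)
```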

CASE-2.1: Horizontal Bar Plot

Horizontal Bar Graph

CASE-2.2: Vertical Bar Plot

Vertical Bar Graph

In the above example, we generated the vertical plot by passing the value ‘bar’ instead of ‘barh’.
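To make the difference between the two values concrete, here is a small side-by-side sketch. It assumes the value is passed through the kind argument of the pandas plot method, and it uses placeholder totals with viridis colours.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm

# Placeholder totals indexed by country.
pop = pd.Series([400, 250, 40], index=["India", "Brazil", "Kenya"], name="population")

# One viridis colour per bar, converted to plain lists of RGBA floats.
colors = cm.viridis(np.linspace(0, 1, len(pop))).tolist()

fig, (ax_h, ax_v) = plt.subplots(1, 2, figsize=(10, 4))
pop.plot(kind="barh", color=colors, ax=ax_h, title="kind='barh'")  # horizontal bars
pop.plot(kind="bar", color=colors, ax=ax_v, title="kind='bar'")    # vertical bars
fig.tight_layout()
```

With 'barh' each value becomes the width of a bar, while with 'bar' it becomes the height.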

Other functions:

plt.minorticks_on(): This function enables the minor ticks on the x & y axes

plt.grid(): This function draws the grid lines of the graph with the specified color and line style

plt.xlabel(): This sets the label of the x-axis

plt.ylabel(): This sets the label of the y-axis
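The four calls above can be tried on any plot. The sketch below applies them to a few placeholder bars; the grid colour, line style, and label text are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar([1, 2, 3], [3, 7, 5])           # placeholder bars

plt.minorticks_on()                    # show minor ticks on both axes
plt.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
plt.xlabel("Country")                  # label for the x-axis
plt.ylabel("Total population")         # label for the y-axis
```

The pyplot functions act on the current axes, which here is the one just created by plt.subplots().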

Problem-2: How to plot the Scatter Plot displaying the Total Number of Babies against Total GDP for the regions?

Region-wise Total number of Babies and Total GDP
Scatter Plot

In the above example, we created the Scatter Plot displaying the Total Number of Babies against the Total GDP for each region (continent).
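A sketch of how such a region-wise scatter plot might be produced is given below. It assumes the totals are sums of the babies_per_woman and gdp_per_capita columns per region; those column names and the rows are placeholders, not taken from the notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder rows; column names are assumed.
gapminder = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe", "Africa"],
    "babies_per_woman": [5.9, 2.4, 1.6, 4.7],
    "gdp_per_capita": [900, 5900, 35000, 2900],
})

# One row per region with the summed babies and GDP figures.
region_totals = gapminder.groupby("region", as_index=False)[
    ["babies_per_woman", "gdp_per_capita"]
].sum()

ax = region_totals.plot(kind="scatter", x="gdp_per_capita", y="babies_per_woman", s=80)

# Label each point with its region name.
for row in region_totals.itertuples(index=False):
    ax.annotate(row.region, (row.gdp_per_capita, row.babies_per_woman))

ax.set_xlabel("Total GDP")
ax.set_ylabel("Total number of babies")
```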

Problem-3: How to plot the interactive Scatter Plot which will display the Total number of babies against GDP per capita from 1950 to 2015?

Year-wise Total number of babies and Total GDP of regions

In step 1, we created a new DataFrame babies_in_region containing the Total number of babies and Total GDP for every year and region.

Babies_in_region: Total number of babies and Total GDP of regions
Function for creating the Interactive Plot
Scatter Plot with Year slider (video uploaded on GitHub)
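The usual pattern for a plot like this is a function that draws a single year, which a slider widget then calls repeatedly. The sketch below assumes that pattern; the function name plot_year, the column names babies and gdp, and the rows are hypothetical. The slider wiring is left as a comment because it needs a Jupyter notebook with the ipywidgets package, which is an assumption about the original setup.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder rows: one row per year and region (column names assumed).
babies_in_region = pd.DataFrame({
    "year": [1950, 1950, 2015, 2015],
    "region": ["Asia", "Europe", "Asia", "Europe"],
    "babies": [5.9, 2.7, 2.4, 1.6],
    "gdp": [900, 6000, 5900, 35000],
})

def plot_year(year):
    """Draw the region scatter plot for a single year."""
    data = babies_in_region[babies_in_region["year"] == year]
    fig, ax = plt.subplots()
    ax.scatter(data["gdp"], data["babies"])
    for row in data.itertuples(index=False):
        ax.annotate(row.region, (row.gdp, row.babies))
    ax.set_xlabel("GDP")
    ax.set_ylabel("Total number of babies")
    ax.set_title(f"Year {year}")

plot_year(1950)

# In a Jupyter notebook with ipywidgets installed, a slider can drive the function:
# from ipywidgets import interact, IntSlider
# interact(plot_year, year=IntSlider(min=1950, max=2015, step=1, value=1950))
```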

Problem-4: How to display the Growth in population via Line Graph?

Solution-4.1

In this step, we created the population DataFrame containing the total population for each year.

Line Graph
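A sketch of the year-wise aggregation and the line graph follows, assuming year and population columns and using placeholder rows.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder rows; real values would come from gapminder.csv.
gapminder = pd.DataFrame({
    "country": ["India", "Brazil", "India", "Brazil"],
    "year": [1950, 1950, 2015, 2015],
    "population": [100, 50, 300, 200],
})

# Total population across all countries for each year.
population = gapminder.groupby("year", as_index=False)["population"].sum()

ax = population.plot(kind="line", x="year", y="population", marker="o", legend=False)
ax.set_xlabel("Year")
ax.set_ylabel("Total population")
```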

Problem-5: How to view the outliers in the dataset by using BOX Plots?

Box-plot

In descriptive statistics, a box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points.
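As an illustration of the definition above, the sketch below draws one box per region and then recomputes the outliers by hand using the 1.5 x IQR whisker rule, which is Matplotlib's default. The column names and values are placeholders, with one value made deliberately extreme.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder values; 9.5 is deliberately far from the rest of its group.
gapminder = pd.DataFrame({
    "region": ["Asia"] * 6 + ["Europe"] * 5,
    "babies_per_woman": [2.1, 2.3, 2.4, 2.6, 2.8, 9.5, 1.4, 1.5, 1.6, 1.7, 1.8],
})

fig, ax = plt.subplots()
gapminder.boxplot(column="babies_per_woman", by="region", ax=ax)
ax.set_ylabel("Babies per woman")
fig.suptitle("")   # drop the automatic "Boxplot grouped by region" title

# Points beyond 1.5 * IQR from the quartiles are drawn as individual outliers.
asia = gapminder.loc[gapminder["region"] == "Asia", "babies_per_woman"]
q1, q3 = asia.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = asia[(asia < q1 - 1.5 * iqr) | (asia > q3 + 1.5 * iqr)]
print(outliers.tolist())
```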

Problem-6: How to plot the Population share of each continent using Pie Charts?

Solution-6.1
Pie-chart

Problem-7: How to plot the Babies_per_woman share of each continent using Pie Charts?

Solution-7.1
Pie-chart

In the above two examples, we created the Pie-charts representing each region's percentage share of population and babies_per_woman.
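Both pie charts can be sketched with the same pattern: sum the column per region, then let the pie plot convert the totals into percentage shares. The column names and rows below are placeholders chosen for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder rows; column names are assumed.
gapminder = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe", "Africa"],
    "population": [300, 100, 60, 40],
    "babies_per_woman": [2.4, 5.9, 1.6, 4.7],
})

# Region-wise totals, indexed by region so the index supplies the slice labels.
region_totals = gapminder.groupby("region")[["population", "babies_per_woman"]].sum()

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, column in zip(axes, ["population", "babies_per_woman"]):
    # autopct prints each slice's percentage of the column total.
    region_totals[column].plot(kind="pie", ax=ax, autopct="%1.1f%%")
    ax.set_ylabel("")          # hide the default column-name label
    ax.set_title(column)
fig.tight_layout()
```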

Congratulations, we have come to the end of this blog. To summarize, we created various charts using Pandas and Matplotlib. In the next blog, we will cover plots like Histograms, Pair plots, Density plots, and others.

If you want to download the Jupyter Notebook of this blog, please visit the GitHub repository below:

https://github.com/Rajesh-ML-Engg/Deep_Dive_in_ML_Python

Thank you and happy learning!!!

Blog-11: Data Visualization — II
