Deep Dive in Machine Learning with Python
Part — X: Data Visualization using Pandas & Matplotlib
Welcome to another blog of Deep Dive in Machine Learning with Python. In the last blog, we worked with advanced Pandas functions using the Heart Disease dataset. In today’s blog, we will focus on visualizing data using Pandas and the Matplotlib data visualization library.
For this blog, we will use the popular Gapminder dataset and create various interactive and non-interactive graphs.
Import the necessary Python libraries
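The original import cell is not reproduced here. A minimal set that covers what the rest of this blog uses might look like the following; the aliases are the conventional ones and are an assumption, not necessarily what the notebook used.

```python
import numpy as np                 # numeric arrays (used later for the colormap values)
import pandas as pd                # DataFrames and their .plot() helpers
import matplotlib.pyplot as plt    # figure, axis and labelling functions
from matplotlib import cm          # built-in colormaps such as viridis
```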
Import the dataset
We will import the dataset from a CSV file (i.e. gapminder.csv) and create a Pandas DataFrame.
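As a rough sketch of this step, the snippet below reads CSV text into a DataFrame with pd.read_csv. Because gapminder.csv itself is not bundled here, an in-memory stand-in with a few made-up rows is used; the column names are assumptions about the file's layout.

```python
import io
import pandas as pd

# Stand-in for gapminder.csv: toy rows with assumed column names (not real figures).
sample_csv = io.StringIO(
    "country,year,region,population\n"
    "India,1950,Asia,10\n"
    "China,1950,Asia,30\n"
    "Brazil,1950,America,5\n"
)

# With the real file this would be: gapminder = pd.read_csv("gapminder.csv")
gapminder = pd.read_csv(sample_csv)

print(gapminder.head())   # first rows, to check the columns were parsed
print(gapminder.shape)    # (rows, columns)
```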
Problem-1: How to plot the Bar graph displaying the Total Population of some of the countries?
CASE-1: Assigning colors manually
So, we created a new DataFrame COUNTRY_POP which contains the Total POPULATION of each country.
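A DataFrame like COUNTRY_POP could be built with a groupby, roughly as below. This is only a sketch: the toy data, the lowercase variable name and the column names are assumptions, and the original notebook may have aggregated differently.

```python
import pandas as pd

# Toy stand-in for the Gapminder data (made-up values, assumed column names).
gapminder = pd.DataFrame({
    "country": ["India", "India", "China", "China", "Brazil"],
    "year": [1950, 1960, 1950, 1960, 1950],
    "population": [10, 20, 30, 40, 5],
})

# One row per country, with population summed over all of that country's rows.
country_pop = gapminder.groupby("country", as_index=False)["population"].sum()
print(country_pop)
```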
In the above step, we created a new column COLORS in the COUNTRY_POP DataFrame in which colors are mapped to some of the countries.
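One way to map colors to only some of the countries is a dictionary plus a fallback color, as sketched below. The dictionary contents, the grey fallback and the toy population values are all hypothetical.

```python
import pandas as pd

country_pop = pd.DataFrame({
    "country": ["Brazil", "China", "India", "Japan"],
    "population": [5, 70, 30, 12],   # toy values
})

# Hypothetical mapping: only some countries get a hand-picked color.
manual_colors = {"China": "red", "India": "orange", "Brazil": "green"}

# Countries missing from the mapping fall back to grey.
country_pop["COLORS"] = country_pop["country"].map(manual_colors).fillna("grey")

# Plot as a Series so that each bar takes its own color from the list.
ax = country_pop.set_index("country")["population"].plot(
    kind="bar", color=country_pop["COLORS"].tolist()
)
ax.set_ylabel("Total population")
```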
CASE-2: Using color map
In this step, we created a NumPy array of 182 distinct values, one for each country in the dataset.
In this step, we created the Colormap object (i.e. colors) using Matplotlib’s cm module and selected the ‘viridis’ colormap.
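The two steps above can be sketched as follows. Using np.linspace to produce the 182 values is an assumption (the original array is not shown); cm.viridis is Matplotlib's built-in viridis colormap, which returns one RGBA color per input value.

```python
import numpy as np
from matplotlib import cm

n_countries = 182                          # number of countries mentioned above
values = np.linspace(0, 1, n_countries)    # 182 evenly spaced, distinct values in [0, 1]
colors = cm.viridis(values)                # one RGBA row per value

print(colors.shape)   # (182, 4)
```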
CASE-2.1: Horizontal Bar Plot
CASE-2.2: Vertical Bar Plot
In the above example, we generated the vertical bar plot by passing the parameter value ‘bar’ instead of ‘barh’.
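The difference between the two cases comes down to the kind argument of the Pandas plot method. The sketch below draws both side by side on toy values; plotting from a Series (rather than a DataFrame) is a choice made here so that every bar receives its own colormap color.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm

# Toy totals, not real figures.
population = pd.Series(
    [5, 70, 30, 12], index=["Brazil", "China", "India", "Japan"], name="population"
)
# One viridis color per country.
colors = [tuple(c) for c in cm.viridis(np.linspace(0, 1, len(population)))]

fig, (ax_h, ax_v) = plt.subplots(1, 2, figsize=(10, 4))
population.plot(kind="barh", color=colors, ax=ax_h, title="kind='barh'")  # horizontal bars
population.plot(kind="bar", color=colors, ax=ax_v, title="kind='bar'")    # vertical bars
fig.tight_layout()
```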
Other parameters:
plt.minorticks_on: This function enables the minor ticks on the x & y axes
plt.grid: This function draws the grid lines of the graph with the specified color and line style
plt.xlabel: This provides the label to the x-axis
plt.ylabel: This provides the label to the y-axis
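A small sketch that uses all four calls on one figure is shown below. The plotted numbers, label text, grid color and line style are placeholders chosen for illustration.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1950, 1960, 1970], [10, 20, 35])              # toy values

plt.minorticks_on()                                    # show minor ticks on both axes
plt.grid(which="major", color="grey", linestyle="--")  # dashed grey grid lines
plt.xlabel("Year")                                     # label for the x-axis
plt.ylabel("Total population")                         # label for the y-axis
```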
Problem-2: How to plot the Scatter Plot displaying the Total Number of Babies against Total GDP for the regions?
Here, in the above example, we created the Scatter Plot displaying the Total Number of babies against the Total GDP for the continents.
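How the totals were computed is not shown, so the sketch below makes a loud assumption: it simply sums the babies_per_woman and gdp_per_capita columns per region. Both column names and all values are illustrative.

```python
import pandas as pd

# Toy data (made-up values, assumed column names).
gapminder = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe", "Europe", "Africa"],
    "babies_per_woman": [5.0, 4.0, 2.0, 1.5, 6.0],
    "gdp_per_capita": [1000.0, 1500.0, 9000.0, 11000.0, 800.0],
})

# Assumption: "total" means a plain sum of each column within a region.
totals = gapminder.groupby("region", as_index=False)[
    ["babies_per_woman", "gdp_per_capita"]
].sum()

ax = totals.plot(kind="scatter", x="gdp_per_capita", y="babies_per_woman")
for _, row in totals.iterrows():
    # Label every point with its region name.
    ax.annotate(row["region"], (row["gdp_per_capita"], row["babies_per_woman"]))
ax.set_xlabel("Total GDP (summed gdp_per_capita)")
ax.set_ylabel("Total babies (summed babies_per_woman)")
```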
Problem-3: How to plot the interactive Scatter Plot which will display the Total number of babies against GDP per capita from 1950 to 2015?
In step 1, we created a new DataFrame babies_in_region containing the total number of babies and the total GDP for every year and region.
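A possible shape for this step is sketched below: group by year and region, then wrap the plotting in a function that takes the year. The interactive widget used in the original is not shown; wiring the function to an ipywidgets slider (commented out at the end) is one common option in Jupyter and is an assumption here, as are the column names and toy values.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data (made-up values, assumed column names).
gapminder = pd.DataFrame({
    "year": [1950, 1950, 1950, 2015, 2015, 2015],
    "region": ["Asia", "Asia", "Europe", "Asia", "Asia", "Europe"],
    "babies_per_woman": [6.0, 5.5, 2.5, 2.3, 2.1, 1.6],
    "gdp_per_capita": [900.0, 1100.0, 7000.0, 9000.0, 12000.0, 35000.0],
})

# One row per (year, region), assuming totals are plain sums.
babies_in_region = gapminder.groupby(["year", "region"], as_index=False)[
    ["babies_per_woman", "gdp_per_capita"]
].sum()

def plot_year(year):
    """Draw one point per region for the chosen year and return the axes."""
    data = babies_in_region[babies_in_region["year"] == year]
    fig, ax = plt.subplots()
    ax.scatter(data["gdp_per_capita"], data["babies_per_woman"])
    ax.set_title(f"Year {year}")
    ax.set_xlabel("Total GDP")
    ax.set_ylabel("Total babies")
    return ax

# In a notebook with ipywidgets installed, a slider could drive the function:
# from ipywidgets import interact
# interact(plot_year, year=(1950, 2015, 1))
```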
Problem-4: How to display the Growth in population via Line Graph?
In this step, we created the population DataFrame containing the total population for each year.
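A minimal sketch of this step, on toy values with assumed column names, sums the population per year and draws the result as a line.

```python
import pandas as pd

# Toy data (made-up values, assumed column names).
gapminder = pd.DataFrame({
    "country": ["India", "China", "India", "China", "India", "China"],
    "year": [1950, 1950, 1960, 1960, 1970, 1970],
    "population": [10, 30, 20, 40, 35, 55],
})

# Total population per year across all countries.
population = gapminder.groupby("year")["population"].sum()

ax = population.plot(kind="line", marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Total population")
```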
Problem-5: How to view the outliers in the dataset by using BOX Plots?
In descriptive statistics, a box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points.
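To make the outlier idea concrete, the sketch below flags points that lie more than 1.5 times the interquartile range beyond the quartiles, which is the rule Matplotlib applies by default when placing whiskers, and then draws the box plot. The life_expectancy name and its values are hypothetical.

```python
import pandas as pd

# Toy values; 35 is deliberately far from the rest.
life_expectancy = pd.Series(
    [61, 63, 64, 65, 66, 67, 68, 70, 35], name="life_expectancy"
)

q1, q3 = life_expectancy.quantile([0.25, 0.75])   # lower and upper quartiles
iqr = q3 - q1                                     # interquartile range

# Points beyond 1.5 * IQR from the quartiles are drawn as individual outliers.
is_outlier = (life_expectancy < q1 - 1.5 * iqr) | (life_expectancy > q3 + 1.5 * iqr)
outliers = life_expectancy[is_outlier]
print(outliers.tolist())   # [35]

ax = life_expectancy.plot(kind="box")
```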
Problem-6: How to plot the Population sharing among the continents using Pie Charts?
Problem-7: How to plot the Babies_per_woman sharing among the continents using Pie Charts?
In the above two examples, we created pie charts representing each region’s percentage share of population and babies_per_woman.
Congratulations, we have come to the end of this blog. To summarize, we created various charts using Pandas and Matplotlib. In the next blog, we will cover plots like Histogram, Pair plot, Density plots, and others.
If you want to download the Jupyter Notebook for this blog, kindly access the GitHub repository below:
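A pie chart of this kind could be produced roughly as follows. The data is made up and the column names are assumed; the same pattern would apply to babies_per_woman by swapping the column, though whether the original summed or averaged that column is not known.

```python
import pandas as pd

# Toy data (made-up values, assumed column names).
gapminder = pd.DataFrame({
    "region": ["Asia", "Asia", "Europe", "Africa"],
    "population": [40, 20, 25, 15],
})

# Total population per region; the pie shows each region's share of the sum.
share = gapminder.groupby("region")["population"].sum()

ax = share.plot(kind="pie", autopct="%1.1f%%")   # print percentages on the wedges
ax.set_ylabel("")                                # drop the default column-name label
```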
Thank you and happy learning!!!
Blog-11: Data Visualization — II