Data visualization using matplotlib for beginners.

Chinmai Rane
Published in GDSC UMIT
5 min read · Dec 28, 2021

Introduction

Matplotlib was the first library I used in Python for data visualisation. It’s simple to use, yet the versatility and agility it offers are unrivalled. To show our outcomes, we have a variety of visualisations to select from. Matplotlib provides a variety of colours, themes, palettes, and other choices to create and personalise our plots, from histograms to scatterplots.

By the end of this tutorial, we’ll have learned how to use Matplotlib to visualise data in a variety of ways.

We’ll build the following visualizations using Matplotlib:

  1. Line graph
  2. Bar graph
  3. Histogram
  4. Scatter plot

In this article we are going to use the dataset from the National Institute of Diabetes and Digestive and Kidney Diseases. The dataset consists of several medical predictor variables and one target variable, Outcome.

In this tutorial, I used Jupyter; however, you can use whatever is most convenient or available at the time.

Let’s import the relevant libraries and check out the dataset. Type and run the following (replace NAME_OF_YOUR_CSV_FILE with the name or path of your CSV file):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the CSV into a DataFrame
df = pd.read_csv('NAME_OF_YOUR_CSV_FILE')
# Show the first five rows
df.head()

From this we see that the predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on. The goal of the dataset is to diagnostically predict whether or not a patient has diabetes, based on the diagnostic measurements it contains.
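Beyond df.head(), a few more pandas calls give a quick feel for a dataset before plotting. The sketch below is only an illustration: it builds a small made-up DataFrame (random values, not the real diabetes data) that reuses a few of the column names from this article, so it runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data; in the tutorial, df comes from pd.read_csv(...)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Glucose": rng.integers(70, 200, size=20),
    "BMI": rng.uniform(18, 45, size=20).round(1),
    "Age": rng.integers(21, 70, size=20),
    "Outcome": rng.integers(0, 2, size=20),
})

print(df.shape)             # (number of rows, number of columns)
print(df.dtypes)            # data type of each column
summary = df.describe()     # count, mean, std, min, quartiles and max per column
print(summary)
missing = df.isna().sum()   # number of missing values per column
print(missing)
```

With the real dataset, the same four calls can be run directly on the df loaded from the CSV.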

Line graphs-

Line graphs are the most basic graphs you can make using Matplotlib. Let’s plot skin thickness against the diabetes pedigree function from the dataset to see how they relate to each other, using only a few lines of code.

Type the following lines (change the variable names as you like):

skin = df['SkinThickness']
diabetes = df['DiabetesPedigreeFunction']
# The first argument goes on the x-axis, the second on the y-axis
plt.plot(skin, diabetes)
plt.title('Skin thickness vs Diabetes pedigree function')
plt.xlabel('Skin thickness')
plt.ylabel('Diabetes pedigree function')
plt.show()

In the code block above, we assigned the columns we wished to compare to more simply named variables, making them easier to refer to. We then passed those variables to the plot function of matplotlib.pyplot (imported as plt). The title function sets the provided string as the plot’s main title. The xlabel and ylabel functions label the x- and y-axes, respectively. The plot is displayed using the show function.

What if we wish to look at the relationships of several variables on the same graph? To do so, just call plt.plot() twice, once for each series you want on the y-axis, keeping the same x-values, as illustrated here:

skin = df['SkinThickness']
diabetes = df['DiabetesPedigreeFunction']
glucose = df['Glucose']
plt.plot(skin, diabetes)
plt.plot(skin, glucose)
plt.show()
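When two lines share one graph, it helps to tell them apart. A minimal sketch of one way to do that is shown here, using the label argument of plt.plot together with plt.legend. The data is made up (random values under the article’s column names), and sorting the rows by skin thickness first is my own addition so that each line runs left to right instead of zigzagging.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in data with the article's column names
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "SkinThickness": rng.integers(10, 50, size=30),
    "DiabetesPedigreeFunction": rng.uniform(0.1, 2.0, size=30),
    "Glucose": rng.integers(70, 200, size=30),
})

# Sort by the x-axis column so the lines are drawn from left to right
df_sorted = df.sort_values("SkinThickness")
skin = df_sorted["SkinThickness"]

plt.figure()
# Each label becomes one entry in the legend
plt.plot(skin, df_sorted["DiabetesPedigreeFunction"], label="Diabetes pedigree function")
plt.plot(skin, df_sorted["Glucose"], label="Glucose")
plt.xlabel("Skin thickness")
plt.legend()
ax = plt.gca()  # keep a handle on the axes for later inspection
plt.show()
```

Note that the two series live on very different scales, so the smaller one looks almost flat when they share a y-axis.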

Bar graph-

Constructing bar graphs in Matplotlib is a bit more difficult than you would think. It may be accomplished with a few lines of code, but it is essential to comprehend what this code does.

The following code block generates a bar graph (change the variable names as you like):

age = df['Age']
diabetes = df['DiabetesPedigreeFunction']
# Bars are positioned at each age; their heights are the pedigree function values
plt.bar(age, diabetes, color="blue")
plt.xlabel("Age")
plt.ylabel("Diabetes Pedigree Function")
plt.title("Diabetes Pedigree function variation due to age")
plt.show()

The last four lines of code are rather self-explanatory, but what exactly is going on in the first three? In the first two lines, we assigned the columns we wished to compare to more simply named variables, making them easier to refer to. In the third line, we used Matplotlib’s bar function to generate a bar graph, with age on the x-axis and the diabetes pedigree function as the bar heights. When run, this code produces a bar graph.
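One thing to keep in mind: plt.bar draws one bar per row, so when several patients share the same age their bars are drawn on top of each other at the same x-position. A possible way around this, sketched below on made-up data, is to reduce the data to one value per age first. Taking the mean with groupby is my assumption here, not something the dataset prescribes; a median or maximum would work the same way.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in data; ages 21 to 30 repeat many times across 50 rows
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "Age": rng.integers(21, 31, size=50),
    "DiabetesPedigreeFunction": rng.uniform(0.1, 2.0, size=50),
})

# One value per age: the mean pedigree function of all rows with that age
mean_by_age = df.groupby("Age")["DiabetesPedigreeFunction"].mean()

plt.figure()
plt.bar(mean_by_age.index, mean_by_age.values, color="blue")
plt.xlabel("Age")
plt.ylabel("Mean diabetes pedigree function")
plt.title("Mean diabetes pedigree function by age")
ax = plt.gca()  # keep a handle on the axes for later inspection
plt.show()
```

Each age now gets exactly one bar, so the heights are easier to compare.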

Histogram-

A histogram depicts the distribution of a specific data characteristic. Simply put, it tells us how many observations fall within each range of values. Just like line graphs, histograms are very easy to create.

To plot a histogram, type the following (change the variable names as you like):

bmi = df['BMI']
plt.hist(bmi)
plt.title("BMI frequency")
plt.xlabel("BMI")
plt.ylabel("Frequency")
plt.show()

In the code block mentioned above we used a new method called ‘hist’ to create a histogram. Other lines of code are pretty similar to the ones we used before.

This histogram was created in just a few lines of code. It tells us how many people fall within each BMI range. Because BMI is a continuous measurement, hist groups the values into bins (10 by default), so we can get a general idea of the distribution just by looking at it.
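The number of bins changes how coarse or fine the histogram looks. Here is a small sketch of setting it explicitly with the bins argument of plt.hist. The BMI values are made up (drawn from a normal distribution purely for illustration), and the choice of 20 bins is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up BMI-like values, not the real dataset
rng = np.random.default_rng(3)
bmi = rng.normal(loc=32, scale=7, size=200)

plt.figure()
# plt.hist returns the count per bin, the bin edges, and the drawn bars
counts, edges, _ = plt.hist(bmi, bins=20, edgecolor="black")
plt.title("BMI frequency")
plt.xlabel("BMI")
plt.ylabel("Frequency")
plt.show()

print(counts.sum())  # every observation lands in exactly one bin
print(len(edges))    # 20 bins are delimited by 21 edges
```

Trying a few different values for bins is a quick way to check whether a shape in the histogram is real or just an effect of the binning.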

Scatter plots-

Scatter plots are an excellent way to display the relationship between two variables without the risk of the wacky trend line that a line graph may produce. They are useful for discovering linear correlations in data. A scatter plot in Matplotlib is as easy to make as a line graph and takes just a few lines of code, as seen below.

Run the following lines of code (change the variable names as you like):

age = df['Age']
bmi = df['BMI']
plt.scatter(age, bmi)
plt.xlabel('Age')
plt.ylabel('BMI')
plt.show()

To create a scatter plot, we used the scatter function in the code block above. The other functions are identical to those we used previously.

With a few exceptions, graph axes should begin at 0 by convention. As we can see, the lowest x-tick in this graph does not start exactly at zero, which can be misleading. Fortunately, this is an easy fix. Just before calling plt.show(), add the line plt.xlim(0, end_point), where end_point is a placeholder to be replaced with a real value. plt.ylim can be used to do the same on the y-axis.
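Put together, the axis fix might look like the sketch below. The data is again made up, and the upper limits (the largest value plus a margin of 5) are only my choice of end point; any value above the data’s maximum works.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in data with the article's column names
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "Age": rng.integers(21, 70, size=40),
    "BMI": rng.uniform(18, 45, size=40),
})

plt.figure()
plt.scatter(df["Age"], df["BMI"])
plt.xlabel("Age")
plt.ylabel("BMI")
# Start both axes at 0; the upper limits leave a small margin above the data
plt.xlim(0, df["Age"].max() + 5)
plt.ylim(0, df["BMI"].max() + 5)
ax = plt.gca()  # keep a handle on the axes for later inspection
plt.show()
```

If you only care about the lower end, plt.xlim(left=0) sets it and leaves the upper limit as it was.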

Conclusion-

As you can see, Matplotlib is an excellent way to rapidly build basic visualisations. Most graphs can be created with only a few lines of code and then refined to make them even better. For more information on Matplotlib functions, see the official Matplotlib documentation.

I hope you found this article to be informative and easy to grasp.

Thank you for taking the time to read this!
