Python’s Matplotlib: How to Create Figures and Plots

Ploy Mongkoldao
Published in Analytics Vidhya
7 min read · May 2, 2020

Businesses are flooded with data generated at an unprecedented rate, in unprecedented volume, and in a variety of forms. A dynamic business landscape pushes firms large and small to consider using data collected internally and externally, in the hope of staying competitive by understanding customer behavior. ‘Data Understanding’ is a stage of both CRISP-DM and the data science process that lets a business gain an initial understanding of its data before it tries to solve business problems with it.

https://pixabay.com/photos/books-library-education-literature-768426/

Data Analysis Process

‘CRISP-DM’, which stands for ‘Cross-Industry Standard Process for Data Mining’, specifies 6 major stages of data mining for data analysis:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment

According to the Data Science Workshop offered by Packt, running business projects with big data often involves several steps in applying data science:

  • Defining Business Problem
  • Collecting Existing Data
  • Analyzing / Visualizing / Preparing Data
  • Training Model
  • Assessing Model’s Performance
  • Communicating Findings & Gained Insights
  • Deploying Model

The 2 references share common steps, and both treat ‘Data Understanding’ as an important step in solving business problems. To quickly gain basic knowledge about the data at hand, visualization plays an important role.

Python for Data Visualization

As mentioned before, data in this era is generated in a variety of structures at unprecedented speed, so data skills are essential. We will now use Python’s Matplotlib to create and customize plots of data.

This part focuses on creating a figure for plotting and on generating multiple plots.

Matplotlib Module

  • At the time of writing, the latest version of Matplotlib is 3.2.1, released on March 19, 2020.
  • Before working with Matplotlib, we need to import it. Within Matplotlib, the Pyplot submodule provides the functions for plotting various types of graphs for different purposes. Pyplot can be imported with an alias as follows:

from matplotlib import pyplot as plt

Types of Graph Available

Matplotlib provides the variety of graphs you will need for data understanding. In general, there are 4 major categories to consider:

  • Comparison: for comparing multiple variables, often over time, such as line, bar (vertical / horizontal) and radar charts
  • Relation: for showing relationships among variables, such as scatter plots, bubble plots and heatmaps
  • Composition: for visualizing variables that are parts of a whole, such as pie charts and stacked bar/area charts
  • Distribution: for showing the distribution of data, such as histograms, density plots, boxplots and violin plots
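To make the four categories concrete, here is a minimal sketch that draws one chart per category. The numbers are made up for illustration, and the 2 x 2 grid uses plt.subplots(), which is covered later in this post.

```python
import matplotlib.pyplot as plt

# Illustrative values only, not taken from any real dataset
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]

# One Figure holding a 2 x 2 grid of Axes
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

axes[0, 0].plot(x, y)                                # comparison: line chart
axes[0, 0].set_title("Comparison")

axes[0, 1].scatter(x, y)                             # relation: scatter plot
axes[0, 1].set_title("Relation")

axes[1, 0].pie(y, labels=["a", "b", "c", "d", "e"])  # composition: pie chart
axes[1, 0].set_title("Composition")

axes[1, 1].hist(y, bins=3)                           # distribution: histogram
axes[1, 1].set_title("Distribution")
```

Calling plt.show() afterwards displays the figure.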

Approaches Towards Matplotlib Plotting

Basically, we can plot in 2 main ways, Object-Oriented Style and Pyplot Style. We will see both styles applied to actual data later.

Object-Oriented Style (OO Style)
Explicitly create the objects needed to draw the plot, namely the Figure and the Axes.

Pyplot Style
Let Pyplot create and manage the Figure and Axes automatically.
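As a rough side-by-side sketch, the same line graph can be drawn in both styles. The sample values and variable names here are assumptions for illustration only.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data, used only for illustration
years = [2015, 2016, 2017, 2018, 2019]
arrivals = [10, 12, 15, 14, 18]

# Pyplot style: pyplot creates and tracks the Figure and Axes implicitly
plt.figure()
plt.plot(years, arrivals)
plt.title("Pyplot style")
pyplot_fig = plt.gcf()  # grab the current figure so we can refer to it later

# OO style: create the Figure and Axes explicitly, then call methods on them
oo_fig, oo_ax = plt.subplots()
oo_ax.plot(years, arrivals)
oo_ax.set_title("OO style")
```

Both figures end up with one Axes holding one line. The OO style keeps an explicit handle on each object, which tends to help once a figure holds several plots.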

Creating a Single Plot

Depending on the type of plot we would like to create, we can start simple with:

# sample x & y coordinates (illustrative values)
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# plotting a line graph of x & y coordinates
plt.plot(x, y)

# displaying the plot
plt.show()

With the above syntax, Matplotlib automatically generates a single figure containing a single line plot for you.

Alternatively, we might want to explicitly create a figure object as a container for the plots inside it. This allows you to set figure options such as size and resolution, which otherwise fall back to the defaults in Matplotlib’s runtime configuration parameters (rcParams).

# creating an empty figure object (it has no axes yet)
fig = plt.figure()

# displaying the (still empty) figure
plt.show()

Creating a Figure Object as a Container of Plots

Keyword arguments for plt.figure() can be specified within the parentheses; if we specify nothing, the default values are used:

num : setting figure number
figsize : set figure size in inches
dpi : set figure resolution
facecolor : background color
etc.
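A minimal sketch of these keyword arguments in use is shown below. The specific values are arbitrary assumptions, not recommended settings.

```python
import matplotlib.pyplot as plt

# All values below are illustrative assumptions
fig = plt.figure(
    num=1,                  # figure number (an identifier for the figure)
    figsize=(8, 4),         # width and height in inches
    dpi=100,                # resolution in dots per inch
    facecolor="lightgray",  # background color of the figure
)

# The figure is still empty, so add an Axes to draw on
ax = fig.add_subplot(1, 1, 1)
ax.plot([1, 2, 3], [2, 4, 8])
```

Calling plt.show() afterwards displays the figure with the chosen size and background.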

Let’s plot a line graph using the dataset from www.data.gov.sg on Air Passenger Arrivals: Total by Region and Selected Country of Embarkation, which is time-series data. A line chart suits it well, because it compares the number of air passenger arrivals to Singapore over time, from 1960 to 2019.

Overview of Data

Suppose that I would like to compare the number of air passengers from China and Thailand over the same period. After reading the data from the csv file and a little manipulation, the 2 lines appear within one figure, comparing how many air passengers from the two countries traveled into Singapore between 1960 and 2019.

Manipulating Data

China Air Passenger Data

With Python’s Pandas, the data is resampled on the ‘Month’ column, which has already been set as the index, in order to sum up the number of air passengers for each year.
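The resampling step might look roughly like the sketch below. It runs on a small synthetic stand-in rather than the real csv file, and the column names "Country" and "Passengers" are assumptions; only the ‘Month’ index and the thailand_yearly / china_yearly names come from this post.

```python
import pandas as pd

# Small synthetic stand-in for the monthly arrivals data.
# The column names "Country" and "Passengers" are assumptions.
months = pd.date_range("2018-01-01", periods=24, freq="MS")
df = pd.DataFrame({
    "Month": list(months) * 2,
    "Country": ["China"] * 24 + ["Thailand"] * 24,
    "Passengers": [100] * 24 + [50] * 24,
}).set_index("Month")

# Select one country, then sum its monthly values within each calendar year
china_yearly = df.loc[df["Country"] == "China", "Passengers"].resample("YS").sum()
thailand_yearly = df.loc[df["Country"] == "Thailand", "Passengers"].resample("YS").sum()
```

Each result is a Series indexed by year, with one total per year, which is the shape plt.plot() needs for a line per country.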

Thailand Air Passenger Data

Plotting Data

From the above, we have already pulled the data for Thailand and China out of the whole dataframe and are ready to plot. With Pyplot, plt.plot([x], y) draws a line graph, where [x] is optional. In this case, our y values come from thailand_yearly and china_yearly, so we pass them as arguments: plt.plot(thailand_yearly) for Thailand and plt.plot(china_yearly) for China, as depicted below:
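A self-contained sketch of this plotting step follows. The yearly totals are hypothetical placeholders for the real thailand_yearly and china_yearly data, and the figure size and legend labels are my own choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly totals standing in for thailand_yearly and china_yearly
years = pd.to_datetime(["2016", "2017", "2018", "2019"])
thailand_yearly = pd.Series([50, 55, 60, 58], index=years)
china_yearly = pd.Series([90, 110, 130, 150], index=years)

fig = plt.figure(figsize=(8, 4))

# With only y given, Matplotlib takes the x values from the Series index
plt.plot(thailand_yearly, label="Thailand")
plt.plot(china_yearly, label="China")
plt.legend()
```

Both calls draw onto the same Axes, so the 2 lines share one set of axes and can be compared directly.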

source: Data.gov.sg

Creating Multiple Plots

There are several options for plotting multiple graphs, and it is up to you to choose the one that best suits your needs.

1. Adding a single plot one by one

Just like we did when creating a single plot, we start by creating a new figure object:

fig = plt.figure()

After creating the new figure, add subplots to it one by one with the add_subplot method:

# a figure contains 2 rows & 1 column and ax1 is the 1st subplot
ax1 = fig.add_subplot(2,1,1)

# a figure contains 2 rows & 1 column and ax2 is the 2nd subplot
ax2 = fig.add_subplot(2,1,2)

# displaying the plot
plt.show()

This produces 2 empty plots stacked vertically, one above the other.

Let’s create multiple line graphs from our data. This time I would like to see the total number of passengers flown to Singapore by region. The data covers passengers from 3 different regions: Southeast Asia, Northeast Asia and Europe.

Finding Unique Values of Region Column

With some manipulation, I can aggregate the total number of passengers for each region using the same method as in the single plot. However, we can complete the task without building a new dataframe for each region like before. I loop through the original data to select the passenger numbers of each region, then aggregate them on a yearly basis with the resample method. Each pass through the loop generates the data for one region, which allows us to plot a separate line graph for each region.
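The loop described here could be sketched as follows. The dataframe is a synthetic stand-in, and the column names "Region" and "Passengers" as well as the figure size are assumptions made for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for the original data; the column names "Region"
# and "Passengers" are assumptions made for this sketch.
months = pd.date_range("2018-01-01", periods=24, freq="MS")
regions = ["Southeast Asia", "Northeast Asia", "Europe"]
df = pd.concat([
    pd.DataFrame({"Month": months, "Region": name, "Passengers": 10 * (i + 1)})
    for i, name in enumerate(regions)
]).set_index("Month")

fig = plt.figure(figsize=(12, 3))

for i, region in enumerate(df["Region"].unique()):
    # Aggregate this region's monthly numbers into yearly totals
    yearly = df.loc[df["Region"] == region, "Passengers"].resample("YS").sum()
    # Add the (i + 1)-th subplot to a grid of 1 row and 3 columns
    ax = fig.add_subplot(1, 3, i + 1)
    ax.plot(yearly)
    ax.set_title(region)
```

Note that add_subplot counts positions from 1, hence the i + 1 inside the loop.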

Adding a Single Plot One by One

In the code above, I used a for-loop to create three line graphs plotted separately, each added to the Figure object one by one with the add_subplot() approach.

2. Specifying number of plots and layout

This time the 2 kinds of objects, the Figure object and the Axes objects, are created at the same time using the subplots function of the Pyplot submodule:

# a figure contains 2 rows & 1 column
# ax1 is the first subplot & ax2 is the second subplot

fig, (ax1, ax2) = plt.subplots(nrows=2,ncols=1)

# displaying the plot
plt.show()

OO Style Creating Figure & Axes Objects with plt.subplots()

This produces 2 plots stacked vertically, just like the previous code did.

Let’s revisit our previous code, where we generated multiple plots. The only difference is that the Axes objects are created explicitly with plt.subplots(), as seen in the first line of code in the figure below. With a for-loop, a line graph is drawn for each region, and the graphs are displayed side by side according to our configuration.

fig, ax = plt.subplots(nrows=1, ncols=3)

Inside the for-loop, because we created the Axes objects explicitly, you might spot a difference in the plotting code. We need to specify which Axes to plot each region’s data on, as shown in the code as ax[i].plot(……).
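Here is a minimal sketch of that loop with plt.subplots(). The yearly totals and the yearly_by_region name are hypothetical, chosen so the example runs without the original csv file.

```python
import matplotlib.pyplot as plt

# Hypothetical yearly totals per region, used only for illustration
years = [2016, 2017, 2018, 2019]
yearly_by_region = {
    "Southeast Asia": [30, 32, 35, 37],
    "Northeast Asia": [20, 24, 27, 31],
    "Europe": [10, 11, 11, 12],
}

# Create the Figure and a 1-D array of three Axes in one call
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(12, 3))

for i, (region, totals) in enumerate(yearly_by_region.items()):
    # Pick the Axes for this region and draw on it
    ax[i].plot(years, totals)
    ax[i].set_title(region)
```

Indexing with ax[i] works here because a single row or a single column gives a 1-D array of Axes. With more than one row and more than one column, plt.subplots() returns a 2-D array, which needs ax[row, col] instead.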

Creating Multiple Graphs with plt.subplots()

As you can see, the plotting layout is the same as the previous one, 1 row by 3 columns. You can also set a different layout from mine, for example nrows=3 and ncols=1. This configuration stacks the plots vertically.

Vertical Plotting with subplots()

Plus++

Apart from the approaches to creating graphs covered in this post, there are others you can use in your projects. For example, you might use Pyplot’s subplot2grid function to create an Axes at a specific location inside a regular grid. This gives you more control and flexibility when creating your plots.
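As a small sketch of subplot2grid, the example below places one wide plot above two smaller ones. The grid shape, variable names and plotted values are arbitrary assumptions.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 5))

# On a grid of 2 rows and 2 columns: one wide Axes across the top row,
# and two regular Axes on the bottom row
ax_top = plt.subplot2grid((2, 2), (0, 0), colspan=2)
ax_left = plt.subplot2grid((2, 2), (1, 0))
ax_right = plt.subplot2grid((2, 2), (1, 1))

# Illustrative values only
ax_top.plot([1, 2, 3], [1, 2, 3])
```

The first argument is the grid shape and the second is the (row, column) location, both counted from 0; colspan and rowspan let one Axes stretch over several cells.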

Thanks for reading all the way to the end!

Happy Plotting!!
