Data Visualization for absolute beginners [part 1/3]

Ishank Sharma
Published in Analytics Vidhya
5 min read · Jun 15, 2020

Glassdoor has ranked Data Scientist the number one job, and it is an excellent career path that offers not just great salary potential but also the opportunity to work on some of the world’s most interesting problems!

As a matter of fact, data visualization is one of the most in-demand skills for any data scientist.

In simple terms, data visualization means representing data in a visual format such as a graph or chart. We need it because visual output is usually easier to understand than the same data presented as text.

Motivation

Although I am an active user of matplotlib (a wonderful library for plotting data), I often find its documentation difficult to follow, and I have had to rely on Stack Overflow for even basic tasks.
So I decided to write this basic tutorial to help newcomers get started with the library and become better at it.

So let’s get started!

In this article, we will build a basic understanding of the Python library Matplotlib. We will start with some basic examples and learn the most fundamental Matplotlib commands, then gradually move on to the object-oriented method, and finally learn how to label our plots.

For simplicity, this tutorial is divided into three parts.

Matplotlib

Introduction

Matplotlib is considered the “grandfather” library of data visualization with Python. It was created by John Hunter to replicate MATLAB’s plotting capabilities in Python, so if you are already familiar with MATLAB, Matplotlib will feel natural to you.

It is a wonderful 2D and 3D graphics library for building scientific figures.

The major pros of Matplotlib are:

  • Most popular plotting library for Python
  • Easy to get started for simple plots
  • Supports custom labels and text
  • High-quality output in many formats
  • Very customizable in general

Installing Matplotlib

Install Matplotlib from your command-line terminal with pip or conda:
pip install matplotlib

or

conda install matplotlib
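
To confirm that the installation worked, one quick check (a minimal sketch, not part of the original notebook) is to import the package and print its version. The version string you see depends on your environment.

```python
import matplotlib

# Prints the installed version string; the exact value varies by environment
print(matplotlib.__version__)
```

If the import fails with a ModuleNotFoundError, the package was most likely installed into a different Python environment than the one you are running.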

If you are ever unsure whether a certain type of plot is possible, you can always visit this section of the official documentation and check the examples to learn more about matplotlib’s capabilities. It is probably the most helpful page in the matplotlib documentation.

Now in the coming section, we will see how to create our own figures.

Let’s go ahead to our Jupyter Notebook to get started.

Note that I will share the complete .ipynb file at the end of this tutorial, so focus on understanding the concepts rather than copy-pasting these commands into your notebook.

Step 1: Importing matplotlib pyplot library

In [1]:

import matplotlib.pyplot as plt

matplotlib.pyplot is a collection of command-style functions that make matplotlib work like MATLAB.

In a notebook, add the line below in order to see the plots inline:

In [2]:

%matplotlib inline

If you are using another editor, you’ll need to call plt.show() at the end of your plotting commands, which opens the plot in a popup window.
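
As an illustration of the non-notebook case, here is a minimal sketch of a standalone script. The file name plot_squares.py, the function name build_plot, and the sample values are all hypothetical choices made for this example.

```python
# plot_squares.py (hypothetical file name)
import matplotlib.pyplot as plt


def build_plot():
    """Draw a few squared values as a line and return the line object."""
    xs = [0, 1, 2, 3]
    ys = [0, 1, 4, 9]
    (line,) = plt.plot(xs, ys)  # plot() returns a list of line objects
    return line


if __name__ == "__main__":
    build_plot()
    plt.show()  # displays the figure; not needed with %matplotlib inline
```

Running python plot_squares.py from a terminal should open the figure in a window when a graphical backend is available.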

Basic Example

Before we begin, we need some data points to visualize. So let’s create a simple example using two numpy arrays.

In [3]:

import numpy as np
x = np.linspace(0, 5, 11)
y = x ** 2

In [4]:

x

Out[4]:

array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ])

In [5]:

y

Out[5]:

array([  0.  ,   0.25,   1.  ,   2.25,   4.  ,   6.25,   9.  ,  12.25,  16.  ,  20.25,  25.  ])
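
To make the arrays above less mysterious: np.linspace(0, 5, 11) returns 11 evenly spaced values from 0 to 5 with both endpoints included, and x ** 2 squares each element. The short sketch below checks this; the variable name step is introduced here only for illustration.

```python
import numpy as np

x = np.linspace(0, 5, 11)   # 11 points from 0 to 5, both endpoints included
step = x[1] - x[0]          # spacing between neighbouring points
y = x ** 2                  # element-wise square of every value in x

print(len(x), step)         # 11 0.5
print(y[-1])                # 25.0
```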

Basic Matplotlib Commands

There are two ways to create plots: the functional method and the object-oriented method. We will discuss the functional method first and then move on to the better of the two, the object-oriented method.

To create a very basic line plot, run the commands below.

In [6]:

# FUNCTIONAL method
plt.plot(x, y, 'r') # 'r' is the color red
plt.show()

Out[6]:

Here the plot() function receives three arguments: the x coordinates, the y coordinates, and a format string ('r') that sets the line color to red. Note that plot() is a versatile command and accepts a variable number of arguments.

Fun activity: Try the below commands and see what you get as the plot output :)

plt.plot(x, y)        # plot x and y using default line style and color
plt.plot(x, y, 'bo') # plot x and y using blue circle markers
plt.plot(y) # plot y using x as index array 0..N-1
plt.plot(y, 'r+') # ditto, but with red plusses
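
Beyond the variations above, plot() also accepts several x, y, format groups in a single call and draws one line per group. The sketch below is an illustration added for this point, and the cubic curve is an assumed second data set that does not appear elsewhere in the tutorial.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)

# Two (x, y, format) groups in one call:
# a solid red line for x^2 and a dashed blue line for x^3
lines = plt.plot(x, x ** 2, 'r', x, x ** 3, 'b--')

print(len(lines))  # 2, one line object per group
```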

We will discuss more on linestyles and colors in the third part of this tutorial.

Let’s move on to creating multiple plots on the same canvas.

Creating Multiple Plots on the Same Canvas

To create multiple plots on the same canvas, we will use subplot, which takes three arguments: the number of rows, the number of columns, and the number of the plot we are referring to.

In [7]:

# plt.subplot(nrows, ncols, plot_number)
plt.subplot(1,2,1)
plt.plot(x, y, 'r--') # More on color options later
plt.subplot(1,2,2)
plt.plot(y, x, 'g*-');

Notice above how we were able to create two different plots by treating the canvas as a matrix of rows and columns where each cell represents a plot.
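
The same grid idea extends to more cells. As a rough sketch, the loop below fills a 2 x 2 grid; the list name formats and the choice of powers of x are assumptions made only for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 11)
formats = ['r--', 'g*-', 'bo', 'k+']    # one format string per cell

plt.figure()                             # start from a fresh canvas
for i, fmt in enumerate(formats):
    plt.subplot(2, 2, i + 1)             # plot numbers start at 1, not 0
    plt.plot(x, x ** (i + 1), fmt)       # x, x^2, x^3, x^4

print(len(plt.gcf().axes))  # 4
```

Plot numbers run left to right along the first row and then continue on the next row, so plot number 3 is the bottom-left cell here.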

However, the above approach is not considered good practice. A better way to do the same thing is the more formal object-oriented approach, which we will cover in the second part of this series.

I highly recommend running the above commands in your own notebook or at the command line, and then plotting a real-world dataset to see what patterns you can find.

Great! We have successfully completed part 1/3 of the tutorial 🎉🎉
Now let’s head on to part 2 by clicking here.
