Data Visualization for absolute beginners [part 1/3]
The job of a Data Scientist has been ranked the number one job by Glassdoor, and it is an excellent career path, offering not just great salary potential but also the opportunity to work on some of the world’s most interesting problems!
As a matter of fact, data visualization is one of the most in-demand skills for any data scientist out there.
In simple terms, data visualization means representing data in a visual format such as a graph or chart. We need it because a visual output is usually easier to understand than the same data presented as text.
Motivation
As an active user of Matplotlib (a wonderful library for plotting data), I often found the documentation difficult to follow, and I had to rely on Stack Overflow to do even the basic stuff.
Hence I decided to write this basic tutorial to help newbies get started with the library and become better at it.
So let’s get started!
In this article, we will try to get a basic understanding of the Python library Matplotlib. We will start off with some basic examples and learn the most fundamental Matplotlib commands, then gradually move on to the object-oriented method, and finally learn how to label our plots.
For simplicity, we will divide this tutorial into three parts:
- Matplotlib for basic visualizations
- Matplotlib Object-Oriented API
- Legends, labels, and titles for making the plots readable
Matplotlib
Introduction
Matplotlib is considered the “grandfather” library of data visualization with Python. It was created by John Hunter to replicate MATLAB’s plotting capabilities in Python, so if you are already familiar with MATLAB, Matplotlib will feel natural to you.
It is a wonderful 2D and 3D graphics library for building scientific figures.
The major pros of Matplotlib are:
- Most popular plotting library for Python
- Easy to get started for simple plots
- Supports custom labels and text
- High-quality output in many formats
- Very customizable in general
Installing Matplotlib
We’ll need to install it from the command line with either pip or conda:
pip install matplotlib
or
conda install matplotlib
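Once the install finishes, a quick way to confirm that it worked is to import the library and print its version. This is a small sanity check rather than part of the original notebook, and the variable name `installed_version` is only illustrative.

```python
import matplotlib

# __version__ is a plain string; the exact value depends on your install
installed_version = matplotlib.__version__
print("Matplotlib version:", installed_version)
```

If the import fails with a ModuleNotFoundError, the install most likely went into a different Python environment than the one you are running.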
If you are ever unsure whether Matplotlib can produce a certain type of plot, you can always visit this section of the official documentation and browse the examples to learn more about Matplotlib’s capabilities. It is probably the most helpful page in the Matplotlib documentation.
In the coming section, we will see how to create our own figures.
Let’s go ahead to our Jupyter Notebook to get started.
Note that I will share the complete .ipynb file at the end of this tutorial, so focus on understanding the concepts rather than copy-pasting these commands into your notebook.
Step 1: Importing the matplotlib.pyplot library
In [1]:
import matplotlib.pyplot as plt
matplotlib.pyplot is a collection of command-style functions that make Matplotlib work like MATLAB.
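To make "command style" a little more concrete: pyplot keeps track of a current figure and current axes, and each call acts on them without you passing them around. The following is a minimal sketch of that behaviour, not code from the original notebook; the variable names are illustrative.

```python
import matplotlib.pyplot as plt

fig = plt.figure()                # create a figure; it becomes the "current" figure
plt.plot([1, 2, 3], [1, 4, 9])    # no figure is passed in: pyplot draws on the current one

current_axes = plt.gca()          # the axes that plt.plot() just drew on
same_figure = plt.gcf() is fig    # gcf() returns the current figure
line_count = len(current_axes.lines)

print(same_figure, line_count)
```

This hidden state is convenient for quick plots, and it is also the reason the object-oriented method covered in part 2 tends to be clearer for anything larger.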
You need to add the line below in order to see the plots in the notebook:
In [2]:
%matplotlib inline
If you are using another editor, you’ll need to call plt.show() at the end of your plotting commands, which will open the plots in a popup window.
Basic Example
Before we begin, we need some data points to visualize. So let’s create a simple example using two NumPy arrays.
In [3]:
import numpy as np
x = np.linspace(0, 5, 11)
y = x ** 2
In [4]:
x
Out[4]:
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])
In [5]:
y
Out[5]:
array([ 0. , 0.25, 1. , 2.25, 4. , 6.25, 9. , 12.25, 16. , 20.25, 25. ])
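If np.linspace is new to you: np.linspace(0, 5, 11) returns 11 evenly spaced values from 0 to 5 with both endpoints included, so neighbouring points are 5 / 10 = 0.5 apart. The short check below confirms this for the arrays above; the name `step` is just for illustration.

```python
import numpy as np

x = np.linspace(0, 5, 11)   # 11 points from 0 to 5, both endpoints included
y = x ** 2                  # element-wise square, same shape as x

step = x[1] - x[0]          # spacing between neighbouring points
print(len(x), step)         # number of points and their spacing
print(y[-1])                # last value of y is 5 squared
```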
Basic Matplotlib Commands
There are two ways to create plots: the functional method and the object-oriented method. We will discuss the functional method first and then shift over to the better one, i.e. the object-oriented method.
To create a very basic line plot, run the commands below.
In [6]:
# FUNCTIONAL method
plt.plot(x, y, 'r') # 'r' is the color red
plt.show()
Out[6]:
The plot() function here receives three arguments: the x coordinates, the y coordinates, and the color of the line. Note, however, that plot() is a versatile command and accepts an arbitrary number of arguments.
Fun activity: Try the below commands and see what you get as the plot output :)
plt.plot(x, y) # plot x and y using default line style and color
plt.plot(x, y, 'bo') # plot x and y using blue circle markers
plt.plot(y) # plot y using x as index array 0..N-1
plt.plot(y, 'r+') # ditto, but with red plusses
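If you would rather check the results in code than by eye, the Line2D objects that plot() returns can be inspected. Here is a minimal sketch for the last command, assuming the same x and y arrays as before:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)
y = x ** 2

# With only y given, Matplotlib uses the indices 0..N-1 as the x values
(line,) = plt.plot(y, 'r+')

print(line.get_xdata())   # the index positions, not the values stored in x
print(line.get_color())   # color part of the format string
print(line.get_marker())  # marker part of the format string
```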
We will discuss line styles and colors in more detail in the third part of this tutorial.
Let’s move on to creating multiple plots on the same canvas.
Creating Multiple Plots on the Same Canvas
To create multiple plots on the same canvas, we will use subplot(), which takes the number of rows, the number of columns, and the number of the plot we are referring to.
In [7]:
# plt.subplot(nrows, ncols, plot_number)
plt.subplot(1,2,1)
plt.plot(x, y, 'r--') # More on color options later
plt.subplot(1,2,2)
plt.plot(y, x, 'g*-');
Notice above how we were able to create two different plots by treating the canvas as a matrix of rows and columns where each cell represents a plot.
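The plot number is the part that tends to confuse people: it starts at 1 in the top-left cell and counts left to right, then top to bottom. The sketch below walks a 2 x 2 grid to show the numbering. It is only an illustration; the powers of x are arbitrary data, and the row and column values in the titles are zero-based helpers computed here, not something subplot() itself needs.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)
nrows, ncols = 2, 2
positions = []

plt.figure()  # start from a fresh canvas
for plot_number in range(1, nrows * ncols + 1):
    # plot_number starts at 1 and counts left to right, then top to bottom
    row, col = divmod(plot_number - 1, ncols)
    positions.append((row, col))

    plt.subplot(nrows, ncols, plot_number)
    plt.plot(x, x ** plot_number)
    plt.title(f"plot {plot_number}: row {row}, col {col}")

plt.tight_layout()  # add spacing so titles and tick labels do not overlap
print(positions)
```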
However, the above approach is considered bad practice, and there is a better way to do the same thing. That is where the more formal object-oriented approach comes into the picture, which we will cover in the second part of this series.
I highly recommend trying the above commands in your own notebook or from the command line, and then plotting a real-world dataset to see what patterns your plots reveal.
Great! We have successfully completed part 1/3 of the tutorial🎉🎉
Now let’s head on to part 2 by clicking here.