Data Visualization using Python Part-I
Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed.
Python offers several capable plotting libraries, such as Matplotlib and Seaborn, that come packed with features. Let us spend the next few minutes exploring them!
Data is only as good as it’s presented — Source
Matplotlib
Matplotlib is a multi-platform visualization library in Python for 2D plots of arrays. It is built on NumPy arrays and designed to work with the broader SciPy stack. It is the brainchild of John Hunter.
Matplotlib Installation
Matplotlib can be installed on your local machine from the command prompt using pip:
python -m pip install -U pip
python -m pip install -U matplotlib
Importing Matplotlib
from matplotlib import pyplot as plt
#or
import matplotlib.pyplot as plt
General Concepts in Matplotlib
A Matplotlib figure can be categorized into several parts as follows —
Figure: It is a whole figure which may contain one or more than one axes (plots). You can think of a Figure as a canvas which contains plots.
Axes: It is what we generally think of as a plot. A Figure can contain many Axes. Each Axes contains two (or three, in the case of 3D) Axis objects and has a title, an x-label and a y-label.
Axis: These are the number-line-like objects that take care of generating the graph limits and ticks.
Artist: Everything visible on the figure is an Artist, such as Text objects, Line2D objects and collection objects. Most Artists are tied to an Axes.
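The relationship between these parts can be sketched in a few lines. This is a minimal illustration using made-up data; the variable names (fig, ax, line, title) are just for this example.

```python
import matplotlib.pyplot as plt

# One Figure (the canvas) holding a single Axes (the plot).
fig, ax = plt.subplots()

# plot() returns a list of Line2D artists; set_title() returns a Text artist.
(line,) = ax.plot([1, 2, 3], [2, 4, 6])
title = ax.set_title("Anatomy demo")

# A 2D Axes owns two Axis objects (xaxis and yaxis) that manage limits and ticks.
print(type(fig).__name__)       # Figure
print(type(ax.xaxis).__name__)  # XAxis
print(type(line).__name__)      # Line2D
print(type(title).__name__)     # Text
```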
Getting Started with Pyplot
Making a simple plot
import matplotlib.pyplot as plt
import numpy as np
A few points to be noted —
- We pass two arrays as input arguments to Pyplot's plot() method and use the show() method to display the plot.
- Note that the first array appears on the x-axis and the second array appears on the y-axis of the plot.
- Now that our first plot is ready, let us add a title and name the x-axis and y-axis using the methods title(), xlabel() and ylabel() respectively.
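Putting the points above together, a first plot might look like the sketch below. The data (a few numbers and their squares) and the label strings are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data: x values and their squares.
x = np.array([1, 2, 3, 4])
y = x ** 2

plt.plot(x, y)            # first array -> x-axis, second array -> y-axis
plt.title("First plot")
plt.xlabel("x")
plt.ylabel("x squared")

ax = plt.gca()            # handle to the current Axes, handy for inspection
plt.show()
```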
We can also specify the size of the figure by calling figure() and passing a (width, height) tuple, in inches, to the figsize argument.
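As a small sketch of figsize, assuming we want a figure that is 10 inches wide and 4 inches tall (the numbers are arbitrary):

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches.
fig = plt.figure(figsize=(10, 4))
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.title("A wider figure")
plt.show()
```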
We can also plot multiple sets of data by passing several pairs of x and y arrays to a single plot() call.
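For instance, two curves can be drawn with one call. The data here is made up, and the format strings ("b-" for a solid blue line, "r--" for a dashed red line) are optional.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 5)

# Two (x, y) pairs in a single call, each followed by an optional format string.
lines = plt.plot(x, x ** 2, "b-", x, x ** 3, "r--")
plt.show()
```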
Different Visualizations using Pyplot
Bar Plots
Pyplot provides the bar() method to make bar graphs. It takes the categorical variables, their values and, optionally, a colour.
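A minimal bar graph, using made-up categories and counts, could look like this:

```python
import matplotlib.pyplot as plt

fruits = ["Apple", "Banana", "Cherry"]  # made-up categories
counts = [12, 7, 9]                     # made-up values

bars = plt.bar(fruits, counts, color="green")
plt.ylabel("Count")
plt.show()
```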
You can also make horizontal bar graphs using the method barh().
We can also pass the argument xerr or yerr (yerr for the vertical bar graphs above, xerr for horizontal ones) to draw error bars that depict the variance in our data.
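The sketch below draws the same made-up data twice, once vertically with yerr and once horizontally with xerr. The error values in spread are invented for illustration, and the two plots are placed side by side with subplot() purely for convenience.

```python
import matplotlib.pyplot as plt

fruits = ["Apple", "Banana", "Cherry"]
counts = [12, 7, 9]
spread = [1.5, 0.8, 1.2]  # made-up error values

plt.figure(figsize=(8, 3))

plt.subplot(1, 2, 1)
vbars = plt.bar(fruits, counts, yerr=spread)    # vertical bars, vertical error bars

plt.subplot(1, 2, 2)
hbars = plt.barh(fruits, counts, xerr=spread)   # horizontal bars, horizontal error bars

plt.show()
```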
To create horizontally stacked (side-by-side) bar graphs, we call the bar() method twice and pass the index (x position) and width of the bars, shifting the second set by one bar width so that the two sets sit next to each other.
Also, notice the use of two other methods: legend(), which shows the legend of the graph, and xticks(), which labels the x-axis based on the position of our bars.
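One way to sketch this, with made-up group names and scores, is to offset the second call to bar() by the bar width and centre the tick labels between the two sets:

```python
import matplotlib.pyplot as plt
import numpy as np

labels = ["A", "B", "C"]   # made-up groups
men = [20, 35, 30]         # made-up values
women = [25, 32, 34]

index = np.arange(len(labels))  # x positions 0, 1, 2
width = 0.35

bars_m = plt.bar(index, men, width=width, label="Men")
bars_w = plt.bar(index + width, women, width=width, label="Women")

plt.xticks(index + width / 2, labels)  # centre each label between the pair of bars
plt.legend()

ax = plt.gca()
plt.show()
```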
Similarly, to stack the bar graphs vertically, we can pass the argument bottom and give it the values of the bar graph that should sit underneath.
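A short sketch of vertical stacking, again with made-up values: the second set of bars starts where the first set ends.

```python
import matplotlib.pyplot as plt

labels = ["A", "B", "C"]
men = [20, 35, 30]      # made-up values
women = [25, 32, 34]

bars_m = plt.bar(labels, men, label="Men")
bars_w = plt.bar(labels, women, bottom=men, label="Women")  # stacked on top of men
plt.legend()
plt.show()
```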
Pie Chart
A Pie Chart can be made using the method pie().
We can also pass in arguments to customize the chart, for example to add a shadow (shadow), pull a wedge out (explode) or rotate the chart by an angle (startangle).
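Here is a small sketch that combines these options. The labels and shares are made up, and autopct (which prints the percentage on each wedge) is an extra option not mentioned above.

```python
import matplotlib.pyplot as plt

labels = ["Python", "Java", "C++", "Go"]  # made-up categories
shares = [45, 25, 20, 10]                 # made-up values
explode = (0.1, 0, 0, 0)                  # pull the first wedge out slightly

wedges, texts, autotexts = plt.pie(
    shares,
    labels=labels,
    explode=explode,
    shadow=True,
    startangle=90,        # rotate the start of the first wedge to 90 degrees
    autopct="%1.1f%%",    # percentage text on each wedge
)
plt.axis("equal")         # keep the pie circular
plt.show()
```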
Histogram
Histograms are a special form of bar chart where the data represent continuous rather than discrete categories. This means that in a histogram there are no gaps between the columns representing the different categories.
In Matplotlib, histograms are drawn with the hist() method, which groups the data into bins and plots the count in each bin.
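As a minimal sketch, assuming some randomly generated "age" data (the distribution parameters and the choice of 10 bins are arbitrary):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
ages = rng.normal(loc=35, scale=10, size=500)  # made-up continuous data

# hist() returns the bin counts, the bin edges and the drawn bars.
counts, bin_edges, patches = plt.hist(ages, bins=10, edgecolor="black")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
```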
Scatter Plots
Scatter plots are widely used and come in especially handy for visualizing regression problems.
Given arbitrarily created height and weight data, we can plot one against the other with scatter() and use the xlim() and ylim() methods to set the limits of the x-axis and y-axis respectively.
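A sketch with invented height and weight values (the units and axis limits are assumptions chosen to frame the data):

```python
import matplotlib.pyplot as plt
import numpy as np

# Arbitrary, made-up measurements.
height = np.array([150, 155, 160, 165, 170, 175, 180, 185])  # cm
weight = np.array([52, 55, 60, 63, 68, 72, 77, 83])          # kg

plt.scatter(height, weight)
plt.xlim(140, 190)   # limits of the x-axis
plt.ylim(45, 90)     # limits of the y-axis
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")

ax = plt.gca()
plt.show()
```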
3-D Plotting
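The above scatter plot can also be visualized in three dimensions. To use this functionality, we first import the module mplot3d (required on Matplotlib versions before 3.2; later versions register the 3D projection automatically) as follows: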
from mpl_toolkits import mplot3d
Once the module is imported, a three-dimensional Axes is created by passing the keyword projection='3d' to the axes() method of the Pyplot module. Once the Axes instance is created, we pass our data, such as height and weight, to its scatter3D() method.
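A minimal 3-D scatter sketch follows. The height and weight values are made up, and the third variable, age, is a hypothetical addition so that there is something to plot on the z-axis.

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d  # registers the 3D projection on older Matplotlib

# Arbitrary, made-up measurements; "age" is a hypothetical third variable.
height = np.array([150, 155, 160, 165, 170, 175, 180, 185])
weight = np.array([52, 55, 60, 63, 68, 72, 77, 83])
age = np.array([21, 25, 30, 28, 35, 40, 38, 45])

ax = plt.axes(projection="3d")
ax.scatter3D(height, weight, age)
ax.set_xlabel("Height")
ax.set_ylabel("Weight")
ax.set_zlabel("Age")
plt.show()
```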
We can also create other types of 3-D graphs, such as line graphs, surfaces, wireframes and contours. To draw the above example as a simple line graph, we use the method plot3D() instead of scatter3D().
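The line-graph variant can be sketched the same way. As before, the data is made up and age is a hypothetical third variable used for the z-axis.

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d  # registers the 3D projection on older Matplotlib

height = np.array([150, 155, 160, 165, 170, 175, 180, 185])
weight = np.array([52, 55, 60, 63, 68, 72, 77, 83])
age = np.array([21, 25, 30, 28, 35, 40, 38, 45])  # hypothetical third variable

ax = plt.axes(projection="3d")
lines = ax.plot3D(height, weight, age)  # connects the points with a 3-D line
plt.show()
```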
I know that’s a lot to take in at once! But you made it until the end! Kudos on that!
Additional Resources
Matplotlib has been around for a while and there are a lot of other good resources if you’re still interested in getting the most out of this library.
For complete code, visit the following link —
Also, do not forget to go through the Matplotlib Documentation and Part II of this blog.