Advanced Visualization for Data Scientists with Matplotlib

Contents: Basic plots, 3D plots and widgets

A picture is worth a thousand words, but a good visualization is worth millions.

Visualization plays a fundamental role in communicating results in many fields in today’s world. Without proper visualizations, it is very hard to reveal findings, understand complex relationships among variables and describe trends in the data.

In this blog post, we’ll start with basic plots in Matplotlib and then drill down into some very useful advanced visualization techniques: the mplot3d toolkit (for generating 3D plots) and widgets.

The Vancouver property tax report dataset is used to explore different types of plots in the Matplotlib library. The dataset contains information on properties from BC Assessment (BCA) and City sources, including Property ID, Year Built, Zone Category and Current Land Value.

A link to the code is provided at the bottom of this post.

Matplotlib Basic Plots

Frequently used commands in the given examples:

plt.figure(): Create a new figure
plt.plot(): Plot y versus x as lines and/or markers
plt.xlabel(): Set the label for the x-axis
plt.ylabel(): Set the label for the y-axis
plt.title(): Set a title for the axes
plt.grid(): Configure the grid lines
plt.legend(): Place a legend on the axes
plt.savefig(): Save the current figure to disk
plt.show(): Display a figure
plt.clf(): Clear the current figure (useful when plotting multiple figures in the same script)

1. Line Plot

A line plot is a basic chart that displays information as a series of data points called markers connected by straight line segments.

The above code snippet can be used to create a line graph. Here, a pandas DataFrame is used for basic data manipulation. After reading and processing the input dataset, plt.plot() draws the line graph with the year on the x-axis and the number of properties built on the y-axis.
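A minimal sketch of such a line plot using the pyplot commands listed above. It assumes a hypothetical `YEAR_BUILT` column and a handful of made-up rows in place of the real dataset, whose column names may differ:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the property tax report (made-up rows;
# the real dataset's column name for the build year may differ)
df = pd.DataFrame({"YEAR_BUILT": [1990, 1990, 1995, 2000, 2000, 2000, 2005]})


def plot_properties_per_year(data):
    """Draw the number of properties built in each year as a line plot."""
    # Count properties per build year, then sort chronologically
    counts = data["YEAR_BUILT"].value_counts().sort_index()

    fig = plt.figure(figsize=(8, 4))
    plt.plot(counts.index, counts.values, marker="o", label="Properties built")
    plt.xlabel("Year")
    plt.ylabel("Number of properties built")
    plt.title("Properties built per year")
    plt.grid(True)
    plt.legend()
    return fig, counts


fig, counts = plot_properties_per_year(df)

if __name__ == "__main__":
    plt.show()
```

Calling `plt.savefig("line_plot.png")` before `plt.show()` would also write the figure to disk.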

2. Bar Plot

A bar graph displays categorical data with rectangular bars of heights or lengths proportional to the values which they represent.

The above code snippet can be used to create a bar graph.
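A minimal bar-graph sketch using Matplotlib's object-oriented interface (`fig, ax = plt.subplots()`). The zone categories and counts are made-up illustrative values, not figures taken from the dataset:

```python
import matplotlib.pyplot as plt

# Hypothetical zone categories and property counts (illustrative only)
zones = ["One Family", "Multiple Family", "Commercial", "Industrial"]
counts = [120, 80, 40, 15]

fig, ax = plt.subplots(figsize=(8, 4))
# Each bar's height is proportional to the count it represents
bars = ax.bar(zones, counts, color="tab:blue")
ax.set_xlabel("Zone category")
ax.set_ylabel("Number of properties")
ax.set_title("Properties per zone category")
ax.tick_params(axis="x", labelrotation=30)  # keep long labels readable

if __name__ == "__main__":
    plt.show()
```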

3. Histogram

A histogram represents the distribution of numerical data by grouping the values into bins and counting how many fall into each bin. It can serve as an estimate of the probability distribution of a continuous variable.

The above code snippet can be used to create a histogram.
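A minimal histogram sketch. Synthetic log-normal values stand in for the Current Land Value column (hypothetical data with a fixed seed, not the real distribution):

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic "land values" (hypothetical, fixed seed for repeatability)
rng = np.random.default_rng(seed=0)
land_values = rng.lognormal(mean=13.5, sigma=0.5, size=1_000)

fig, ax = plt.subplots()
# counts[i] is the number of values falling into the bin [edges[i], edges[i + 1])
counts, edges, patches = ax.hist(land_values, bins=30, edgecolor="black")
ax.set_xlabel("Current land value")
ax.set_ylabel("Number of properties")
ax.set_title("Distribution of land values")

if __name__ == "__main__":
    plt.show()
```

Passing `density=True` to `ax.hist` normalizes the bar areas to sum to 1, which is the form to use when treating the histogram as a probability density estimate.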

4. Pie Chart

A pie chart is a circular statistical graphic which is divided into slices to illustrate numerical proportions. In a pie chart, the arc length of each slice is proportional to the quantity it represents.

The above code snippet can be used to create a pie chart.
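A minimal pie-chart sketch. The category names and shares are hypothetical values chosen to sum to 100; `autopct` writes each slice's percentage onto it:

```python
import matplotlib.pyplot as plt

# Hypothetical share of properties per zone category (illustrative only)
labels = ["One Family", "Multiple Family", "Commercial", "Industrial"]
shares = [55, 25, 15, 5]

fig, ax = plt.subplots()
# Each slice's angle (and arc length) is proportional to its share of the total
wedges, texts, autotexts = ax.pie(
    shares, labels=labels, autopct="%1.1f%%", startangle=90
)
ax.set_title("Properties by zone category")

if __name__ == "__main__":
    plt.show()
```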

5. Scatter Plot

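A scatter plot uses dots to show the values of two numerical variables, one on each axis, which makes correlations and clusters easy to spot. The above code snippet can be used to create a scatter plot.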
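A minimal scatter-plot sketch. The land and improvement values are synthetic, with an assumed linear relationship plus noise, so that a correlation is visible:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (hypothetical): improvement value loosely tracks land value
rng = np.random.default_rng(seed=1)
land = rng.uniform(2e5, 2e6, size=200)
improvement = 0.4 * land + rng.normal(0, 1e5, size=200)

fig, ax = plt.subplots()
# One marker per property; alpha makes overlapping points visible
sc = ax.scatter(land, improvement, s=12, alpha=0.6)
ax.set_xlabel("Current land value")
ax.set_ylabel("Current improvement value")
ax.set_title("Land value vs. improvement value")

if __name__ == "__main__":
    plt.show()
```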

6. Working with Images

Link to download the Lenna test image. (Source: Wikipedia)
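A minimal sketch of loading and displaying an image. So that it runs without the download, a synthetic RGB gradient stands in for the Lenna image; the filename `lenna.png` in the comment is hypothetical. `plt.imread` reads PNG files natively (other formats need Pillow), and the grayscale conversion uses the standard ITU-R BT.601 luma weights:

```python
import matplotlib.pyplot as plt
import numpy as np


def show_image(img, title):
    """Display an RGB or single-channel image without axis ticks."""
    fig, ax = plt.subplots()
    ax.imshow(img, cmap="gray" if img.ndim == 2 else None)
    ax.set_title(title)
    ax.axis("off")
    return fig, ax


# With the downloaded file: img = plt.imread("lenna.png")  (hypothetical name)
# Synthetic stand-in: red grows left to right, blue grows top to bottom
h, w = 64, 64
img = np.zeros((h, w, 3))
img[..., 0] = np.linspace(0, 1, w)
img[..., 2] = np.linspace(0, 1, h)[:, None]

# Weighted sum over the color channels gives a grayscale image of shape (h, w)
gray = img @ np.array([0.299, 0.587, 0.114])

fig1, _ = show_image(img, "RGB image")
fig2, _ = show_image(gray, "Grayscale")

if __name__ == "__main__":
    plt.show()
```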

3D Plots using Matplotlib

3D plots play an important role in visualizing complex data in three or more dimensions.

1. 3D Scatter Plot

3D scatter plots are used to plot data points on three axes in an attempt to show the relationship between three variables. Each row in the data table is represented by a marker whose position depends on its values in the columns set on the X, Y, and Z axes.
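A minimal 3D scatter sketch with synthetic random points. In recent Matplotlib versions, passing `projection="3d"` to `add_subplot` creates an mplot3d axes without importing `mpl_toolkits.mplot3d` explicitly. Coloring the markers by Z is an illustrative choice that makes depth easier to read:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic points: each row of (x, y, z) becomes one marker
rng = np.random.default_rng(seed=2)
x, y, z = rng.normal(size=(3, 100))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
sc = ax.scatter(x, y, z, c=z, cmap="viridis")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
fig.colorbar(sc, ax=ax, label="Z value", shrink=0.7)

if __name__ == "__main__":
    plt.show()  # in an interactive window the plot can be rotated with the mouse
```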

2. 3D Line Plot

3D line plots are useful when one variable is constantly increasing or decreasing. This variable can be placed on the Z-axis, and the change in the other two variables can be observed on the X- and Y-axes with respect to it. For example, with time series data (such as planetary motion), time can be placed on the Z-axis and the change in the other two variables can be read from the visualization.
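A minimal sketch of this idea: x and y trace a circle (a loose, hypothetical stand-in for an orbit) while time, the steadily increasing variable, runs along the Z-axis, producing a helix:

```python
import matplotlib.pyplot as plt
import numpy as np

# Time increases monotonically, so it goes on the Z-axis
t = np.linspace(0, 4 * np.pi, 200)
x = np.cos(t)  # position components of a circular path
y = np.sin(t)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot(x, y, t, label="circular path over time")
ax.set_xlabel("X position")
ax.set_ylabel("Y position")
ax.set_zlabel("Time")
ax.legend()

if __name__ == "__main__":
    plt.show()
```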

3. 3D Plots as Subplots

The above code snippet can be used to create multiple 3D plots as subplots in the same figure. Each subplot can be analyzed independently.
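A minimal sketch of two 3D subplots side by side. The two surfaces are arbitrary illustrative functions; `view_init` shows that each subplot keeps its own viewing angle:

```python
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))

fig = plt.figure(figsize=(10, 4))

# Left: a rippled surface drawn with a colormap
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax1.plot_surface(X, Y, np.sin(np.sqrt(X**2 + Y**2)), cmap="viridis")
ax1.set_title("Surface")

# Right: a Gaussian bump drawn as a wireframe (every 5th row/column)
ax2 = fig.add_subplot(1, 2, 2, projection="3d")
ax2.plot_wireframe(X, Y, np.exp(-(X**2 + Y**2) / 2), rstride=5, cstride=5)
ax2.set_title("Wireframe")
ax2.view_init(elev=30, azim=45)  # change only this subplot's camera

if __name__ == "__main__":
    plt.show()
```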

4. Contour Plot

The above code snippet can be used to create contour plots. Contour plots represent a 3D surface in a 2D format: for a given Z value, a line is drawn connecting the (x, y) coordinates where that z value occurs. Contour plots are generally used for continuous variables rather than categorical data.
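A minimal contour-plot sketch. The surface z = exp(-(x² + y²)) and the five contour levels are illustrative choices; `clabel` writes each line's z value on it:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2))  # illustrative surface

fig, ax = plt.subplots()
# Each line connects the (x, y) points where Z equals one of these levels
levels = np.linspace(0.1, 0.9, 5)
cs = ax.contour(X, Y, Z, levels=levels, colors="black")
ax.clabel(cs, inline=True, fontsize=8)  # label each line with its z value
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_aspect("equal")

if __name__ == "__main__":
    plt.show()
```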

5. Contour Plot with Intensity

The above code snippet can be used to create filled contour plots.
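A minimal filled-contour sketch: `contourf` shades the regions between levels, and the colorbar maps each color back to its z range. The two-peak surface is an illustrative choice:

```python
import matplotlib.pyplot as plt
import numpy as np

X, Y = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
# Illustrative surface: one positive and one negative Gaussian peak
Z = np.exp(-((X - 1) ** 2 + Y**2)) - np.exp(-((X + 1) ** 2 + Y**2))

fig, ax = plt.subplots()
cf = ax.contourf(X, Y, Z, levels=20, cmap="inferno")
fig.colorbar(cf, ax=ax, label="z value")  # adds a second axes for the bar
ax.set_xlabel("x")
ax.set_ylabel("y")

if __name__ == "__main__":
    plt.show()
```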

6. Surface Plot

The above code snippet can be used to create surface plots, which are used for plotting 3D data. Rather than showing individual data points, they show a functional relationship between a designated dependent variable (Z) and two independent variables (X and Y). A practical application of this plot is visualizing how the gradient descent algorithm converges.
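A minimal sketch of that gradient descent application, assuming a hypothetical quadratic loss f(x, y) = x² + 2y², a learning rate of 0.1 and 30 steps. The surface shows the loss and the red path shows the iterates moving toward the minimum at the origin:

```python
import matplotlib.pyplot as plt
import numpy as np


def loss(x, y):
    """Hypothetical convex loss surface."""
    return x**2 + 2 * y**2


def gradient_descent(start, lr=0.1, steps=30):
    """Plain gradient descent on `loss`; returns every visited (x, y)."""
    path = [np.array(start, dtype=float)]
    for _ in range(steps):
        x, y = path[-1]
        grad = np.array([2 * x, 4 * y])  # analytic gradient of loss
        path.append(path[-1] - lr * grad)
    return np.array(path)


X, Y = np.meshgrid(np.linspace(-3, 3, 60), np.linspace(-3, 3, 60))
path = gradient_descent((2.5, 2.5))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X, Y, loss(X, Y), cmap="viridis", alpha=0.7)
# Lift each iterate onto the surface by evaluating the loss at it
ax.plot(path[:, 0], path[:, 1], loss(path[:, 0], path[:, 1]), "r.-",
        label="gradient descent")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("loss")
ax.legend()

if __name__ == "__main__":
    plt.show()
```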

7. Triangular Surface Plot

The above code snippet can be used to create a triangular surface plot.
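A minimal sketch of a triangular surface plot. Unlike `plot_surface`, `plot_trisurf` accepts scattered (unstructured) points and triangulates them itself; the random points and the z function here are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np

# Scattered, non-gridded sample points (synthetic)
rng = np.random.default_rng(seed=3)
x = rng.uniform(-2, 2, 300)
y = rng.uniform(-2, 2, 300)
z = np.sin(x) * np.cos(y)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# Matplotlib builds a Delaunay triangulation of (x, y) and shades each triangle
ax.plot_trisurf(x, y, z, cmap="coolwarm", linewidth=0.2)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")

if __name__ == "__main__":
    plt.show()
```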

8. Polygon Plot

The above code snippet can be used to create polygon plots.
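A minimal polygon-plot sketch: each 2D curve is turned into a filled polygon and placed at its own depth with `add_collection3d`. The Gaussian curves and their centers are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colormaps
from matplotlib.collections import PolyCollection


def polygon_under_graph(x, y):
    """Vertices of the polygon filling the area between (x, y) and y = 0."""
    return [(x[0], 0.0), *zip(x, y), (x[-1], 0.0)]


x = np.linspace(0.0, 10.0, 31)
centers = range(1, 9)  # illustrative parameter: one polygon per center
verts = [polygon_under_graph(x, np.exp(-0.5 * (x - c) ** 2)) for c in centers]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
facecolors = colormaps["viridis_r"](np.linspace(0, 1, len(verts)))
poly = PolyCollection(verts, facecolors=facecolors, alpha=0.7)
# zdir="y": polygon k is drawn upright in the plane y = centers[k]
ax.add_collection3d(poly, zs=list(centers), zdir="y")
ax.set(xlim=(0, 10), ylim=(1, 8), zlim=(0, 1.1),
       xlabel="x", ylabel="center", zlabel="height")

if __name__ == "__main__":
    plt.show()
```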

9. Text Annotations in 3D

The above code snippet can be used to create text annotations in 3D plots. This is very useful in 3D plots because the text stays readable when the viewing angle of the plot changes.
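A minimal sketch of 3D text annotations. The `zdir` argument of `Axes3D.text` controls orientation: `None` keeps the text facing the viewer, while an axis name or a direction tuple aligns it with the data. `text2D` pins text in axes coordinates so it does not move when the view rotates. The positions are arbitrary:

```python
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# (zdir, x, y, z): zdir=None faces the viewer; "x" or a tuple aligns the text
annotations = [
    (None, 1, 2, 8),
    ("x", 4, 5, 3),
    ((1, 1, 0), 7, 8, 6),
]
for zdir, x, y, z in annotations:
    ax.text(x, y, z, f"({x}, {y}, {z}), zdir={zdir}", zdir=zdir)

# Fixed 2D text in axes coordinates (0..1), independent of the 3D view
ax.text2D(0.05, 0.95, "2D text", transform=ax.transAxes)
ax.set(xlim=(0, 10), ylim=(0, 10), zlim=(0, 10),
       xlabel="X", ylabel="Y", zlabel="Z")

if __name__ == "__main__":
    plt.show()
```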

10. 2D Data in 3D Plot

The above code snippet can be used to plot 2D data in a 3D plot. This is useful because it allows multiple 2D plots to be compared in a single 3D space.
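A minimal sketch that places three 2D curves side by side in one 3D axes. With `zdir="y"`, each curve's (x, y) data are drawn as (x, z) in the plane y = `zs`, so the series line up for comparison. The sine curves are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 100)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for offset, freq in enumerate([1, 2, 3]):
    # The 2D curve's y values become heights; it sits in the plane y = offset
    ax.plot(x, np.sin(2 * np.pi * freq * x), zs=offset, zdir="y",
            label=f"{freq} cycle(s)")
ax.set_xlabel("x")
ax.set_ylabel("series")
ax.set_zlabel("value")
ax.legend()

if __name__ == "__main__":
    plt.show()
```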

11. 2D Bar Plot in 3D

The above code snippet can be used to create multiple 2D bar plots in a single 3D space to compare and analyze the differences.
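A minimal sketch of several 2D bar charts stacked in depth. `Axes3D.bar` draws an ordinary bar chart in the plane y = `zs` when `zdir="y"`. The years, group names and heights are hypothetical random values:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=4)
years = np.arange(2016, 2021)  # hypothetical x categories
groups = ["Zone A", "Zone B", "Zone C"]  # hypothetical series names
colors = ["tab:red", "tab:green", "tab:blue"]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
containers = []
for k, color in enumerate(colors):
    heights = rng.integers(10, 50, size=len(years))
    # Series k is drawn as a 2D bar chart in the plane y = k
    containers.append(
        ax.bar(years, heights, zs=k, zdir="y", color=color, alpha=0.8)
    )
ax.set_yticks(range(len(groups)))
ax.set_yticklabels(groups)
ax.set_xlabel("Year")
ax.set_zlabel("Count")

if __name__ == "__main__":
    plt.show()
```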

Widgets in Matplotlib

So far we have been dealing with static plots, where the user can only view the charts or graphs without any interaction. Widgets add interactivity, letting the user visualize, filter and compare data more effectively.

1. Checkbox widget

As you can see from the above graph, Matplotlib allows the user to choose which graphs to show with the help of checkboxes. This is particularly useful when there are many categories, which makes comparisons difficult. Widgets thus make it easier to isolate and compare distinct graphs and reduce clutter.
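A minimal sketch of the checkbox idea using `matplotlib.widgets.CheckButtons`. Clicking a box calls `toggle` with that box's label, which flips the matching line's visibility. The sine curves are illustrative:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import CheckButtons

t = np.linspace(0, 2, 400)
fig, ax = plt.subplots()
fig.subplots_adjust(left=0.3)  # leave room on the left for the checkboxes

# One line per (illustrative) category, keyed by its checkbox label
lines = {}
for freq in (1, 2, 3):
    label = f"{freq} Hz"
    lines[label] = ax.plot(t, np.sin(2 * np.pi * freq * t), label=label)[0]

rax = fig.add_axes([0.05, 0.4, 0.15, 0.2])  # [left, bottom, width, height]
check = CheckButtons(rax, list(lines), [ln.get_visible() for ln in lines.values()])


def toggle(label):
    """Show or hide the line whose checkbox was clicked."""
    line = lines[label]
    line.set_visible(not line.get_visible())
    fig.canvas.draw_idle()


check.on_clicked(toggle)  # keep a reference to `check` so it stays responsive

if __name__ == "__main__":
    plt.show()
```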

2. Slider widget to control the visual properties of plots

The Matplotlib slider is very useful for visualizing how parameters affect graphs or mathematical equations. As you can see, the slider lets the user change the values of variables or parameters and see the result instantly.
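A minimal sketch using `matplotlib.widgets.Slider` to control the frequency of a sine wave. Each time the slider moves, `update` receives the new value and redraws the curve. The frequency range is an illustrative choice:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import Slider

t = np.linspace(0, 1, 500)
init_freq = 3.0

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.25)  # leave room below the plot for the slider
(line,) = ax.plot(t, np.sin(2 * np.pi * init_freq * t))
ax.set_xlabel("Time [s]")

slider_ax = fig.add_axes([0.2, 0.1, 0.6, 0.03])  # [left, bottom, width, height]
freq_slider = Slider(slider_ax, "Frequency [Hz]", valmin=0.5, valmax=10.0,
                     valinit=init_freq)


def update(val):
    """Recompute the curve for the slider's new frequency value."""
    line.set_ydata(np.sin(2 * np.pi * val * t))
    fig.canvas.draw_idle()


freq_slider.on_changed(update)

if __name__ == "__main__":
    plt.show()
```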

Where to go from here?

If you are interested in exploring more interactive plots with modern design aesthetics, we recommend checking out Dash by Plotly.

That's it, folks. I hope you find this post useful. The full code (Jupyter Notebook and Python files) can be found here. Because of Jupyter Notebook's limitations, the interactive plots (3D and widget) do not work properly in it. Hence, the 2D plots are provided in a Jupyter Notebook and the 3D and widget plots are provided as .py files.

Feel free to leave your comments below.



Gaurav Prachchhak, Tommy Betz, Veekesh Dhununjoy, Mihir Gajjar.