Fast and Easy Plotting with Matplotlib

Dan Watson
Hardwood Convergence
6 min read · Jul 25, 2019
Photo by Isaac Smith on Unsplash

The Plan

In the last post we learned how to make basic plots with pandas. Those plots were fine for exploratory purposes, but they’re probably not something you’d want to share with a larger audience. In this post, we’ll improve upon these plots by using matplotlib, the most popular Python charting library and a must-know for any data scientist. Here’s our plan:

  • Set a quick and painless foundation of good graphing practices
  • Learn how to plot a single series
  • Plot multiple series against each other on the same graph

Graphing Theory: Sometimes Less is More

Check out the following two graphs:

Sources: Left: DataScienceCentral, Right: USA Today

Which is better? It takes just a second, but we all intuitively feel that the smartphone graph on the right is better. The left graph feels overwhelming. It has too much ink, takes too long to understand, and doesn’t convey a clear story. The graph on the right requires just a quick glance to understand that smartphone ownership more than doubled from 2011 to 2016. Perfect.

What makes a graph good? There are a ton of great resources on graphing theory (I’d recommend The Visual Display of Quantitative Information by Edward Tufte), but the main points are quite intuitive:

  • Know your story
  • Encourage your reader to think about substance rather than graphical design
  • Show the data to encourage comparison
  • Maximize the data to ink ratio
  • Integrate statistics with verbal descriptions
  • Don’t distort the data
  • Never use pie charts

If you can’t remember all that, just remember to be clear and honest. And never use pie charts. Enough theory, let’s code.

Introducing Matplotlib, the Honda Accord of Charting Libraries.

While it’s not the flashiest package available and doesn’t have the most state-of-the-art features, matplotlib is reliable, easy to use, and everyone has it. It’ll get the job done in the vast majority of charting scenarios, and you need to know how to use it. Let’s first make a basic plot.

To start, we need to import pyplot from matplotlib. The convention is to import it into notebooks like this:

import matplotlib.pyplot as plt
%matplotlib inline

You saw this in the last post, but we’ll quickly explain it. Pyplot is the collection of functions that lets us create plots in Python. We then call the magic command %matplotlib inline, which sets the matplotlib backend so that graphs are rendered inside our notebook instead of in a separate window.

Let’s make an easy first plot. A simple line chart based on fake data:

Basic line plot in matplotlib
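For reference, here is a minimal sketch of a plot like this one. The values in x and y are made up for illustration, and the lines variable is only there to show what plot() hands back.

```python
import matplotlib.pyplot as plt

# Made-up data: two lists of equal length
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]

# plot() draws a line through the (x, y) pairs and returns the line objects
lines = plt.plot(x, y)

# In a notebook with %matplotlib inline, the figure renders after this call
plt.show()
```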

We initialized two lists of data, x and y. Then we called plot() from pyplot and passed it x and y. Finally, we called the show() function on plt and our graph was output. Pretty simple.

Let’s load our dataset and plot Harden’s points against the game number:

Plot of points vs game number
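A rough sketch of this step might look like the following. The small DataFrame stands in for the game log loaded in the last post: the column names game_num and pts and every value in it are assumptions made up for illustration, and dropping rows with missing points is just one plausible way to keep only the games he played.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the real game log. Column names and values are made up.
games = pd.DataFrame({
    "game_num": [1, 2, 3, 4, 5, 6],
    "pts": [30, 25, None, 41, 38, None],  # None marks a game he missed
})

# Keep only the games he actually played (assumed filter: non-missing points)
played = games.dropna(subset=["pts"])

# Same pattern as before: x values first, then y values
plt.plot(played["game_num"], played["pts"])
plt.show()
```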

We followed the same conventions as above, just limiting the dataset to games in which Harden actually played. The graph returned is okay for us, but another viewer wouldn’t know what we’re trying to convey. Let’s add labels to our axes and a title:

Plot of points vs game number with labels
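Here is a minimal sketch of the labeled version. The data values and the label and title text are placeholders chosen for illustration, not the exact strings from the original chart.

```python
import matplotlib.pyplot as plt

# Made-up stand-in values, not real game logs
game_num = [1, 2, 4, 5]
pts = [30, 25, 41, 38]

plt.plot(game_num, pts)
plt.xlabel("Game Number")     # text under the x-axis
plt.ylabel("Points")          # text beside the y-axis
plt.title("Points by Game")   # text above the plot
plt.show()
```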

You can see that we just added an xlabel(), ylabel(), and title() to our plot, and the graph is now starting to convey some information that we can use to tell a story.

Just plotting the points by themselves is interesting, but let’s add another layer to the chart. We’ll add a bar chart to indicate the number of field goals attempted. We should also resize the plot so it’s not so square and add some color to make the graph stand out. Let’s also add a legend to explain what our different plots mean. Here’s our code:

Plot of pts and fga against game number
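A sketch of a combined chart along these lines is below. The data values, the 12 by 6 figure size, the linewidth of 3, and the label and title strings are all assumptions picked for illustration, so adjust them to taste.

```python
import matplotlib.pyplot as plt

# Made-up stand-in values, not real game logs
game_num = [1, 2, 4, 5]
pts = [30, 25, 41, 38]
fga = [22, 18, 27, 24]

# figsize is (width, height) in inches; wider than tall here
fig = plt.figure(figsize=(12, 6))

# Red bars for shot attempts, a thicker black line for points.
# Each label is the text the legend shows for that series.
plt.bar(game_num, fga, color="red", label="Field Goals Attempted")
plt.plot(game_num, pts, color="black", linewidth=3, label="Points")

plt.xlabel("Game Number")
plt.ylabel("Count")
plt.title("Points and Field Goal Attempts by Game")
plt.legend()  # builds the legend from the label arguments above
plt.show()
```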

Starting with the top line, we called plt.figure() and set a new figure size. The figsize argument takes a tuple of values that give the width and height of the figure in inches. We made this figure a wide rectangle to reduce the vertical skew of our data and make it easier to differentiate between the bars. Next we created a series of shots and added a bar plot to our figure. You can also see that we added a few more arguments to these calls. The color argument takes a color name, a single-letter code, or a hex value. We made the points line black and the field goal attempts bars red. We also added a label argument, which gives matplotlib a name to attach to each plotted series. When we call plt.legend(), matplotlib matches the colors to the label names and creates the legend. Finally, we added a linewidth to our line plot so the point values aren’t overpowered by the bar chart. Play with these values to see what looks good to you.

Let’s finish up by learning to make a histogram, which is something you’ll use frequently to see the distribution of a dataset. Here’s how we can make a simple histogram of Harden’s points scored per game:

Simple histogram of points scored for James Harden 2018–19
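A minimal sketch of a histogram like this is below. The point totals are made up for illustration and are not Harden’s actual game log.

```python
import matplotlib.pyplot as plt

# Made-up point totals, not a real game log
pts = [30, 25, 41, 38, 61, 17, 33, 29, 44, 36]

# hist() returns the bin counts, the bin edges, and the drawn bars.
# With no bins argument it falls back to matplotlib's default of 10 bins.
counts, edges, bars = plt.hist(pts)
plt.show()
```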

All we had to do was call the plt.hist() function and pass in our points series. That’s easy enough, but let’s add a few more arguments to make this more presentable. Let’s add our own bins and put some space between the bars to make it look less like a glob:

Better histogram of Harden’s points 2018–19
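One way to sketch the improved histogram is shown below. The point totals are made up, the axis labels and title are placeholders, and writing the bin edges as range(0, 71, 10) is an assumption about how "0 to 70 by 10" might be expressed.

```python
import matplotlib.pyplot as plt

# Made-up point totals, not a real game log
pts = [30, 25, 41, 38, 61, 17, 33, 29, 44, 36]

# Bin edges at 0, 10, 20, ..., 70
bins = [edge for edge in range(0, 71, 10)]

# rwidth=0.95 draws each bar at 95% of its bin width, leaving small gaps
counts, edges, bars = plt.hist(pts, bins=bins, rwidth=0.95)

plt.xlabel("Points")
plt.ylabel("Games")
plt.title("Distribution of Points per Game")
plt.show()
```

Changing the step in the bins list is a quick way to see how wider or narrower bins change the shape of the distribution.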

We created a bins variable with a list comprehension that produces bin edges from 0 to 70 in steps of 10. Harden’s highest-scoring game last season was 61 points, so this covers all his games. We then passed the bins to the histogram call and set rwidth to 0.95, which draws each bar at 95% of its bin’s width and leaves some space between the bars. Play with the bin widths and bin values to find out what is appealing to you, but this seems better for now.

Wrapping Up

Learning just the basics of matplotlib gives us a lot more freedom and flexibility in creating visually pleasing plots that help us convey our stories. We have just scratched the surface of matplotlib, so check out the documentation to learn more about what this library has to offer. If you want to download the code from this tutorial, you can find it here on our GitHub.

I was planning on showing off some interactive plotting in the next post, but I’m over working with just Harden’s 2018–19 game logs, so I’m sure you are too. Next we’ll explore another way to get NBA data with Python and we’ll start to ramp up the size of our data.

As always, thanks for reading and leave any questions, comments, suggestions below or let me know any topics you would like to see covered! If you learned anything from this post, show some love and smash that clap!

Dan Watson
Hardwood Convergence

Data nerd, basketball fanatic, and ice cream connoisseur. Health care analytics by day, basketball analytics by night. https://www.linkedin.com/in/danielkwatson