12 Easy Steps to Make Clean, Readable Graphs in Python using Matplotlib and Seaborn

The Do’s and Don’ts to Get Your Point Across Visually

Will Newton
Aug 10, 2020

Before I quit my job as a Data Analyst at a staffing firm and went back to school to learn Data Science, part of my job was to present KPI (key performance indicator) reports to the VP of Sales and the President. These reports covered a huge number of features and data sources (phone call data, LinkedIn Sales Navigator and Recruiter data, CRM activity data, and more), which were very hard to synthesize into one report.

This was before I had any experience with Python, Matplotlib, or Seaborn, and I think we can all agree that Excel’s visualization tools are lacking, to put it lightly. My solution was to build a grid of metrics that, while impressive in the sheer amount of information available at a glance, did little to convey the overall health of the business or the success of individual employees. Below is a sample with some dummy data to give you an example of what I sent to my employer every morning.

Now more than ever, I wish I had had access to the suite of tools that Matplotlib and Seaborn offer for creating visualizations. Not only could I have automated most of my job with Python, I could also have created easy-to-read visualizations that would have enhanced the insights in our data. Below I have outlined some easy-to-follow steps to get the most out of the Matplotlib and Seaborn libraries as you start out on your Python Visualization Quest.

Installing Packages, Importing Libraries, and Setting Options

As with most Python libraries, installing and importing Matplotlib and Seaborn is easy using your terminal and Jupyter Notebooks. Matplotlib comes packaged with Anaconda, a must-have for any data scientist (instructions for installation can be found here). Seaborn is a library built on top of Matplotlib that complements it, and it can be installed by running the conda command below in your terminal.

conda install -c anaconda seaborn

From here it is easy to import the libraries and set a few key options for ease of use and readability. The %matplotlib inline magic command is important because it tells your Jupyter Notebook to display each visualization directly below the code that builds it and to store the visualization in the notebook itself. The sns.set_style() function should also be called at the beginning of your notebook if you would like to keep the style of your visualizations consistent throughout. I like the darkgrid style, but Seaborn also offers whitegrid, dark, white, and ticks if you prefer one of those.

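import pandas as pd              # data loading and preprocessing
import matplotlib.pyplot as plt  # core plotting interface
# Jupyter-only: render figures below the cell that creates them
%matplotlib inline
import seaborn as sns            # statistical plots built on top of Matplotlib
sns.set_style('darkgrid')        # applies to every plot created after this line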
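If a single figure should look different from the notebook-wide style, sns.axes_style() can also be used in a with statement, so the change applies only inside that block. A minimal sketch, assuming the darkgrid default set above; the white style and the dummy line plot are only for illustration:

```python
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('darkgrid')  # notebook-wide default, as above

# Temporarily switch to the 'white' style (no grid lines) for one figure
with sns.axes_style('white'):
    grid_inside = plt.rcParams['axes.grid']
    fig, ax = plt.subplots(figsize=(6, 3))   # axes created here pick up the 'white' style
    ax.plot([2018, 2019, 2020], [1, 3, 2])   # dummy data for illustration

# Leaving the block restores the darkgrid settings
grid_after = plt.rcParams['axes.grid']
```

The rest of the notebook keeps darkgrid, so the consistency argument above still holds.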

Now let’s go step by step to create the graph shown at the top of this post: a chart of how the average domestic and worldwide box office gross changed by release year.

Choose Which Type of Graph To Use

Matplotlib and Seaborn offer dozens of visualization types out of the box, and the choice can be overwhelming. However, just because a library lets you build a complex, multi-tiered visualization doesn’t mean it is the right one for the information you are trying to display. For my money, the key to engaging visualizations is to keep them simple and easy to read.

Trying to communicate too much information in one graph is a sure way to confuse your audience and give you an unnecessary headache. Think clearly about what you are trying to communicate with the visualization and which type of graph would get the point across most easily. If it helps, think of yourself as a writer trying to communicate ideas in an essay or blog post (whoa, meta!) in as few words as possible.

Brevity is the soul of wit — William Shakespeare

In this case we are trying to communicate the change in two features over a span of twenty years. We could use two line graphs, but I think there is an advantage in clearly visualizing the gap between the two features. Since both features are expressed on the same scale (hundreds of millions of US dollars) and domestic gross is part of worldwide gross, there might be additional meaning to be mined by drawing the domestic bars over the worldwide bars instead. The result reads like a stacked bar graph, where the visible remainder of each worldwide bar is the international gross. Let’s investigate.
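For comparison, a literal stacked bar chart in Matplotlib draws the second series on top of the first through the bottom parameter of bar(). The minimal sketch below splits worldwide gross into domestic and international (worldwide minus domestic) segments; the yearly averages are invented for illustration and are not the article’s dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly averages in USD (not the article's data)
df_mean = pd.DataFrame(
    {'domestic_gross': [60e6, 75e6, 70e6],
     'worldwide_gross': [150e6, 190e6, 180e6]},
    index=pd.Index([2017, 2018, 2019], name='year'),
)
domestic = df_mean['domestic_gross'].to_numpy()
international = (df_mean['worldwide_gross'] - df_mean['domestic_gross']).to_numpy()
years = [str(year) for year in df_mean.index]  # categorical tick labels

fig, ax = plt.subplots(figsize=(8, 4))
# Bottom segment: domestic gross
dom_bars = ax.bar(years, domestic, color='lightgreen', label='Domestic')
# Top segment starts where the domestic bar ends, so each full bar equals worldwide gross
intl_bars = ax.bar(years, international, bottom=domestic,
                   color='lightblue', label='International')
ax.legend()
```

Both segments together reach the worldwide total, which is why overlaying the domestic bars on the worldwide bars gives the same picture whenever domestic gross never exceeds worldwide gross.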

Structure Data to Make Visualizations Easier

One of the first mistakes I made was trying to use Matplotlib and Seaborn on the raw, unaggregated data frame instead of doing some preprocessing in pandas first. Preprocessing not only shows you, step by step, how your data will look once visualized, but also makes the Seaborn code in the next step easier to write. Let’s load some data to get started.

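# 'Unnamed: 0' is the blank-headed first column (a saved index); use it as the index
df = pd.read_csv('modeling_df.csv', index_col='Unnamed: 0')
df.head()  # preview the first five rows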
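The article’s modeling_df.csv isn’t included here, so to follow along you can build a small stand-in with the columns the later steps use (year, domestic_gross, worldwide_gross). The title column and every value below are made up. The round trip also shows why an index_col argument is needed, assuming the CSV was written by DataFrame.to_csv() with its default index: the index is saved as a first column with an empty header, which read_csv() otherwise loads as a column named Unnamed: 0.

```python
import io

import pandas as pd

# Hypothetical stand-in for modeling_df.csv (invented titles and values)
df = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C', 'Movie D'],
    'year': [2018, 2018, 2019, 2019],
    'domestic_gross': [100e6, 200e6, 150e6, 50e6],
    'worldwide_gross': [300e6, 500e6, 400e6, 120e6],
})

buffer = io.StringIO()
df.to_csv(buffer)  # the index is written as an unnamed first column

buffer.seek(0)
without_index = pd.read_csv(buffer)  # the old index comes back as a column

buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)  # first column becomes the index again

print(without_index.columns.tolist())
print(with_index.columns.tolist())
```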

Since we are looking for average box office numbers by year, we can use the data frame’s .groupby() method to get our data into the right shape before graphing. We pass the column we would like to group by inside the parentheses and then chain another method to the end to indicate how the numerical columns should be aggregated. In this case we want the average, so we chain the .mean() method to get each numerical column’s mean value per year.

df.groupby('year').mean(numeric_only=True)  # average of each numeric column per year
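Here is what that aggregation does on a tiny, made-up frame. Passing numeric_only=True keeps non-numeric columns such as a title out of the average; pandas 2.0 and later raise an error instead of silently dropping them.

```python
import pandas as pd

# Hypothetical movie-level data (values invented for illustration)
df = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'year': [2018, 2018, 2019],
    'domestic_gross': [100e6, 200e6, 300e6],
    'worldwide_gross': [250e6, 450e6, 700e6],
})

# One row per year (the years become the index); each numeric column is averaged
yearly = df.groupby('year').mean(numeric_only=True)
print(yearly)
```

The two 2018 movies collapse into one row whose domestic_gross is their average, 150 million.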

As of this post it is 2020, so let’s remove the rows for 2020 and 2021, since that data is most likely incomplete. We can do this by slicing off the last two rows of the grouped data frame (groupby() sorts the years in ascending order) and saving the result to a new variable. This new data frame is what we will pass to our visualization code.

# Yearly averages without the last two rows (2020 and 2021)
df_mean = df.groupby('year').mean(numeric_only=True)[:-2]
df_mean
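The [:-2] slice relies on 2020 and 2021 being the last two rows. If one of those years were ever missing, the slice would silently drop a complete year instead. Filtering on the year itself avoids that; a sketch with hypothetical averages where 2021 is absent:

```python
import pandas as pd

# Hypothetical yearly averages; note there is no 2021 row
df_mean = pd.DataFrame(
    {'worldwide_gross': [300e6, 320e6, 90e6]},
    index=pd.Index([2018, 2019, 2020], name='year'),
)

positional = df_mean.iloc[:-2]              # same positional slice as [:-2]: drops 2019 too
by_label = df_mean[df_mean.index < 2020]    # keeps every year before 2020

print(positional.index.tolist())
print(by_label.index.tolist())
```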

Write Your Visualization’s Code

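When writing your visualization code, put all of it in the same cell of your Jupyter Notebook. With %matplotlib inline, the figure is displayed and closed as soon as the cell finishes running, so a call like plt.title() in a later cell starts a new, empty figure instead of adding to the one you just drew.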
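Another option, sketched below with hypothetical yearly averages standing in for df_mean, is Matplotlib’s object-oriented interface: create the figure and its Axes once with plt.subplots(), pass that Axes to each Seaborn call through the ax parameter, and set the title and labels on the same object. This makes explicit which plot every line modifies; in a notebook the code still belongs in one cell for the finished figure to display.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical yearly averages standing in for df_mean
df_mean = pd.DataFrame(
    {'domestic_gross': [150e6, 140e6],
     'worldwide_gross': [350e6, 380e6]},
    index=pd.Index([2018, 2019], name='year'),
)

fig, ax = plt.subplots(figsize=(15, 5))  # one figure holding one Axes
# Both bar layers are drawn onto the same Axes
sns.barplot(data=df_mean, x=df_mean.index, y='worldwide_gross',
            color='lightblue', label='World Wide Box Office Gross', ax=ax)
sns.barplot(data=df_mean, x=df_mean.index, y='domestic_gross',
            color='lightgreen', label='Domestic Box Office Gross', ax=ax)
# Text elements are set on the Axes itself instead of through plt.*
ax.set_title('Box Office Gross Average by Year', size=15)
ax.set_xlabel('Year', size=12)
ax.set_ylabel('Average Box Office Gross (USD)', size=12)
ax.legend()
```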

Setting Figsize

The first step to building a good graph is to set the figure size. This is done by calling plt.figure() and passing the figsize parameter a (width, height) tuple with the aspect ratio you want. The code below creates a Matplotlib figure that is 15 inches wide and 5 inches tall.
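Because figsize is measured in inches, the size in pixels also depends on the figure’s resolution (dpi, dots per inch). A quick check, with dpi=100 picked only to get round numbers: 15 × 100 by 5 × 100 gives a 1500 × 500 pixel image.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is pixels per inch
fig = plt.figure(figsize=(15, 5), dpi=100)
width_px, height_px = fig.get_size_inches() * fig.dpi
print(width_px, height_px)
```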

plt.figure(figsize=(15, 5))

Plotting First Bar Graph

The next step is to call sns.barplot() and pass in the data and the columns to plot. The code below creates a bar plot inside the figure we just defined, using the data frame we just created. It sets the x-values to the data frame’s index (which, if you remember, is the year) and the y-values to the average worldwide gross column. We set the label to ‘World Wide Box Office Gross’, which will show up when we add a legend, and set the color to lightblue.

sns.barplot(data=df_mean, x=df_mean.index, y='worldwide_gross', label='World Wide Box Office Gross', color='lightblue')

Plotting Second Bar Graph

Now let’s plot a second sns.barplot() on the same figure, this time for domestic box office gross, with a different color so you can tell the two apart. The code is almost identical; only the y-value, label, and color change.

sns.barplot(data=df_mean, x=df_mean.index, y='domestic_gross', label='Domestic Box Office Gross', color='lightgreen')
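The overlay reads correctly only because domestic gross is part of worldwide gross: the shorter green bars, drawn second, sit inside the taller blue ones, and the blue part left visible is the international share. If the two calls were swapped, the worldwide bars would cover the domestic bars completely. A small check you could run on df_mean before plotting; the numbers here are hypothetical:

```python
import pandas as pd

# Hypothetical yearly averages standing in for df_mean
df_mean = pd.DataFrame(
    {'domestic_gross': [150e6, 140e6],
     'worldwide_gross': [350e6, 380e6]},
    index=pd.Index([2018, 2019], name='year'),
)

# Every domestic bar must fit inside its worldwide bar for the overlay to work
assert (df_mean['domestic_gross'] <= df_mean['worldwide_gross']).all()

# The visible light-blue part of each bar is the international gross
international = df_mean['worldwide_gross'] - df_mean['domestic_gross']
print(international)
```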

Adding a Title

Our graph is looking pretty good so far, but it can definitely be improved from a readability standpoint. Let’s start by adding a title so that viewers know what they are looking at. We do this by calling plt.title() and passing in the text and the font size.

plt.title('Box Office Gross Average by Year', size=15)

Adding a Legend

Next, let’s make sure viewers know what each bar color means by calling plt.legend(). You can pass the location of the legend or the size of its text, but if you leave the arguments out, Matplotlib picks the best location and uses the default text size, which works fine for our purposes. Now we have a pretty good-looking graph.

plt.legend()
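If the automatic placement ever lands on top of your bars, plt.legend() (and ax.legend()) accept a loc string such as 'upper left' and a fontsize. A sketch using plain Matplotlib bars with invented values:

```python
import matplotlib.pyplot as plt

years = ['2018', '2019']
fig, ax = plt.subplots(figsize=(8, 4))
# The label on each bar() call becomes a legend entry (values are hypothetical)
ax.bar(years, [350, 380], color='lightblue', label='World Wide Box Office Gross')
ax.bar(years, [150, 140], color='lightgreen', label='Domestic Box Office Gross')

# Pin the legend to the upper-left corner with slightly larger text
legend = ax.legend(loc='upper left', fontsize=12)
print([text.get_text() for text in legend.get_texts()])
```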

Adding X-Labels and Y-Labels

We are almost there with a really great-looking, readable graph. However, the axis labels Seaborn adds by default are raw column names, which are hard to read and, in the case of the y-label, simply wrong: the y-axis shows both domestic and worldwide gross but carries only one column’s name. Let’s fix this by calling plt.xlabel() and plt.ylabel() to set these values manually.

plt.ylabel('Average Box Office Gross in Hundreds of Millions (USD)', size=12)
plt.xlabel('Year', size=12)
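With a label like “in Hundreds of Millions”, readers still have to scale the tick values in their heads. Another option is to format the tick labels themselves with Matplotlib’s FuncFormatter. A sketch assuming the gross columns hold plain dollar amounts; the millions() helper is hypothetical and the “$250M” style is just one choice:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def millions(value, pos):
    """Format a raw dollar amount as e.g. '$250M' (pos is the tick position, unused)."""
    return f'${value / 1e6:,.0f}M'


fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(['2018', '2019'], [350e6, 380e6], color='lightblue')  # hypothetical averages
ax.yaxis.set_major_formatter(FuncFormatter(millions))
ax.set_ylabel('Average Box Office Gross (USD)', size=12)
```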

Remember to run all of the code above in a single cell, or it won’t work properly. It should look like the code below.

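import matplotlib.pyplot as plt
import seaborn as sns

# df_mean comes from the preprocessing step above
plt.figure(figsize=(15, 5))  # 15 x 5 inches
# Taller worldwide bars first, then the domestic bars drawn over them
sns.barplot(data=df_mean, x=df_mean.index, y='worldwide_gross', label='World Wide Box Office Gross', color='lightblue')
sns.barplot(data=df_mean, x=df_mean.index, y='domestic_gross', label='Domestic Box Office Gross', color='lightgreen')
plt.title('Box Office Gross Average by Year', size=15)
plt.legend()  # location chosen automatically
# Replace Seaborn's default axis labels (raw column names)
plt.ylabel('Average Box Office Gross in Hundreds of Millions (USD)', size=12)
plt.xlabel('Year', size=12)
plt.show()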

And there you have it. In only 12 steps and a few lines of code, you have made a great-looking graph that gets your point across clearly and concisely.
