A step-by-step guide to QUICK and ELEGANT graphs using Python

Nerdy2mato
6 min read · Apr 8, 2019


Summary: Matplotlib and Seaborn are two of the most popular libraries for visualizing datasets in Python. Matplotlib gives you low-level control over every component of your graph, while seaborn lets you create sophisticated graphs with a few lines of code. This article shows you how to make elegant graphs quickly using both libraries.

Whether you are doing data analysis or prototyping machine learning models, knowing how to visualize a dataset is a very useful skill. But knowing how is not enough: being able to visualize QUICKLY is worth even more. Why? Because it lets you test different assumptions and prototype while your ideas are still fresh.

Everything starts from an empty canvas

Imagine we are given a piece of paper to draw some graphs on. The paper is our canvas, which Matplotlib calls a Figure. A figure can contain one or more graphs, each of which is called an Axes. Each Axes can have its own title, x-axis, y-axis, legend, and so on.

The Matplotlib documentation has a great illustration that labels each of these components.
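The Figure/Axes vocabulary maps directly onto objects in code. The sketch below is a minimal illustration with made-up data; it assumes only Matplotlib and NumPy and selects the non-interactive Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import numpy as np

# One Figure (the canvas) holding one Axes (a single graph)
fig, ax = plt.subplots(figsize=(6, 4))

x = np.linspace(0, 10, 50)  # invented data
ax.plot(x, np.sin(x), label="sin(x)")  # each line is drawn on the Axes
ax.plot(x, np.cos(x), label="cos(x)")

# Components that belong to the Axes
ax.set_title("Two curves on one Axes")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# The Figure keeps a list of every Axes placed on it
print(len(fig.axes))
```

Every later step in this article is a variation on this pattern: get a Figure and one or more Axes, draw on an Axes, then adjust its components.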

Let’s walk through some examples

1. Choose your preferred style

Matplotlib comes with more than 20 built-in style sheets, available through the matplotlib.style module (also reachable as plt.style). Choosing one or more styles for your visualisation project is a convenient way to give all of your graphs a consistent look.

import matplotlib as mpl
import matplotlib.style
import matplotlib.pyplot as plt
print(plt.style.available)  # see what styles are available
mpl.style.use('fivethirtyeight')  # choose a preferred style

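These are the available styles (the exact list depends on your Matplotlib version; since Matplotlib 3.6 the seaborn-* styles are named seaborn-v0_8-*):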

['seaborn-dark', 'seaborn-darkgrid', 'seaborn-ticks', 'fivethirtyeight', 'seaborn-whitegrid', 'classic', '_classic_test', 'fast', 'seaborn-talk', 'seaborn-dark-palette', 'seaborn-bright', 'seaborn-pastel', 'grayscale', 'seaborn-notebook', 'ggplot', 'seaborn-colorblind', 'seaborn-muted', 'seaborn', 'Solarize_Light2', 'seaborn-paper', 'bmh', 'tableau-colorblind10', 'seaborn-white', 'dark_background', 'seaborn-poster', 'seaborn-deep']
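mpl.style.use changes the style for the rest of the session. If you only want a style for one figure, plt.style.context applies it temporarily inside a with block. A minimal sketch; the style name comes from the list above and the code falls back to "default" in case it is missing from your version:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Pick a style that exists in the installed version, otherwise use the default
style = "ggplot" if "ggplot" in plt.style.available else "default"

default_face = plt.rcParams["axes.facecolor"]

with plt.style.context(style):
    # Inside the block, rcParams reflect the chosen style
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])
    styled_face = plt.rcParams["axes.facecolor"]

# Leaving the block restores the previous settings
restored_face = plt.rcParams["axes.facecolor"]
print(style, styled_face, restored_face)
```

Figures created inside the block keep the style they were drawn with, so this is a handy way to try several styles side by side without affecting the rest of a notebook.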

Fine-tune the style: every component can be customised, including figure size, font size, line width and marker size

Matplotlib reads the default settings for each component of a plot from a configuration file (matplotlibrc) and exposes them at runtime through mpl.rcParams. After you select a style, you can override the default for any particular component by assigning it a new value.

# a few examples
import matplotlib as mpl
mpl.rcParams['font.size'] = 10  # base font size; titles and tick labels are sized relative to it by default
mpl.rcParams['legend.fontsize'] = 10  # customise legend font size
mpl.rcParams['figure.titlesize'] = 15  # customise the size of the suptitle
mpl.rcParams['lines.markersize'] = 10  # default marker size
mpl.rcParams['legend.markerscale'] = 0.5  # legend markers drawn at half the plotted marker size
mpl.rcParams['lines.markeredgewidth'] = 4  # default marker edge width
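Assigning to mpl.rcParams changes the defaults globally. When an override should apply to one figure only, mpl.rc_context accepts a dictionary of the same keys and restores the old values when the with block ends. A minimal sketch with arbitrary values:

```python
import matplotlib as mpl
mpl.use("Agg")
import matplotlib.pyplot as plt

before = mpl.rcParams["font.size"]

# Temporarily override a few rcParams; they revert when the block exits
with mpl.rc_context({"font.size": 14, "lines.linewidth": 3, "lines.markersize": 10}):
    fig, ax = plt.subplots()
    line, = ax.plot([0, 1, 2], [0, 1, 4], marker="o")  # picks up the temporary defaults
    inside = mpl.rcParams["font.size"]

after = mpl.rcParams["font.size"]
print(before, inside, after)
```

To undo global changes made with direct assignment, mpl.rcdefaults() restores Matplotlib's built-in defaults.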

2. Decide how big your canvas (figure) is and how many plots you want to place on it

You can draw multiple axes in one figure. The example below places 4 axes on a single canvas, with 2 plots on the first row and the other 2 on the second row. The entire figure (canvas) is 25 x 15 inches, since figsize is given as [width, height] in inches.

import matplotlib.pyplot as plt
# create a 25 x 15 inch figure containing a 2 x 2 grid of empty axes
rows = 2
columns = 2
fig, axes = plt.subplots(rows, columns, figsize=[25, 15])
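With more than one row and column, plt.subplots returns the axes as a 2-D NumPy array. A short sketch of the two usual ways to reach an individual plot, indexing by [row, column] or walking the grid row by row with axes.flat (the titles are placeholders):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

rows, columns = 2, 2
fig, axes = plt.subplots(rows, columns, figsize=[25, 15])

print(axes.shape)  # (2, 2): a NumPy array of Axes objects

# axes.flat visits the plots row by row: top-left, top-right, bottom-left, bottom-right
for i, ax in enumerate(axes.flat):
    ax.set_title(f"Plot {i}")

print(axes[0, 1].get_title())  # top-right plot: "Plot 1"
```

With rows=1 and columns=1, plt.subplots returns a single Axes instead of an array; passing squeeze=False always gives a 2-D array.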

3. Let’s make some plots using seaborn

I prefer using seaborn as much as I can because it allows me to draw sophisticated plots with a few lines of code.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # for plotting
import matplotlib.style
import matplotlib as mpl  # to customise all the components in the graph, including font, size, markers, lines, labels etc
import matplotlib.gridspec as gridspec  # optional: finer control over subplot layout (not used below)
import seaborn as sns
# Jupyter only: display plots inside the notebook (remove this line in a plain .py script)
%matplotlib inline
# load sample data bundled with seaborn (fetched online on first use; you can use your own dataset)
tips = sns.load_dataset("tips")
print(tips.head(10))
print(tips.dtypes)
# draw a scatter plot using seaborn with one line of code
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=tips)
plt.show()

If you run this code yourself, you get a basic scatter plot: it works, but it is not the best visualisation.
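What makes the next step possible is that sns.scatterplot is an axes-level function: it returns the Matplotlib Axes it drew on. A sketch with a small invented DataFrame standing in for tips (to avoid the online download done by sns.load_dataset); it assumes seaborn 0.9 or later, where scatterplot was introduced.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips dataset: same column names, invented values
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 23.7, 24.6, 25.3, 8.8, 26.9, 15.0],
    "tip": [1.7, 3.5, 3.3, 3.6, 4.7, 2.0, 3.1, 2.0],
    "sex": ["Male", "Male", "Female", "Female", "Male", "Female", "Male", "Female"],
})

ax = sns.scatterplot(x="total_bill", y="tip", hue="sex", data=df)

# The return value is an ordinary Matplotlib Axes, so all its methods are available
print(type(ax).__name__)
ax.set_xlabel("Total bill ($)")
```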

4. Using Matplotlib to fine tune each component of your graph

After we make our first plot, we can fine-tune each part of the graph, including the figure size, the legend location, the x-axis and y-axis labels, and the ticks and tick labels, using Matplotlib's Axes object.

# create the figure with a single empty axes, then give the graph a title
fig, ax = plt.subplots(figsize=[8, 4], frameon=False)
ax.set_title('Total bill v Tips by gender', loc='center')  # title centred above the axes
# same code as before to create a basic plot, but drawn on our axes
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=tips, ax=ax)
# customise each component manually
ax.set_xlabel('Total bill ($)')  # give a label name to the x axis
ax.set_ylabel('Tip ($)')  # give a label name to the y axis
ax.set_xlim(0, 60)  # adjust x axis range for numeric input
ax.set_ylim(0, 12)  # adjust y axis range for numeric input
ax.set_xticks(np.arange(0, 60 + 1, 5))  # adjust the x tick frequency
ax.set_yticks(np.arange(0, 12 + 1, 1))  # adjust the y tick frequency
ax.legend(bbox_to_anchor=(0.99, 0.6))  # customise the legend location
plt.tight_layout()  # avoid overlapping tick labels, axis labels and titles
plt.show()

The resulting graph is much cleaner and easier to read.
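Hard-coding tick positions with np.arange works, but the spacing then has to be kept in sync with the axis limits by hand. Matplotlib's ticker module offers locators and formatters that express the same intent; a minimal alternative sketch with invented points:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, StrMethodFormatter

fig, ax = plt.subplots(figsize=[8, 4])
ax.scatter([12, 25, 40], [2, 4, 7])  # invented points
ax.set_xlim(0, 60)
ax.set_ylim(0, 12)

# A tick every 5 on x and every 1 on y, wherever the limits end up
ax.xaxis.set_major_locator(MultipleLocator(5))
ax.yaxis.set_major_locator(MultipleLocator(1))

# Prefix the x tick labels with a dollar sign
ax.xaxis.set_major_formatter(StrMethodFormatter("${x:.0f}"))

print(ax.xaxis.get_major_formatter()(15))  # $15
```

If you later change set_xlim, the ticks follow automatically instead of needing a new np.arange call.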

5. The same grammar can be applied to make all kinds of graphs

The scatter plot is just one example: seaborn comes with many kinds of graphs for different visualisation needs and data types.

Here is the code for making a 2 x 2 grid of graphs: a scatter plot, a histogram and two box plots.

Step 1: pick your style and reset some parameters

# pick a style for your plots and choose your default font sizes
mpl.style.use('fivethirtyeight')
mpl.rcParams['font.size'] = 20  # base font size for titles and tick labels
mpl.rcParams['legend.fontsize'] = 20  # customise legend font size

Step 2: decide on figure and axes (same as the previous example)

# create the figure, define its size and create the empty axes
rows = 2
columns = 2
fig, axes = plt.subplots(rows, columns, figsize=[25, 15], frameon=False)  # set up a figure with 4 empty axes (2 rows and 2 columns)

Step 3: since we are plotting multiple plots on the same canvas, we need to make sure plots are not overlapping

# adjust subplot positions to avoid overlap within and between subplots
plt.tight_layout()  # avoid overlapping tick labels, axis labels and titles (may not account for the suptitle)
plt.subplots_adjust(left=None, bottom=None, right=None, top=0.8, wspace=0.3, hspace=0.3)  # top=0.8 leaves space above the plots; wspace/hspace set the gaps between them
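Depending on the Matplotlib version, tight_layout may not leave room for a suptitle, which is why the top margin is lowered by hand above. One alternative, sketched here with an invented title, is tight_layout's rect argument, which reserves a strip of the figure before the layout is computed:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=[25, 15])
for i, ax in enumerate(axes.flat):
    ax.set_title(f"Plot {i}")  # placeholder titles

fig.suptitle("Tips dataset overview", fontsize=30)  # hypothetical suptitle

# rect=(left, bottom, right, top) in figure fractions: fit the subplots and
# their labels into the lower 90% of the figure, leaving the top for the suptitle
fig.tight_layout(rect=(0, 0, 1, 0.9))

top_edge = max(ax.get_position().y1 for ax in axes.flat)
print(round(top_edge, 2))
```

tight_layout measures the labels that exist at the moment it is called, so it usually works best after all the plotting code has run.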

Step 4: plot your first plot and fine-tune some of its components

# give each ax a number (0 to 3), going row by row
d = {}
i = 0
for r in range(rows):
    for c in range(columns):
        d[i] = axes[r][c]
        i += 1
# plot a scatter plot
g0 = sns.scatterplot(x='total_bill', y='tip', hue='sex', data=tips, ax=d[0])
g0.set_title('Total bill v Tips by gender')
g0.set_xlabel('Total bill ($)')  # give a label name to the x axis
g0.set_ylabel('Tip ($)')  # give a label name to the y axis
g0.set_xlim(0, 60)  # adjust x axis range for numeric input
g0.set_ylim(0, 12)  # adjust y axis range for numeric input
g0.set_xticks(np.arange(0, 60 + 1, 5))  # adjust the x tick frequency
g0.set_yticks(np.arange(0, 12 + 1, 1))  # adjust the y tick frequency
#g0.legend(bbox_to_anchor=(0.99, 0.6))  # customise the legend location
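Steps 4 and 5 repeat the same setter calls for every plot. Matplotlib's Axes.set method accepts those properties as keyword arguments in a single call; a minimal sketch using the Step 4 values on invented points:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.scatter([12, 25, 40], [2, 4, 7])  # invented points

# Axes.set accepts settable properties as keyword arguments,
# so the separate setter calls from Step 4 become one call
ax.set(
    title="Total bill v Tips by gender",
    xlabel="Total bill ($)",
    ylabel="Tip ($)",
    xlim=(0, 60),
    ylim=(0, 12),
    xticks=np.arange(0, 60 + 1, 5),
    yticks=np.arange(0, 12 + 1, 1),
)
print(ax.get_xlim(), ax.get_title())
```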

Step 5: finish the remaining three plots using the same structure and grammar as Step 4

# plot a frequency plot (histogram with a density curve)
# note: distplot was deprecated in seaborn 0.11; histplot and displot replace it
g1 = sns.distplot(tips['total_bill'], ax=d[1], label='total bill')
g1.set_title('Total bill histogram')
g1.set_xlabel('Total bill ($)')  # give a label name to the x axis
g1.set_ylabel('Density')  # distplot shows a density, not a percentage
g1.set_xlim(0, 70)  # adjust x axis range for numeric input
g1.set_ylim(0, 0.1)  # adjust y axis range for numeric input
g1.legend()
#g1.legend(bbox_to_anchor=(0.99, 0.6))
# plot a box plot
g2 = sns.boxplot(x="smoker", y="total_bill", hue="day", data=tips, ax=d[2])
g2.set_title('Total bill by smoker by day')
g2.set_xlabel('Smoker?')  # give a label name to the x axis
g2.set_ylabel('Total bill ($)')  # give a label name to the y axis
g2.legend()
#g2.legend(bbox_to_anchor=(0.1, 0.6))
# plot another box plot
g3 = sns.boxplot(x="day", y="total_bill", hue="time", data=tips, ax=d[3])
g3.set_title('Total bill by day by time')
g3.set_xlabel('Day of the week')  # give a label name to the x axis
g3.set_ylabel('Total bill ($)')  # give a label name to the y axis
g3.legend()
#g3.legend(bbox_to_anchor=(0.1, 0.6))
plt.show()
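Step 5 repeats one pattern: call a seaborn function with ax=, then set the title and axis labels. As the number of plots grows, a loop over a list of plot specifications keeps that pattern in one place. A sketch covering the two box plots, assuming a hypothetical specs list and a small invented DataFrame in place of tips:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for tips (invented values, same column names)
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 23.7, 24.6, 25.3, 8.8, 26.9, 15.0],
    "smoker": ["No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes"],
    "day": ["Sun", "Sat", "Sun", "Sat", "Sun", "Sat", "Sun", "Sat"],
    "time": ["Dinner", "Lunch", "Dinner", "Dinner", "Lunch", "Lunch", "Lunch", "Dinner"],
})

# Hypothetical specs: one entry per plot as (seaborn function, plot kwargs, Axes properties)
specs = [
    (sns.boxplot, dict(x="smoker", y="total_bill", hue="day"),
     dict(title="Total bill by smoker by day", xlabel="Smoker?", ylabel="Total bill ($)")),
    (sns.boxplot, dict(x="day", y="total_bill", hue="time"),
     dict(title="Total bill by day by time", xlabel="Day of the week", ylabel="Total bill ($)")),
]

fig, axes = plt.subplots(1, len(specs), figsize=[12, 5])
for ax, (plot_func, plot_kwargs, labels) in zip(axes.flat, specs):
    plot_func(data=df, ax=ax, **plot_kwargs)  # same plotting grammar for every plot
    ax.set(**labels)                          # same fine-tuning for every plot

print([a.get_title() for a in axes.flat])
```

Adding a plot then means adding one entry to specs rather than another block of setter calls.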

The full script can be downloaded from this GitHub repo.
