Quick Guide to Labelling Data Points for Common Seaborn Plots

Make plots more readable and easily understandable

Kaili
Aug 9, 2020
Photo by KOBU Agency on Unsplash

In the course of my data exploration adventures, I often find myself looking at plots like the one below. They are great for observing trends, but they make it difficult to tell where each data point is and what value it has.

Line plot showing the total number of passengers yearly. How many passengers were there in 1956?

The purpose of this piece is to provide a quick guide to labelling the seaborn plots commonly used in data exploration. All the code used can be found here.

Set-Up

Seaborn’s flights dataset is used for demonstration.

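import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# render plots inside the notebook (Jupyter/IPython only; remove this line when running as a script)
%matplotlib inline
# load dataset
flights = sns.load_dataset('flights')
flights.head()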
First 5 rows of the data in flights

To make some of the plots easier to create, a couple of additional dataframes can be set up first.

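# set up flights by year dataframe
# select the passengers column before summing, so the categorical month column is not summed
# (newer pandas versions may raise a TypeError otherwise)
year_flights = flights.groupby('year')['passengers'].sum().reset_index()
year_flights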
Total number of passengers for each year
# set up average number of passengers by month dataframe
# observed=True keeps only the month categories present in the data
month_flights = flights.groupby('month', observed=True).agg({'passengers': 'mean'}).reset_index()
month_flights
Average number of passengers for each month
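To show why the passengers column is selected before aggregating, here is a minimal sketch of the two aggregations. It runs on a small hypothetical frame shaped like the flights data, with one row per (year, month) and a categorical month column. The numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data shaped like seaborn's flights dataset (values are made up)
months = pd.Categorical(["Jan", "Feb", "Jan", "Feb"],
                        categories=["Jan", "Feb"], ordered=True)
toy = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": months,
    "passengers": [100, 120, 110, 130],
})

# Total passengers per year: only the numeric column is aggregated
year_totals = toy.groupby("year", as_index=False)["passengers"].sum()

# Average passengers per month; observed=True keeps only categories present in the data
month_means = toy.groupby("month", observed=True, as_index=False)["passengers"].mean()

print(year_totals)
print(month_means)
```

Selecting `["passengers"]` up front means the result never depends on how a given pandas version treats non-numeric columns in `sum()`.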

Line Plot

Plotting a graph of passengers per year:

# plot line graph
sns.set(rc={'figure.figsize': (10, 5)})
ax = sns.lineplot(x='year', y='passengers', data=year_flights, marker='*', color='#965786')
ax.set(title='Total Number of Passengers Yearly')
# label points on the plot
for x, y in zip(year_flights['year'], year_flights['passengers']):
    # the position of the data label relative to the data point can be adjusted by adding/subtracting a value from the x and/or y coordinates
    plt.text(x=x,  # x-coordinate position of data label
             y=y-150,  # y-coordinate position of data label, adjusted to be 150 below the data point
             s='{:.0f}'.format(y),  # data label, formatted to ignore decimals
             color='purple')  # set colour of data label
Line plot showing the total number of passengers yearly, with data labels.

At times, it is preferable to make the data labels more visible, which can be done by giving them a background colour:

# inside the labelling loop, chain set_backgroundcolor('colour') onto the plt.text(…) call
plt.text(x, y-150, '{:.0f}'.format(y), color='white').set_backgroundcolor('#965786')
Line plot showing the total number of passengers yearly, with data labels on a coloured background.

Histogram

Plotting a histogram of the monthly passenger counts (each row in flights is one month's total):

# plot histogram
ax = sns.histplot(flights['passengers'], color='#9d94ba', bins=10, kde=False)
ax.set(title='Distribution of Passengers')
# label each bar in histogram
for p in ax.patches:
    height = p.get_height()  # get the height of each bar
    # adding text to each bar
    ax.text(x=p.get_x() + (p.get_width() / 2),  # x-coordinate position of data label, padded to be in the middle of the bar
            y=height + 0.2,  # y-coordinate position of data label, padded 0.2 above bar
            s='{:.0f}'.format(height),  # data label, formatted to ignore decimals
            ha='center')  # sets horizontal alignment (ha) to center
Histogram showing the distribution of monthly passenger counts.

Another piece of information that might be worth showing on the graph is the mean of the data, drawn as a vertical line:

# plot histogram
# …
# adding a vertical line for the mean monthly passenger count
plt.axvline(flights['passengers'].mean(), color='purple', label='mean')
# adding data label to mean line
plt.text(x=flights['passengers'].mean() + 3,  # x-coordinate position of data label, 3 to the right of the mean line
         y=max([h.get_height() for h in ax.patches]),  # y-coordinate position of data label, at the height of the tallest bar
         s='mean: {:.0f}'.format(flights['passengers'].mean()),  # data label
         color='purple')  # colour of the data label
# label each bar in histogram
# …
Histogram showing the distribution of monthly passenger counts, with a vertical line indicating the mean.

Bar Plot

Vertical Bar Plot

Plotting the total number of passengers for each year:

# plot vertical barplot
sns.set(rc={'figure.figsize': (10, 5)})
ax = sns.barplot(x='year', y='passengers', data=year_flights)
ax.set(title='Total Number of Passengers Yearly')  # title barplot
# label each bar in barplot
for p in ax.patches:
    # get the height of each bar
    height = p.get_height()
    # adding text to each bar
    ax.text(x=p.get_x() + (p.get_width() / 2),  # x-coordinate position of data label, padded to be in the middle of the bar
            y=height + 100,  # y-coordinate position of data label, padded 100 above bar
            s='{:.0f}'.format(height),  # data label, formatted to ignore decimals
            ha='center')  # sets horizontal alignment (ha) to center
Bar plot with vertical bars showing the total number of passengers yearly
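On matplotlib 3.4 or later, `Axes.bar_label` can add the same labels. It centres each label over its bar and offsets it by `padding` points. This is a sketch with hypothetical yearly totals. It assumes `sns.barplot` is called without `hue`, so that all bars sit in a single container.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical yearly totals (illustrative values only), one row per bar
toy = pd.DataFrame({"year": [1949, 1950, 1951], "passengers": [1500, 1700, 2000]})

fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot(x="year", y="passengers", data=toy, ax=ax)
ax.set(title="Total Number of Passengers Yearly")

# label each bar with its height, 3 points above the top of the bar
labels = ax.bar_label(ax.containers[0], fmt="%.0f", padding=3)
```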

Horizontal Bar Plot

Plotting the average number of passengers for each month:

# plot horizontal barplot
sns.set(rc={'figure.figsize': (10, 5)})
ax = sns.barplot(x='passengers', y='month', data=month_flights, orient='h')
ax.set(title='Average Number of Flight Passengers Monthly')  # title barplot
# label each bar in barplot
for p in ax.patches:
    height = p.get_height()  # height of each horizontal bar is the same
    width = p.get_width()  # width (average number of passengers)
    # adding text to each bar
    ax.text(x=width + 3,  # x-coordinate position of data label, padded 3 to right of bar
            y=p.get_y() + (height / 2),  # y-coordinate position of data label, padded to be in the middle of the bar
            s='{:.0f}'.format(width),  # data label, formatted to ignore decimals
            va='center')  # sets vertical alignment (va) to center
Bar plot with horizontal bars showing the average number of passengers for each month
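`bar_label` (matplotlib 3.4 or later) also works for horizontal bars. As a variation, the sketch below places the labels inside the bars with `label_type="center"`. That can help when there is little room to the right of the longest bar. The monthly averages are hypothetical values, and the white label colour is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical monthly averages (illustrative values only)
toy = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "passengers": [240.4, 235.6, 270.0]})

fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot(x="passengers", y="month", data=toy, orient="h", ax=ax)
ax.set(title="Average Number of Flight Passengers Monthly")

# label_type="center" puts each label in the middle of its bar instead of at its end
labels = ax.bar_label(ax.containers[0], fmt="%.0f", label_type="center", color="white")
```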

Notes on Usage

Data labels can be beneficial on some plots (especially bar plots), but it is worth experimenting with different configurations, such as labelling only certain meaningful points instead of every point, and not overdoing the labelling, especially when there are many points. A clean, informative graph is usually preferable to a cluttered one.

# only labelling some points on the graph
# plot line graph
sns.set(rc={'figure.figsize': (10, 5)})
ax = sns.lineplot(x='year', y='passengers', data=year_flights, marker='*', color='#965786')
# title the plot
ax.set(title='Total Number of Passengers Yearly')
mean = year_flights['passengers'].mean()
# label points on the plot only if they are higher than the mean
for x, y in zip(year_flights['year'], year_flights['passengers']):
    if y > mean:
        plt.text(x=x,  # x-coordinate position of data label
                 y=y-150,  # y-coordinate position of data label, adjusted to be 150 below the data point
                 s='{:.0f}'.format(y),  # data label, formatted to ignore decimals
                 color='purple')  # set colour of data label
Line plot showing the total number of passengers yearly, with only the above-average years labelled.

Revision History

28 Aug 2022: revised histogram code to use histplot instead of the deprecated distplot
