A Guide To Create Dumbbell Charts In Python

Shivank Batra
5 min read · Oct 3, 2021

In this tutorial, I will walk you through creating dumbbell charts in Python using the data analysis and plotting libraries pandas, NumPy and Matplotlib. Some basic knowledge of Python and these three libraries is required. This guide gives an overview of the methodology I followed, so I won’t be explaining each and every piece of code. If you get stuck or are unable to understand a piece of code, you can refer to one of the many awesome websites such as Stack Overflow or GeeksforGeeks. If your problem is still not solved, you can leave a comment below and I will try to help.

The notebook containing the whole code and the data file is available here: https://github.com/Shivank12batra/dumbbell-chart-plot and also on this open source community website: https://sharmaabhishekk.github.io/mpl-footy/contribution-guide.html (do check out the other tutorials on this website as well if you are just getting started with football analytics or Python/data analytics in general). Now, let’s start with our tutorial!

But What Exactly Are Dumbbell Charts?

A dumbbell plot (also called a barbell plot) is ideal for illustrating change and comparing the distance between two groups of data points. Each pair of points is joined by a line, so it looks like a dumbbell, hence the name.

In the example used in this tutorial, I have visualized the top two goalscorers of each La Liga team for the 2020/21 season using a dumbbell chart. The end product will look something like this:

Dumbbell Chart For LaLiga 20/21 Top Goalscorers

Now let’s jump right into the code!

Importing The Necessary Libraries

Before we begin, we import all the libraries required in the steps ahead. pandas and NumPy will be used for data pre-processing, and Matplotlib will be used for plotting our dumbbell chart.
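A minimal sketch of the imports, using the conventional aliases for the three libraries named above (the aliases are the usual community convention, not necessarily the exact lines from the notebook):

```python
import matplotlib.pyplot as plt  # plotting
import numpy as np               # numeric helpers (e.g. tick positions)
import pandas as pd              # loading and reshaping the shots data
```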

Data Pre-processing

The first step is loading the data. The dataset I have used here is the shots data scraped from https://understat.com/. The CSV file is available in the GitHub repo linked at the beginning of the article.

By the way, if you want to make the chart for a different league and don’t know how to scrape data, I have created an easy-to-use library called understatscraper, which scrapes data for either a single or a whole season (for a specific league/team/player). You can check its documentation on how to use its API calls.

Next, we alter the ‘player’ column so that it stores only each player’s second name (or the only name, for players whose name is a single word). We do this so that our chart does not get cluttered with player names overlapping each other.
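Loading the file is a single pd.read_csv() call. The sketch below reads a tiny made-up stand-in from memory so that it runs on its own; with the real file you would pass the CSV path from the repo instead. The column names "player", "team" and "result", and the values in "result", are assumptions for illustration and may differ from the actual file.

```python
import io

import pandas as pd

# Made-up stand-in rows (not real shots data). With the real file, pass the
# path of the CSV from the GitHub repo to pd.read_csv() instead.
sample_csv = io.StringIO(
    "player,team,result\n"
    "Alpha One,Team A,Goal\n"
    "Alpha One,Team A,MissedShots\n"
    "Beta Two,Team A,Goal\n"
    "Gamma Three,Team B,SavedShot\n"
)

df = pd.read_csv(sample_csv)
print(df.head())
```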

We will apply a lambda function to the whole column. If the player name has more than one word, the function splits the name into a list of words and keeps only the second one as the player name. If the name has a single word, no changes are made:
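The rule described above can be sketched like this. The names are placeholders, and the "player" column name is taken from the text:

```python
import pandas as pd

df = pd.DataFrame({"player": ["Alpha One", "Solo", "Beta Two Three"]})

# Keep the second word when the name has more than one word,
# otherwise leave the single-word name untouched.
df["player"] = df["player"].apply(
    lambda name: name.split()[1] if len(name.split()) > 1 else name
)

print(df["player"].tolist())  # ['One', 'Solo', 'Two']
```

Note that a three-word name keeps its second word, not its last one, because the rule always picks index 1 of the split name.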

The code output:

Now, we are going to store all the teams in a list. We create the list so that we can loop over it and plot the dumbbell for each team one by one.
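One way to build such a list is to take the unique values of the team column. The column name "team" is an assumption here, and the rows are placeholders:

```python
import pandas as pd

df = pd.DataFrame({"team": ["Team A", "Team B", "Team A", "Team C"]})

# unique() keeps the order of first appearance; tolist() gives a plain list.
teams = df["team"].unique().tolist()

print(teams)  # ['Team A', 'Team B', 'Team C']
```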

Code Output:

Initial Plot Settings

Now, we will be setting up our initial plot using the awesome Matplotlib library.
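A minimal sketch of such a setup: one figure, one axes, and a shared background colour. The figure size and colour are hypothetical values, not the exact settings from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical styling values chosen for illustration.
background = "#f5f5f5"

fig, ax = plt.subplots(figsize=(14, 8))
fig.set_facecolor(background)  # area around the plot
ax.set_facecolor(background)   # plot area itself
```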

The above is our initial plot setup. Now, we will remove the spines and add tick labels to both axes, plus a label for the y-axis.
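This step could look roughly like the sketch below. It assumes teams go on the x-axis and goals on the y-axis; the team names, tick range and label text are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

teams = ["Team A", "Team B", "Team C"]  # placeholder team names

fig, ax = plt.subplots(figsize=(14, 8))

# Hide all four spines (the border lines around the plot area).
for side in ("top", "right", "bottom", "left"):
    ax.spines[side].set_visible(False)

# One x tick per team, labelled with the team name.
ax.set_xticks(np.arange(len(teams)))
ax.set_xticklabels(teams, rotation=90)

# y ticks every 5 goals, plus an axis label (the range is an assumption).
ax.set_yticks(np.arange(0, 35, 5))
ax.set_ylabel("Goals scored")
```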

Great work! We are slowly getting there. Now, our next steps will be to loop over the teams list that we created earlier and plot the dumbbell lines.

A Bit Of Data Manipulation

We won’t start with the plotting part just yet, since we first need to manipulate the dataframe into the desired form for each team. This is how we are going to do it:

Let’s walk through the code step by step:

First, we loop over our teams list and filter the dataframe for the specific team.

After the team filtering is done, we filter further for goals in the “result” column so that we keep only the rows where the shot resulted in a goal. Then we group by the “player” column and use the agg and sort_values functions to list all the goalscorers of that team with their number of goals, in descending order. And then we are done!
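The steps above can be sketched as follows. The data is a made-up stand-in, the "team" column name and the "Goal" value are assumptions, and goalscorers() is a hypothetical helper introduced only to keep the example short.

```python
import pandas as pd

# Made-up stand-in for the shots data (not real figures).
df = pd.DataFrame({
    "team": ["Team A"] * 7 + ["Team B"] * 2,
    "player": ["One", "One", "One", "Two", "Two", "Three", "Two", "Four", "Four"],
    "result": ["Goal", "Goal", "Goal", "Goal", "Goal", "Goal",
               "MissedShots", "Goal", "SavedShot"],
})


def goalscorers(df, team):
    """Return one team's goalscorers with goal counts, highest first."""
    team_df = df[df["team"] == team]                  # rows for this team
    goals_df = team_df[team_df["result"] == "Goal"]   # keep goals only
    return (
        goals_df.groupby("player")
        .agg(goals=("result", "count"))               # goals per player
        .sort_values("goals", ascending=False)
        .reset_index()
    )


teams = df["team"].unique().tolist()
for team in teams:
    top_two = goalscorers(df, team).head(2)  # the two points of the dumbbell
    print(team)
    print(top_two)
```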

Here is what our dataframe for a specific team will look like:

Final Part: Main Plotting

Now that the necessary data manipulation is done and we have the dataframe in the desired form, we can finally start our main plotting.

We use the scatter() method to plot the two scatter points which represent the top two goalscorers for each team. To connect them, we use the plot() method to draw a line between the two points. The text() method is used to write each player’s name near their scatter point and the number of goals scored inside it.

Note that the whole plotting code sits inside the loop, alongside the previous data manipulation code.
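Here is a minimal sketch of the dumbbell itself, using the same three calls: plot() for the connecting line, scatter() for the two points, and text() for the labels. The teams, names and goal counts are placeholders, and the sizes, colours and offsets are hypothetical styling choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder data: team -> [(top scorer, goals), (second scorer, goals)].
top_two = {
    "Team A": [("One", 23), ("Two", 11)],
    "Team B": [("Four", 15), ("Five", 9)],
}

fig, ax = plt.subplots(figsize=(8, 5))

for x, (team, scorers) in enumerate(top_two.items()):
    (_, goals_hi), (_, goals_lo) = scorers

    # Vertical line joining the two points (the dumbbell's bar).
    ax.plot([x, x], [goals_lo, goals_hi], color="grey", zorder=1)

    # The two scatter points (the dumbbell's weights).
    ax.scatter([x, x], [goals_hi, goals_lo], s=500,
               c=["tab:red", "tab:blue"], zorder=2)

    for name, goals in scorers:
        # Goal count inside the point, player name just to its right.
        ax.text(x, goals, str(goals), ha="center", va="center",
                color="white", zorder=3)
        ax.text(x + 0.15, goals, name, ha="left", va="center")

ax.set_xticks(range(len(top_two)))
ax.set_xticklabels(list(top_two))
```

Drawing the line with a lower zorder than the points keeps it from showing through the markers.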

Now, we will make some more additions and our final plot will be ready!

The code is pretty much self-explanatory. We set the y limits of the plot using the plt.ylim() function call, set up a grid with low transparency, and add the title text along with some more text at the bottom of the chart. Finally, we use the savefig() function call to save our dumbbell plot.

And we have our final product!
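As a rough sketch, these finishing touches might look like the block below. The limits, title wording, footer text and file name are all hypothetical; only the calls themselves come from the description above.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14, 8))

plt.ylim(0, 35)                   # y limits (range is an assumption)
ax.grid(axis="y", alpha=0.2)      # faint horizontal grid lines

# Title and a small credit line at the bottom (wording is hypothetical).
ax.set_title("LaLiga 20/21: top two goalscorers per team", fontsize=18)
fig.text(0.5, 0.01, "Data: understat.com", ha="center", fontsize=10)

# Save the figure; the file name is a placeholder.
out_path = Path("dumbbell_chart.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
```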

Hope you enjoyed this tutorial and were able to learn something new. Have a great day!
