How to Visualize the Formula 1 Championship in Python Using the Ergast API and Seaborn
For those wanting to get into Formula 1 data analysis, the Ergast API is a very good starting point. It provides clean, easily accessible data that is relatively easy to process. This tutorial will show you how to use data from the Ergast API to visualize how the 2021 championship standings changed from round to round, which will result in the image above.
Also, this tutorial will show you how to create plots with Seaborn, a Python library that is based on Matplotlib. However, please note that the main focus of this tutorial is on how to process data from the Ergast API into a basic visualization, and not on how to create the most beautiful plots.
If you’re ready to go, let’s dive right in! If you, however, have no idea how to begin with analyzing data in Python, check out my previous tutorial to get ready first:
Setting everything up
First of all, as always, we want to import all the libraries we need. Don’t forget to pip install seaborn and fastf1, both of which we will need for this tutorial.
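Before importing anything, it can help to check that the packages are actually installed. The snippet below is a small sketch using only the standard library. The package list is an assumption based on the libraries this tutorial mentions (pandas is assumed because we work with DataFrames), and missing_packages() is a hypothetical helper name.

```python
import importlib.util

# Assumption: the third-party packages this tutorial relies on.
REQUIRED_PACKAGES = ["pandas", "seaborn", "matplotlib", "fastf1"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Install these first: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```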
Retrieving data from the Ergast API
If you’re completely new to doing these types of things in Python, I have some good news for you: retrieving data from Ergast is really easy. And by really easy, I do mean really, really easy. All you need is the following:
As you can see, we created a function called ergast_retrieve(). Ergast has many different endpoints (e.g. for championship standings, race results, qualifying results, et cetera), which I really recommend you check out here! To get the data from a specific endpoint, all you have to do is pass the endpoint and its parameters as an argument to this function. It will then send a request to the Ergast API and return a dictionary containing the results you want.
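A function like this could look roughly as follows. This is a minimal sketch and not necessarily the exact code used for this tutorial: it uses only the standard library, assumes the API is reachable at http://ergast.com/api/f1, and assumes the results sit under the top-level MRData key of the JSON response. The build_url() helper is a hypothetical name added for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: base URL of the public Ergast F1 API.
ERGAST_BASE_URL = "http://ergast.com/api/f1"


def build_url(endpoint):
    """Turn an endpoint such as '2021/5/driverStandings' into a full JSON URL."""
    return f"{ERGAST_BASE_URL}/{endpoint.strip('/')}.json"


def ergast_retrieve(endpoint, timeout=10.0):
    """Request an Ergast endpoint and return the results as a dictionary."""
    with urlopen(build_url(endpoint), timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: Ergast wraps every response in a top-level 'MRData' object.
    return payload["MRData"]
```

Keeping the URL construction in its own small function makes it easy to check which address will be requested before any network call is made.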
Collecting the data
So, we’re interested in championship standings per round. Ergast has an endpoint for that, which allows us to specify the season and the desired round. The endpoint looks like the following:
If we’re interested in, let’s say, the driver standings after 5 rounds in the 2021 season, we can make a request to the following endpoint (click on it to check it out!):
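To make the pattern concrete, here is a small sketch that builds the endpoint string for a given season and round. The season/round/driverStandings pattern is inferred from the current/1/driverStandings call used further on in this tutorial, and standings_endpoint() is a hypothetical helper name.

```python
def standings_endpoint(season, round_number):
    """Build the endpoint path for the driver standings after a given round."""
    return f"{season}/{round_number}/driverStandings"


print(standings_endpoint(2021, 5))       # 2021/5/driverStandings
print(standings_endpoint("current", 1))  # current/1/driverStandings
```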
We do, however, need data from all the rounds. So, we need to make the same request multiple times by looping through all the rounds. That will look as follows:
As you can see, we loop through all the rounds one by one. At the time of writing, round 18 of the 2021 World Championship has just been completed, so we’re interested in seeing the progress up until then.
I highly recommend familiarizing yourself with the data that’s being returned from Ergast. You can, for example, run ergast_retrieve('current/1/driverStandings') and see what the returned data looks like. The chunk of code above gets the driver standings from the response, which is a driver-by-driver list with each driver’s championship position and number of points. Since we’re interested in their championship position, we loop through all the drivers and store their standing at that specific round in the variable current_round, which we then append to our final DataFrame called all_championship_standings. Again, I highly recommend playing around with all the variables to get an idea of what is going on!
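The loop described above can be sketched as follows. Treat it as an illustration under a few assumptions: the standings are assumed to sit under StandingsTable, StandingsLists and DriverStandings in the response, drivers are identified by their three-letter code, and the rows are collected in a list before building the DataFrame (recent pandas versions no longer have DataFrame.append). To keep the sketch runnable offline, it uses fake_retrieve(), a made-up stand-in for ergast_retrieve() that returns invented standings.

```python
import pandas as pd


def collect_standings(retrieve, season, rounds):
    """Collect each driver's championship position after every round."""
    rows = []
    for round_number in range(1, rounds + 1):
        data = retrieve(f"{season}/{round_number}/driverStandings")
        # Assumption: this is where Ergast keeps the driver-by-driver list.
        standings = data["StandingsTable"]["StandingsLists"][0]["DriverStandings"]

        current_round = {"round": round_number}
        for entry in standings:
            current_round[entry["Driver"]["code"]] = int(entry["position"])
        rows.append(current_round)
    return pd.DataFrame(rows)


def fake_retrieve(endpoint):
    """Offline stand-in for ergast_retrieve() that returns made-up standings."""
    round_number = int(endpoint.split("/")[1])
    order = ["VER", "HAM"] if round_number % 2 else ["HAM", "VER"]
    driver_standings = [
        {"position": str(position), "Driver": {"code": code}}
        for position, code in enumerate(order, start=1)
    ]
    return {"StandingsTable": {"StandingsLists": [{"DriverStandings": driver_standings}]}}


all_championship_standings = collect_standings(fake_retrieve, 2021, 3)
print(all_championship_standings)
```

With real data, you would pass ergast_retrieve instead of fake_retrieve and set rounds to the number of completed rounds (18 at the time of writing).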
Preparing the data for plotting
If you’re familiar with data analysis, you’ll probably know that data almost always needs some transformation before it can be analyzed the way you want. In this case, too, we need to apply a small transformation to the data before it can be plotted the way we want.
Here, we’re melting the dataset based on the round. This will convert the dataset from wide to long format, where multiple columns (in this case the round and the driver) will work as identifiers. The result of the melting looks like this:
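As an illustration of what melting does, here is a small sketch on made-up data with two drivers and three rounds. The column name driver is an assumption; value is simply the default name pandas gives to the melted values.

```python
import pandas as pd

# Made-up wide data: one row per round, one column per driver.
all_championship_standings = pd.DataFrame(
    {"round": [1, 2, 3], "VER": [2, 1, 1], "HAM": [1, 2, 2]}
)

# Wide to long: 'round' stays as an identifier, every driver column
# becomes a (driver, value) pair.
melted = pd.melt(
    all_championship_standings,
    id_vars=["round"],
    var_name="driver",
    value_name="value",
)
print(melted)
```

Each row of the melted DataFrame now holds a single observation: one driver's championship position after one round, which is the shape Seaborn works with most easily.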
Plotting the data
Now that we have the data in the format we want, we can generate the final plot. I’ll put the entire code first, and explain what it does afterwards.
So, let me explain what’s going on!
- [Line 1–8]: Set up a few basics, like defining the size of the plot and setting the title.
- [Line 10–18]: This loop goes through all the drivers. For every driver, a Seaborn (sns) lineplot is created. The x-value is the round, and the y-value is the position in the championship. The standing is called value because, if you look at the melted dataset, the column containing the standing is just called value. The color of the line (line 17) is generated from the fastf1 helper function team_color, which can provide us with the color codes of all the teams.
- [Line 20]: We invert the y-axis since otherwise the driver leading the championship would be at the bottom.
- [Line 23–29]: We tweak the x- and y-axes to make sure that they display all the numbers of the rounds and championship positions, and we add labels to the axes.
- [Line 32]: We disable the gridlines since they add no value to the plot and actually only create chaos.
- [Line 35–62]: This is where some cool stuff happens. To make the plot a little bit nicer, I really wanted to have the driver’s name at the end of the line instead of having a legend with confusing colors. To achieve that, we start to loop through all the lines on line 36. We get the last coordinates of the line (which is the end of the line), and then place the text right after.
Let me, just like at the beginning of the article, show you the result of this code:
Pretty neat, right?! Even though the plot looks relatively simple, it still takes a few steps to get it exactly the way we like it.