How to Visualize the Formula 1 Championship in Python Using the Ergast API and Seaborn

The outcome of this tutorial will look like this.

For those wanting to get into Formula 1 data analysis, the Ergast API is a very good starting point. It provides clean, easily accessible data that is relatively easy to process. This tutorial will show you how to use data from the Ergast API to visualize how the 2021 championship standings changed from round to round, which will result in the image above.

Also, this tutorial will show you how to create plots with Seaborn, a Python library that is based on Matplotlib. However, please note that the main focus of this tutorial is on how to process data from the Ergast API into a basic visualization, and not on how to create the most beautiful plots.

If you’re ready to go, let’s dive right in! If, however, you have no idea how to begin analyzing data in Python, check out my previous tutorial to get ready first.

Setting everything up

First of all, as always, we want to import all the libraries we need. Don’t forget to pip install seaborn and fastf1, both of which we will need for this tutorial.

Retrieving data from the Ergast API

If you’re completely new to doing these types of things in Python, I have some good news for you: retrieving data from Ergast is really easy. And by really easy, I do mean really, really easy. All you need is the following:

As you can see, we created a function called ergast_retrieve(). Ergast has many different endpoints (e.g. for championship standings, race results, qualifying results, et cetera), which I really recommend checking out! To get the data from a specific endpoint, all you have to do is pass the endpoint, including its parameters, as the argument to this function. It will then send a request to the Ergast API and return a dictionary containing the results you want.
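A minimal sketch of what such a helper could look like. It assumes the standard-library urllib instead of a third-party HTTP client, relies on Ergast returning JSON when .json is appended to the endpoint, and assumes the useful content sits under the top-level MRData key. The build_ergast_url() helper and the base_url parameter are illustrative additions, not part of the original code.

```python
import json
from urllib.request import urlopen

# Base URL as used in this tutorial; kept configurable as an assumption.
ERGAST_BASE_URL = "http://ergast.com/api/f1"


def build_ergast_url(endpoint, base_url=ERGAST_BASE_URL):
    """Build the JSON URL for an endpoint such as '2021/5/driverStandings'."""
    return f"{base_url}/{endpoint.strip('/')}.json"


def ergast_retrieve(endpoint, base_url=ERGAST_BASE_URL):
    """Request one endpoint and return the parsed response as a dictionary."""
    with urlopen(build_ergast_url(endpoint, base_url), timeout=10) as response:
        payload = json.load(response)
    # Assumption: Ergast wraps every result in a top-level "MRData" object.
    return payload["MRData"]


# Example call (needs network access):
# standings = ergast_retrieve("2021/5/driverStandings")
```

Keeping the URL construction in its own small function makes it easy to check which address will be requested before any network call is made.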

Collecting the data

So, we’re interested in championship standings per round. Ergast has an endpoint for that, which allows us to specify the season and the desired round. The endpoint looks like the following:

http://ergast.com/api/f1/{season}/{round}/driverStandings

If we’re interested in, let’s say, the driver standings after 5 rounds in the 2021 season, we can make a request to the following endpoint (click on it to check it out!):

http://ergast.com/api/f1/2021/5/driverStandings

We do, however, need data from all the rounds. So, we need to make the same request multiple times by looping through all the rounds. That will look as follows:

As you can see, we loop through all the rounds one by one. At the time of writing, round 18 of the 2021 World Championship has just been completed, so we’re interested in seeing the progress up until then.

I highly recommend familiarizing yourself with the data that Ergast returns. You can, for example, run ergast_retrieve('current/1/driverStandings') and see what the returned data looks like. The chunk of code above extracts the driver standings from the response, which is a driver-by-driver list containing each driver’s championship position and number of points. Since we’re interested in the championship position, we loop through all the drivers and store each driver’s position for that round in the variable current_round, which we then append to our final DataFrame called all_championship_standings. Again, I highly recommend playing around with all the variables to get an idea of what is going on!
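The collection step can be sketched roughly as follows. This is an illustration under assumptions rather than the original listing: it assumes the response nests the standings under StandingsTable, StandingsLists and DriverStandings, it uses each driver's three-letter code as the key, and the helper names standings_row() and collect_standings() are hypothetical. The sample response is a heavily trimmed stand-in with placeholder driver codes so the example runs without network access.

```python
def standings_row(round_number, mrdata):
    """Turn one driverStandings response into a row like {'round': 1, 'AAA': 1}."""
    standings = mrdata["StandingsTable"]["StandingsLists"][0]["DriverStandings"]
    current_round = {"round": round_number}
    for entry in standings:
        # Positions arrive as strings, so convert them to integers.
        current_round[entry["Driver"]["code"]] = int(entry["position"])
    return current_round


def collect_standings(retrieve, season, rounds):
    """Call retrieve() once per round and return one row per round."""
    return [
        standings_row(r, retrieve(f"{season}/{r}/driverStandings"))
        for r in range(1, rounds + 1)
    ]


# Trimmed stand-in for a response (placeholder drivers, not real standings).
sample_response = {
    "StandingsTable": {
        "StandingsLists": [
            {
                "DriverStandings": [
                    {"position": "1", "points": "25", "Driver": {"code": "AAA"}},
                    {"position": "2", "points": "18", "Driver": {"code": "BBB"}},
                ]
            }
        ]
    }
}

# A fake retrieve function so the loop can be tried offline.
rows = collect_standings(lambda endpoint: sample_response, 2021, rounds=2)
print(rows)
```

Passing the resulting rows to pd.DataFrame(rows) would give a wide table with one row per round and one column per driver, which is the shape the next step starts from.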

Preparing the data for plotting

If you’re familiar with data analysis, you probably know that data almost always needs some transformation before it can be analyzed the way you want. In this case, too, we need to apply a small transformation to the data before it can be plotted the way we want.

Here, we’re melting the dataset based on the round. This converts the dataset from wide to long: the round stays as an identifier column, and every driver column is unpivoted into rows, so that each row holds one round, one driver and that driver’s standing. The result of the melting looks like this:
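As a small illustration of the wide-to-long conversion, here is a sketch using pd.melt with its default column names. The driver codes and positions are made-up toy values, not real standings.

```python
import pandas as pd

# Wide format: one row per round, one column per driver (toy values).
all_championship_standings = pd.DataFrame(
    [
        {"round": 1, "AAA": 1, "BBB": 2},
        {"round": 2, "AAA": 2, "BBB": 1},
    ]
)

# Long format: "round" is kept as the identifier, every driver column is
# unpivoted into a "variable" (driver) and "value" (standing) pair.
melted = pd.melt(all_championship_standings, id_vars=["round"])
print(melted)
```

With the default settings, the driver ends up in a column called variable and the standing in a column called value.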

Plotting the data

Now that we have the data in the format we want, we can generate the final plot. I’ll show the entire code first, and explain what it does afterwards.

So, let me explain what’s going on!

  • [Line 1–8]: Set up a few basics, like defining the size of the plot and setting the title.
  • [Line 10–18]: This is a loop over all the drivers. For every driver, a Seaborn (sns) lineplot is created. The x-value is the round, and the y-value is the position in the championship. The standing is called value because, if you look at the melted dataset, the column containing the standing is just called value. The color of the line (line 17) is generated from the fastf1 helper function team_color, which can provide us with the color codes of all the teams.
  • [Line 20]: We invert the y-axis since otherwise the driver leading the championship would be at the bottom.
  • [Line 23–29]: We tweak the x- and y-axes to make sure that they display all the numbers of the rounds and championship positions, and we add labels to the axes.
  • [Line 32]: We disable the gridlines since they add no value to the plot and actually only create chaos.
  • [Line 35–62]: Here some cool stuff is happening. To make the plot a little bit nicer, I really wanted to have the driver’s name at the end of the line instead of having a legend with confusing colors. To achieve that, we start looping through all the lines on line 36. We get the last coordinates of each line (which is the end of the line), and then place the text right after it.

Let me, just like at the beginning of the article, show you the result of this code:

Our result!

Pretty neat, right?! Even though the plot looks relatively simple, it still requires some steps to get it in the exact way we like it.

In the near future, I will work out more examples of what you can do with data from the Ergast API. So, make sure to follow me on Medium and on Twitter.

Thanks for reading, and feel free to share your thoughts!

Jasper

Writing tutorials about Data Analysis & Visualization through Formula 1 Examples
