Analyzing Formula 1 Data Using Python: 2021 Abu Dhabi GP Minisector Comparison

Published in

Towards Formula 1 Analysis

6 min read · Dec 15, 2021

With the 2021 Formula 1 season over, why not spend the off-season learning to analyze Formula 1 data yourself? This tutorial will show you how to create graphs like these:

This tutorial is a spin-off of a previous tutorial on creating minisector plots, where the focus was on comparing dry-weather tires with wet-weather tires. You can check it out here:

Formula 1 Data Analysis Tutorial — 2021 Russian GP: “To Box, or Not to Box?”

Now, let’s get started!

2021 Abu Dhabi GP

For this tutorial we will dive into the 2021 Abu Dhabi GP. Specifically, we will analyze the qualifying session, one of the year’s most spectacular!

Max Verstappen and Lewis Hamilton went neck-and-neck all season, and qualifying for the Abu Dhabi GP was no different. In the end, contrary to what the earlier practice sessions suggested, Verstappen beat Hamilton to pole position by 0.371 seconds. So, let’s see how you can analyze these laps and find where the difference came from.

If you have never done anything in Python before, make sure to check out my tutorial for absolute beginners right here:

How to Analyze Formula 1 Data with Python: A Beginner’s Tutorial

Step 1: Setting up the basics

We start by creating a Jupyter notebook, in which we import the libraries we need. If any of them is not installed yet, install it with pip from your command line, for example with pip install fastf1.

As you will notice later, loading the data from the qualifying session can take a few seconds, so we cache that data to access it quickly afterwards. We also call a function from the fastf1 library that sets up matplotlib for the plots we will make.
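A minimal sketch of this setup step, assuming a recent FastF1 release. The function name setup_fastf1 and the cache directory name "cache" are choices made here for illustration; the alias ff1 matches the one used in the rest of this tutorial. The FastF1 import sits inside the function so that nothing is downloaded or configured until you call it.

```python
import os


def setup_fastf1(cache_dir="cache"):
    """Enable FastF1's on-disk cache and apply its matplotlib setup."""
    # Imported here so the sketch can be loaded before FastF1 is installed.
    import fastf1 as ff1
    from fastf1 import plotting

    # enable_cache expects an existing directory, so create it first.
    os.makedirs(cache_dir, exist_ok=True)
    ff1.Cache.enable_cache(cache_dir)

    # Configures matplotlib for FastF1 data (for example, timedelta axes).
    plotting.setup_mpl()
    return ff1
```

In a notebook you would call ff1 = setup_fastf1() once, in the first cell.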

Step 2: Collecting the data

Now it’s time to collect the data. First, we select our session using ff1.get_session(year, race, session). The session we are interested in is, in this case, Q for qualifying. If you’d like to get any other session, you can also choose FP1, FP2, FP3, or R (the race). After specifying the session, we can load the laps.
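As a rough sketch, loading the laps could look like the function below. The name load_qualifying_laps and its default arguments are assumptions made for this example. It uses session.load() followed by session.laps, which is how recent FastF1 versions expose the data; releases from the time of writing used session.load_laps(with_telemetry=True) instead.

```python
def load_qualifying_laps(year=2021, gp="Abu Dhabi", session_name="Q"):
    """Return all laps of one session (qualifying by default)."""
    # Imported here so nothing is downloaded until the function is called.
    import fastf1 as ff1

    session = ff1.get_session(year, gp, session_name)
    # Downloads (or reads from the cache) timing and telemetry data.
    session.load()
    return session.laps
```

Calling laps = load_qualifying_laps() needs an internet connection the first time; later calls are served from the cache if one is enabled.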

Step 2.1: Processing the data

In the world of data, a dataset is (unfortunately) almost never ready out-of-the-box. It always needs some data transformations to enrich the data or to re-format the data into the shape we want it in. So, that’s what we are going to do now.

Do, however, make sure to inspect all the DataFrames yourself so that you understand what is going on at every step. To fully wrap your head around it, keep playing with the data yourself as well!

Expanding the data
We start by selecting the specific data we require and adding what we need to it. This involves selecting the fastest laps of Hamilton and Verstappen, getting their telemetry data, adding the distance variable so we can compare speeds over distance, and merging both telemetry DataFrames into a single DataFrame called telemetry.
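One way to sketch this step is the helper below. The function name fastest_lap_telemetry and the added Driver column are assumptions for illustration. It expects a FastF1 laps object and uses pick_drivers, the name in FastF1 3.x (older releases call it pick_driver). Because DataFrame.append was removed in pandas 2.0, the two DataFrames are combined with pd.concat.

```python
import pandas as pd


def fastest_lap_telemetry(laps, drivers=("VER", "HAM")):
    """Stack the fastest-lap telemetry of each driver into one DataFrame."""
    frames = []
    for driver in drivers:
        # Fastest lap of this driver in the session.
        fastest_lap = laps.pick_drivers(driver).pick_fastest()
        # add_distance() adds (or recomputes) the 'Distance' column.
        car_data = fastest_lap.get_telemetry().add_distance()
        # Label the rows so both laps can be told apart after merging.
        car_data["Driver"] = driver
        frames.append(car_data)
    return pd.concat(frames, ignore_index=True)
```

With the laps loaded earlier, telemetry = fastest_lap_telemetry(laps) gives one row per telemetry sample, with both drivers stacked on top of each other.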

Creating minisectors
Since we’re comparing the speeds across minisectors, we first need to define the minisectors. We do this by cutting the total length of the track into 25 (this number can be changed) equally-sized chunks, which will form the minisectors.

Now that we know the length of each minisector, we want to create a list that contains the distances at which each minisector starts.
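The arithmetic behind these two steps fits in a few lines. The sketch below uses a hypothetical 5000 m lap; with real data, the total distance would be the maximum of the Distance column in telemetry. The function name minisector_starts is made up for this example.

```python
def minisector_starts(total_distance, num_minisectors=25):
    """Return the minisector length and the list of start distances."""
    minisector_length = total_distance / num_minisectors
    # Minisector i (counting from 0) starts at i * minisector_length.
    starts = [i * minisector_length for i in range(num_minisectors)]
    return minisector_length, starts


# Hypothetical 5000 m lap, used only for illustration.
minisector_length, minisectors = minisector_starts(5000.0)
print(minisector_length)  # 200.0
print(minisectors[:3])    # [0.0, 200.0, 400.0]
```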

And finally, we want to assign a minisector to every row in the dataset. In other words: in which minisector was the car at the moment the datapoint was recorded? To do that, we create a column Minisector in the telemetry DataFrame, which looks the Distance up in the minisectors variable we just created to identify which minisector we are in. I suggest you play around with this to fully understand what’s happening!
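One possible way to do this lookup is numpy's searchsorted, shown below on synthetic data (a sample every 50 m of a hypothetical 5000 m lap). This is a sketch, not necessarily the approach used in the original notebook; a nice property is that the very last sample, at exactly the total distance, still lands in minisector 25.

```python
import numpy as np
import pandas as pd

num_minisectors = 25
# Synthetic telemetry: one sample every 50 m of a hypothetical 5000 m lap.
telemetry = pd.DataFrame({"Distance": np.arange(0.0, 5000.0 + 1, 50.0)})

total_distance = telemetry["Distance"].max()
minisector_length = total_distance / num_minisectors
minisectors = [i * minisector_length for i in range(num_minisectors)]

# side="right" counts how many start distances are <= the sample's distance,
# which is exactly the 1-based number of the minisector the car is in.
telemetry["Minisector"] = np.searchsorted(
    minisectors, telemetry["Distance"].to_numpy(), side="right"
)

print(telemetry["Minisector"].min(), telemetry["Minisector"].max())  # 1 25
```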

Calculate fastest driver per minisector
The last thing we need to do is calculate the fastest driver per minisector, so we can determine the color of the plot for that minisector. This, again, involves a few transformations that might be hard to follow (like groupby()), so make sure to play around with them to fully understand what’s happening.

First, we calculate the average speed per driver per minisector by grouping by Minisector and Driver, and taking the mean of Speed.
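A small sketch of that grouping, using a handful of made-up speed samples rather than real telemetry:

```python
import pandas as pd

# Made-up samples: two minisectors, two drivers, two samples each.
telemetry = pd.DataFrame({
    "Minisector": [1, 1, 1, 1, 2, 2, 2, 2],
    "Driver": ["VER", "VER", "HAM", "HAM", "VER", "VER", "HAM", "HAM"],
    "Speed": [300.0, 310.0, 302.0, 304.0, 150.0, 160.0, 158.0, 162.0],
})

# One row per (Minisector, Driver) pair, holding the mean speed.
average_speed = (
    telemetry.groupby(["Minisector", "Driver"])["Speed"].mean().reset_index()
)
print(average_speed)
```

The reset_index() call turns the group keys back into ordinary columns, which makes the next steps easier.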

This gives us a DataFrame that looks like the following:

Then, we select the driver with the highest average speed by using idxmax(), which returns the index of the row with the highest value, and then we select only the columns we need.
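Sketched on a tiny made-up table of average speeds, the selection could look like this. The column name Fastest_driver is an assumption chosen for this example.

```python
import pandas as pd

# Made-up average speeds per minisector and driver.
average_speed = pd.DataFrame({
    "Minisector": [1, 1, 2, 2],
    "Driver": ["HAM", "VER", "HAM", "VER"],
    "Speed": [303.0, 305.0, 160.0, 155.0],
})

# Per minisector, idxmax() gives the row label of the highest average speed.
fastest_rows = average_speed.groupby("Minisector")["Speed"].idxmax()

# Keep only the columns we need and give the driver column a clearer name.
fastest_driver = (
    average_speed.loc[fastest_rows, ["Minisector", "Driver"]]
    .rename(columns={"Driver": "Fastest_driver"})
    .reset_index(drop=True)
)
print(fastest_driver)
```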

This results in the following:

The only remaining thing we need to do is to join the fastest driver per minisector with the full telemetry data, so that we can easily plot all the data later. In addition, we need to make sure that the Distance is properly sorted. Last, but not least, we need to convert the driver abbreviation (e.g. “HAM”) to an integer value, since otherwise matplotlib won’t be able to deal with it.
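These three operations can be sketched as follows, again on made-up rows. The column names Fastest_driver and Fastest_driver_int and the codes 1 and 2 are assumptions for illustration; any distinct integers would work.

```python
import pandas as pd

# Made-up telemetry rows, deliberately out of order.
telemetry = pd.DataFrame({
    "Driver": ["VER", "VER", "HAM", "HAM"],
    "Distance": [250.0, 50.0, 60.0, 240.0],
    "Minisector": [2, 1, 1, 2],
})
fastest_driver = pd.DataFrame({
    "Minisector": [1, 2],
    "Fastest_driver": ["VER", "HAM"],
})

# Attach the fastest driver of its minisector to every telemetry row.
telemetry = telemetry.merge(fastest_driver, on="Minisector")

# Sort by distance so the track is drawn in driving order.
telemetry = telemetry.sort_values(by="Distance").reset_index(drop=True)

# matplotlib colors lines by number, so map each abbreviation to an integer.
driver_codes = {"VER": 1, "HAM": 2}
telemetry["Fastest_driver_int"] = telemetry["Fastest_driver"].map(driver_codes)
print(telemetry)
```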

Step 3: Plotting the data

And, now it’s time to finally plot the data. I’ll put the full code below, and then I’ll explain line-by-line what’s happening.

Let’s run through everything that’s happening here:

[Line 1–2] We extract the x and y value from every row, which tells us where exactly the car was on the track at that point in time. We will use this to draw the shape of the circuit.
[Line 4–6] The x- and y-coordinates are combined so that they become points. These points are then used to form segments between those points.
[Line 8–11] This is all about the coloring of the plot. cmap stands for colormap, which defines the colors of the plot. I chose the ‘winter’ color scheme, since it gives us something close to the Red Bull vs. Mercedes colors. Via this link, however, you can find many other colormaps. The previously defined segments then take a color from the colormap and form a LineCollection, which is simply a collection of all the line segments.
[Line 13] Making the plot a bit bigger
[Line 15–17] This is where the plot gets drawn in the shape of the circuit. The LineCollection is added to the plot, and the labels are disabled on the x- and y-axes.
[Line 19–21] This part adds the legend, which is the colorbar you will see on the right.
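The steps walked through above can be sketched as the function below. It is an approximation, not the original listing, so its line numbers do not match the ranges quoted above. The function name, the figure size, the use of BoundaryNorm, and the Fastest_driver_int column (1 for the first label, 2 for the second) are assumptions. A synthetic circular "track" stands in for real telemetry so the example runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import BoundaryNorm


def plot_fastest_minisectors(telemetry, labels=("VER", "HAM")):
    """Draw the track, colored by the fastest driver in each minisector."""
    # Car positions along the lap trace out the shape of the circuit.
    x = telemetry["X"].to_numpy()
    y = telemetry["Y"].to_numpy()

    # Turn consecutive (x, y) points into line segments.
    points = np.column_stack([x, y]).reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)

    # Two discrete colors from 'winter': one for code 1, one for code 2.
    cmap = plt.get_cmap("winter", 2)
    norm = BoundaryNorm([0.5, 1.5, 2.5], cmap.N)
    lc = LineCollection(segments, cmap=cmap, norm=norm, linewidth=5)
    # One color value per segment (there is one segment fewer than points).
    lc.set_array(telemetry["Fastest_driver_int"].to_numpy()[:-1])

    fig, ax = plt.subplots(figsize=(12, 8))  # arbitrary, just a bigger canvas
    ax.add_collection(lc)
    ax.autoscale()
    ax.axis("equal")
    ax.tick_params(labelleft=False, left=False, labelbottom=False, bottom=False)

    # The colorbar doubles as the legend.
    cbar = fig.colorbar(lc, ax=ax, ticks=[1, 2])
    cbar.set_ticklabels(list(labels))
    return fig, ax


# Synthetic circular "track": first half fastest for code 1, second for code 2.
angle = np.linspace(0, 2 * np.pi, 200)
demo = pd.DataFrame({
    "X": np.cos(angle),
    "Y": np.sin(angle),
    "Fastest_driver_int": np.where(angle < np.pi, 1, 2),
})
fig, ax = plot_fastest_minisectors(demo)
```

In a notebook the figure is displayed automatically; in a script, call plt.show() or fig.savefig(...) afterwards.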

That’s it! The final result will look like the following: