Formula 1 Data Analysis Tutorial — 2021 Russian GP: “To Box, or Not to Box?”

Towards Formula 1 Analysis
7 min read · Oct 1, 2021


As a data fanatic and a Formula 1 fan, I find the amount of data coming out of Formula 1 weekends simply amazing to play around with. How cool is it to create insights that you haven’t even seen on TV during the weekend?!

This series provides tutorials (check out the Dutch GP and the Italian GP tutorials) on how to analyze Formula 1 data using Python, while also giving in-depth insights into specific events from race weekends!

If you want to know how to create a plot like the following, read on!

The 2021 Russian Grand Prix

Wow... another amazing weekend. Saturday was already great, with the track drying up just in time for Q3 and giving Lando Norris his maiden pole position. And then there was Sunday. The race was really tense, with Norris leading and Hamilton getting closer and closer. And then… rain.

So much happened during the last few laps, and I really cannot begin to imagine all the dilemmas the teams and drivers were facing. With only five laps to go, what do you do?! Do you go for inters, or do you take the gamble and stick with the slicks? In other words…

“To box, or not to box?”

Let’s see which tyre performed better at which part of the lap during the closing stages of the 2021 Russian Grand Prix.

Step 1: Set up the basics

First of all, we load all the packages that are required for this analysis.

After that, we enable the plotting functionality, enable the cache and change a small setting.
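A minimal sketch of what this setup could look like. The function name `setup_fastf1` and the cache folder name are made up for illustration, and the "small setting" mentioned above is not specified in the text, so it is left out here.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection


def setup_fastf1(cache_dir="cache"):
    """Enable FastF1's plotting defaults and its on-disk cache.

    `cache_dir` is a hypothetical folder name; pick whatever suits you.
    """
    # Imported here so the rest of the sketch can be loaded
    # even when FastF1 is not installed yet.
    import fastf1
    from fastf1 import plotting

    # Apply FastF1's Matplotlib styling.
    plotting.setup_mpl()

    # The cache folder has to exist before the cache is enabled.
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    fastf1.Cache.enable_cache(str(cache_dir))
```

Calling `setup_fastf1()` once at the top of a notebook should be enough; the cache keeps you from downloading the same session data again on every run.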

Step 2: Collect the data

Then we load the session data (2021, Russian GP, Race), and we tell the FastF1 Python library to start collecting all data. Since we will use telemetry data, we set with_telemetry to True.
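As a rough sketch, loading the laps could look like the snippet below. The `with_telemetry` flag from the text belongs to the `load_laps()` call of older FastF1 releases; newer releases use `session.load(telemetry=True)` followed by `session.laps`, so the sketch tries that first. The function name `load_race_laps` is hypothetical.

```python
def load_race_laps(year=2021, gp="Russia", identifier="R"):
    """Return all laps of a session, with telemetry loaded."""
    # Imported here so the sketch can be read without FastF1 installed.
    import fastf1

    session = fastf1.get_session(year, gp, identifier)

    if hasattr(session, "load"):
        # Newer FastF1 releases: load everything, then read the laps.
        session.load(telemetry=True)
        return session.laps

    # Older FastF1 releases: the call the text refers to.
    return session.load_laps(with_telemetry=True)
```

With the defaults, `laps = load_race_laps()` would give the race laps of the 2021 Russian GP.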

Expanding the data
Now that all the data has been loaded by the FastF1 package, we have to make a few modifications to the data and apply a few transformations to get it into the format we’re looking for.

First of all, we create a variable RaceLapNumber, and we select all laps from lap 45 onwards, which is when the rain started falling.
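A small sketch of this filter on a toy DataFrame. It assumes RaceLapNumber is derived from FastF1's `LapNumber` column; the `lap_offset` parameter is a hypothetical knob in case your lap numbering needs shifting, and the demo rows are invented.

```python
import pandas as pd


def select_rain_laps(laps, first_rain_lap=45, lap_offset=0):
    """Add RaceLapNumber and keep only the laps from the rain onwards."""
    laps = laps.copy()
    # Assumption: RaceLapNumber is LapNumber, optionally shifted.
    laps["RaceLapNumber"] = laps["LapNumber"] - lap_offset
    return laps.loc[laps["RaceLapNumber"] >= first_rain_lap]


# Invented demo rows, just to show the shape of the data.
demo_laps = pd.DataFrame(
    {"Driver": ["NOR", "NOR", "HAM", "HAM"], "LapNumber": [44, 45, 44, 46]}
)
rain_laps = select_rain_laps(demo_laps)
```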

Now, things are getting a bit more complicated. To create a comparison per lap, we need lap-by-lap telemetry. FastF1 only allows us to retrieve telemetry per driver, so we need nested loops to get the data in the format we want.

As you can see, we first loop through all the drivers (line 7), and then select the laps that belong to that driver (line 8). To get a lap-by-lap comparison of the telemetry, we then loop through all the laps of that driver using FastF1’s built-in iterlaps() (line 11), which is basically similar to Pandas’ iterrows().

Then we have a single lap for a single driver stored in the variable lap, meaning that we can now load the telemetry and the lap distance (line 12). And finally, we include some context (driver, lap number, compound) that does not automatically come with the telemetry data (lines 12–15). I strongly advise you to play around with this data yourself to get a feeling for what exactly we’re doing (and more importantly: why!).
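The nested loop could be sketched as follows. This is not the original listing, so the line numbers quoted above will not match it. It assumes `laps` is a FastF1 laps object with a RaceLapNumber column, and the function name `collect_telemetry` is made up.

```python
import pandas as pd


def collect_telemetry(laps):
    """Collect lap-by-lap telemetry for every driver into one DataFrame."""
    frames = []

    # Telemetry is retrieved driver by driver...
    for driver in pd.unique(laps["Driver"]):
        driver_laps = laps.pick_driver(driver)

        # ...and lap by lap, so the distance restarts at zero every lap.
        for _, lap in driver_laps.iterlaps():
            lap_telemetry = lap.get_telemetry().add_distance()

            # Context that does not come with the telemetry itself.
            lap_telemetry["Driver"] = driver
            lap_telemetry["Lap"] = lap["RaceLapNumber"]
            lap_telemetry["Compound"] = lap["Compound"]

            frames.append(lap_telemetry)

    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Collecting the pieces in a list and concatenating once at the end tends to be faster than growing a DataFrame inside the loop.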

Now that we have all the telemetry data, we make two small modifications to it: we only select the columns we need, and we convert all “Hard”, “Medium” and “Soft” compounds to “SLICK”.
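A sketch of those two modifications on toy data. The column list is an assumption (the columns the rest of this tutorial talks about), and the compound names follow FastF1's upper-case spelling.

```python
import pandas as pd

# Assumption: these are the columns needed further down.
KEEP_COLUMNS = ["Lap", "Distance", "Compound", "Speed", "X", "Y"]
DRY_COMPOUNDS = ["HARD", "MEDIUM", "SOFT"]


def simplify_telemetry(telemetry):
    """Keep only the needed columns and merge all dry compounds into SLICK."""
    out = telemetry[KEEP_COLUMNS].copy()
    is_dry = out["Compound"].isin(DRY_COMPOUNDS)
    out["Compound"] = out["Compound"].mask(is_dry, "SLICK")
    return out


# Invented demo rows.
demo_telemetry = pd.DataFrame(
    {
        "Lap": [45, 45, 45],
        "Distance": [0.0, 10.0, 20.0],
        "Compound": ["HARD", "MEDIUM", "INTERMEDIATE"],
        "Speed": [250.0, 240.0, 230.0],
        "X": [0.0, 1.0, 2.0],
        "Y": [0.0, 1.0, 2.0],
        "Throttle": [100, 100, 90],
    }
)
simplified = simplify_telemetry(demo_telemetry)
```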

Create the mini-sectors and calculate fastest compounds
So, the data is in the right format now for us to start looking at which tire was fastest at which point during the lap. The methodology to do so will be as follows:

  1. We split the lap into 25 equally-sized mini-sectors (lap distance / 25)
  2. We assign every row in the telemetry data the mini-sector it falls in (based on the lap distance)
  3. We group by lap, mini-sector and compound and calculate the average speed so we can see which tyre was faster at what point during the lap

Starting with step 1, we create the mini-sectors.
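One way to build the mini-sectors is a plain list with the distance at which each mini-sector starts. The function name and the 5000 m lap length below are only for illustration; in the real analysis the total distance would come from the telemetry (for example the largest value in the Distance column).

```python
def make_minisectors(total_distance, num_minisectors=25):
    """Return the start distance of every mini-sector."""
    minisector_length = total_distance / num_minisectors
    return [minisector_length * i for i in range(num_minisectors)]


# Illustrative lap length of 5000 m, not the real circuit length.
boundaries = make_minisectors(5000.0)
```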

We then assign the current mini-sector to every row in the telemetry data (step 2).

As you can see, this is quite complex. What happens here is that we create a column Minisector in the telemetry DataFrame, which is calculated from what is stored in the column Distance. Based on the distance, we look up which index in the minisector list the distance belongs to. That index number is the mini-sector number. Since it is really hard to explain what is going on, I suggest that you play around with the data to see what this piece of code is actually doing.
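If the lookup feels hard to follow, here is an alternative sketch of the same idea using NumPy's `digitize`, which is not necessarily how the original code does it. It puts every sample in the mini-sector whose start distance it has already passed, numbering from 1. The function name and demo values are made up.

```python
import numpy as np
import pandas as pd


def assign_minisector(telemetry, boundaries):
    """Add a 1-based Minisector column based on the Distance column."""
    out = telemetry.copy()
    # digitize returns i where boundaries[i-1] <= distance < boundaries[i];
    # everything past the last boundary lands in the final mini-sector.
    out["Minisector"] = np.digitize(out["Distance"].to_numpy(), boundaries)
    return out


# Four mini-sectors of 100 m each, just for the demo.
demo_boundaries = [0.0, 100.0, 200.0, 300.0]
demo_distances = pd.DataFrame({"Distance": [0.0, 50.0, 100.0, 250.0, 399.0]})
tagged = assign_minisector(demo_distances, demo_boundaries)
```

In this demo the five samples end up in mini-sectors 1, 1, 2, 3 and 4.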

Now that we know which telemetry datapoint was recorded in which mini-sector, we can calculate the average speed per mini-sector (step 3). We group by lap, mini-sector and compound, and then calculate the average speed.
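A minimal sketch of this grouping on invented numbers, assuming the column names Lap, Minisector, Compound and Speed used throughout this tutorial:

```python
import pandas as pd


def average_speed_per_minisector(telemetry):
    """Average speed for every lap, mini-sector and compound."""
    return (
        telemetry.groupby(["Lap", "Minisector", "Compound"])["Speed"]
        .mean()
        .reset_index()
    )


# Invented speeds, only to show the mechanics.
demo_speeds = pd.DataFrame(
    {
        "Lap": [45, 45, 45, 45],
        "Minisector": [1, 1, 1, 1],
        "Compound": ["SLICK", "SLICK", "INTERMEDIATE", "INTERMEDIATE"],
        "Speed": [200.0, 220.0, 180.0, 190.0],
    }
)
average_speed = average_speed_per_minisector(demo_speeds)
```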

This code will result in the following DataFrame, showing the average speed for every lap, mini-sector and compound combined.

Using this data, we select the fastest compound per mini-sector per lap. The idxmax() method gives us the index label of the row with the highest speed per lap per mini-sector.
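Sketched on invented averages, the selection could look like this. The output column name `Fastest_compound` is a name chosen for this sketch.

```python
import pandas as pd


def fastest_compound_per_minisector(average_speed):
    """Pick the compound with the highest average speed per lap and mini-sector."""
    # idxmax returns, for each group, the index label of the fastest row.
    fastest_rows = average_speed.groupby(["Lap", "Minisector"])["Speed"].idxmax()
    fastest = average_speed.loc[fastest_rows, ["Lap", "Minisector", "Compound"]]
    return fastest.rename(columns={"Compound": "Fastest_compound"}).reset_index(drop=True)


# Invented averages: slicks faster in mini-sector 1, inters in mini-sector 2.
demo_average = pd.DataFrame(
    {
        "Lap": [45, 45, 45, 45],
        "Minisector": [1, 1, 2, 2],
        "Compound": ["INTERMEDIATE", "SLICK", "INTERMEDIATE", "SLICK"],
        "Speed": [185.0, 210.0, 150.0, 120.0],
    }
)
fastest = fastest_compound_per_minisector(demo_average)
```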

To finish this bit of code, we have to do three more things: merge the telemetry data with the fastest compound per mini-sector, order the telemetry data by distance to keep the plot from getting messed up, and assign an integer value to the tyre compound, since Matplotlib can only handle integer values in this case.
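These three steps could be sketched like this. The integer codes (1 for inters, 2 for slicks) and the column names `Fastest_compound` and `Fastest_compound_int` are assumptions made for this sketch.

```python
import pandas as pd

# Assumption: any two distinct integers work, as long as the plot uses the same ones.
COMPOUND_CODES = {"INTERMEDIATE": 1, "SLICK": 2}


def attach_fastest_compound(telemetry, fastest):
    """Merge, sort by distance and encode the fastest compound as an integer."""
    merged = telemetry.merge(fastest, on=["Lap", "Minisector"])
    merged = merged.sort_values("Distance").reset_index(drop=True)
    merged["Fastest_compound_int"] = merged["Fastest_compound"].map(COMPOUND_CODES)
    return merged


# Invented demo rows, deliberately out of order.
demo_tel = pd.DataFrame(
    {"Lap": [45, 45, 45], "Minisector": [2, 1, 1], "Distance": [300.0, 10.0, 120.0]}
)
demo_fastest = pd.DataFrame(
    {"Lap": [45, 45], "Minisector": [1, 2], "Fastest_compound": ["SLICK", "INTERMEDIATE"]}
)
prepared = attach_fastest_compound(demo_tel, demo_fastest)
```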

And now we’re really done with collecting and transforming all the data. Quite complex, right? That’s why I keep stressing again and again that you should play around with the data. This is the only way you will really understand what is going on.

Step 3: Plot the data

Now we can finally start plotting the data. The way this plot is generated is inspired by the following tutorial from the FastF1 Python library: “Gear shifts on track”.

Since we want to generate separate plots per lap, we will put the code into a function to avoid having to repeat ourselves.

As you might have noticed, this is not your everyday plot where you simply provide two variables and get some nice lines or bars. No, this is a plot that draws a line in the shape of the circuit. Let me explain what’s happening step-by-step:

  1. [Line 2] Since we’re creating a plot for one lap, we get the telemetry of that specific lap
  2. [Line 4–5] The telemetry DataFrame contains an X and Y value for every row, which tells us where exactly on the track the car was at that specific moment in time. We want to collect all those X and Y values to draw the circuit in the plot.
  3. [Line 7–9] We combine the X and the Y coordinates together so that they become points, which will then all form segments over the course of the lap. Lastly, we convert the compound variable to a numpy variable.
  4. [Line 11–14] cmap means ColorMap, which basically just defines the colors of our plot. I selected the color scheme ‘ocean’, which I found really suitable for this plot since it ranges from white (slick tyres) to green (intermediate tyres). After that, we create a LineCollection, which basically combines all the previously created segments into a line. This forms the shape of the circuit!
  5. [Line 16] Define the size of the plot, making it a bit bigger.
  6. [Line 18–21] Define the title of the plot (which can be disabled since I used this plot for social posts where I wanted to hide the title).
  7. [Line 23–25] Add the LineCollection (in other words: the shape of the circuit) to the plot and disable all labels.
  8. [Line 27–30] Add a legend to the plot, which is a colorbar that tells us which color belongs to which compound
  9. [Line 32–33] Save the plot
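The steps above can be sketched as a single function. This is a reconstruction under assumptions, not the original listing, so the line numbers in the list will not match. It assumes a telemetry DataFrame with the columns Lap, X, Y and an integer column `Fastest_compound_int` (1 for inters, 2 for slicks); the function and parameter names are made up.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection


def plot_fastest_compound(telemetry, lap, show_title=True, save_path=None):
    """Draw the circuit for one lap, colored by the fastest compound."""
    single_lap = telemetry.loc[telemetry["Lap"] == lap]

    # Track position of the car for every telemetry sample.
    x = single_lap["X"].to_numpy()
    y = single_lap["Y"].to_numpy()

    # Turn consecutive points into short line segments.
    points = np.column_stack([x, y]).reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    compound = single_lap["Fastest_compound_int"].to_numpy().astype(float)

    # Two colors from 'ocean': green for code 1, white for code 2.
    cmap = plt.get_cmap("ocean", 2)
    line = LineCollection(segments, cmap=cmap, norm=plt.Normalize(1, 3))
    line.set_array(compound[:-1])  # one value per segment
    line.set_linewidth(2)

    fig, ax = plt.subplots(figsize=(12, 5))
    if show_title:
        fig.suptitle(f"2021 Russian GP\nLap {lap} - Slicks vs. Inters")

    # Add the circuit shape and hide all axis labels.
    ax.add_collection(line)
    ax.axis("equal")
    ax.tick_params(labelleft=False, left=False, labelbottom=False, bottom=False)

    # Colorbar as legend: which color belongs to which compound.
    cbar = fig.colorbar(line, ax=ax, boundaries=np.arange(1, 4))
    cbar.set_ticks([1.5, 2.5])
    cbar.set_ticklabels(["Inters", "Slicks"])

    if save_path is not None:
        fig.savefig(save_path, dpi=300)
    return fig
```

With a prepared telemetry DataFrame, something like `plot_fastest_compound(telemetry, 48)` would draw lap 48, and passing a `save_path` would also write the image to disk.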

Aaaaand we’re done! Now we can call the function to generate a plot of the lap for us.

I did this for every lap and edited the different plots into a single GIF as you could find at the beginning of this article using


So… “To box, or not to box?” remains a very difficult question. The intermediate tyre immediately appeared to be much faster, except for some parts of the track. However, the main question all the teams were struggling with was: what is faster? Doing 3–4 slow laps on the slick tyre, or losing > 20 seconds while boxing, but then having the fastest tyre? In the end, boxing appeared to be the fastest solution (cries in McLaren……..).
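To get a feeling for that trade-off, here is a back-of-the-envelope sketch. It only uses the rough numbers from the paragraph above (about 20 seconds lost in the pits, 3 to 4 laps to go), the function name is made up, and it ignores things like tyre warm-up and traffic.

```python
def breakeven_gain_per_lap(pit_loss_s, laps_remaining):
    """Seconds per lap the new tyre must gain to make up for the pit stop."""
    if laps_remaining <= 0:
        raise ValueError("laps_remaining must be positive")
    return pit_loss_s / laps_remaining


for laps_left in (3, 4):
    gain = breakeven_gain_per_lap(20.0, laps_left)
    print(f"{laps_left} laps to go: inters must be {gain:.1f} s/lap faster")
```

So with 4 laps to go and a 20-second pit loss, boxing only pays off if the inters are more than 5 seconds per lap faster than the slicks.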

All in all, interesting to see how the track and tyre performance was affected bit-by-bit by the rain. I hope this tutorial was insightful! Thanks for reading again and let me know if you have any questions or discussion points ❤️


