Finding the Limit: Formula 1 Data Visualizations and Points Prediction

It’s lights out and away we go…

Julian Terenzio
11 min read · Nov 30, 2021
Photo by Hanson Lu on Unsplash

For fans old and new, the Netflix original series Drive to Survive seems to have kindled a new generation of Formula 1 fandom across North America. Emotions run high watching drivers push a machine to its absolute limit, knowing that millisecond reactions can determine who holds up a trophy and who makes it out of the car alive. I became interested in the extreme nature of the sport as I grew older, and I now admire its unique intersection of business, tech, and engineering. F1 teams compete at the margins: hitting speeds as high as 372 kph, completing pit stops in under two seconds, and whistling around corners at 5 g of force. I thought it would be interesting to look deeper into these margins to see what factors affect performance and, ultimately, whether points finishes can be reliably predicted with machine learning (ML) models. Is this also a shot-in-the-dark project to possibly land an interview in the Formula 1 community? I don’t know. Only time will tell, I guess.

(1) DATA COLLECTION

If you don’t understand a lick of Python, feel free to skip this part and dive into the story told with the visualizations. I’ll save you all some time by providing the full Python repo on my GitHub page, so please feel free to rummage through the code further, if you so choose.

General Race Data, Driver Results, and Constructor Results Using Kaggle

Kaggle provides a good starting point for collecting general race data, driver results, and constructor results. The Kaggle data was originally collected from the Ergast Developer API, a standard resource for retrieving raw F1 data.

Weather and Qualifying Data Web-Scraping Using BeautifulSoup

The data collection process got interesting when I had to collect weather data. I used the BeautifulSoup Python package to scrape the English-language Wikipedia page for every race, then appended the links to the same race’s pages in a few other languages. The other-language pages act as a backstop in the event that no weather data is found on the English-language page. If no weather data is found at all, I assume the weather on that race day was normal (i.e. dry/warm weather).
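The fallback logic described above can be sketched in a few lines. This is only an illustration, not the code from the repo: it assumes the weather text has already been scraped into a dictionary keyed by language code, and the function names, the language order, and the keyword set are all hypothetical.

```python
import re
from typing import Mapping, Optional, Sequence

# Hypothetical keyword set: the article does not list the terms it matched.
WET_WORDS = {"rain", "rainy", "raining", "wet", "showers", "drizzle",
             "pioggia", "regen", "lluvia"}


def pick_weather_text(by_language: Mapping[str, Optional[str]],
                      order: Sequence[str] = ("en", "it", "de", "es")) -> Optional[str]:
    """Return the first non-empty weather description, trying English first."""
    for lang in order:
        text = by_language.get(lang)
        if text and text.strip():
            return text.strip()
    return None


def classify_weather(text: Optional[str]) -> str:
    """Label a description 'wet' or 'dry'; missing data defaults to 'dry'."""
    if text is None:
        return "dry"
    # Compare whole words so that e.g. German "Wetter" does not match "wet".
    words = set(re.findall(r"[^\W\d_]+", text.lower()))
    return "wet" if words & WET_WORDS else "dry"


def race_weather(by_language: Mapping[str, Optional[str]]) -> str:
    """Combine the language fallback with the dry-by-default assumption."""
    return classify_weather(pick_weather_text(by_language))
```

For example, `race_weather({"en": "", "de": "Regen"})` falls through to the German page and returns `"wet"`, while an empty dictionary returns `"dry"`.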

The final step in the data collection process was sourcing reliable qualifying data. The official Formula 1 website provides the most reliable and comprehensive qualifying data, going back to 1983. Because qualifying formats have changed over the years (single-lap qualifying, 12-lap qualifying, and knockout qualifying, among others), I decided to simply take the fastest qualifying lap set by each driver in a given session.
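Taking the fastest lap regardless of format might look something like the sketch below. It assumes lap times arrive as strings such as "1:23.456" and that missing laps are blank or non-numeric; both helper names are hypothetical.

```python
from typing import Iterable, Optional


def lap_to_seconds(lap: str) -> Optional[float]:
    """Convert 'M:SS.mmm' (or 'SS.mmm') to seconds; return None if unparseable."""
    lap = lap.strip()
    if not lap:
        return None
    try:
        if ":" in lap:
            minutes, seconds = lap.split(":", 1)
            return int(minutes) * 60 + float(seconds)
        return float(lap)
    except ValueError:
        # Entries such as "DNF" carry no usable lap time.
        return None


def fastest_lap(laps: Iterable[str]) -> Optional[float]:
    """Return the quickest valid lap in seconds, or None if there is none."""
    times = [t for t in map(lap_to_seconds, laps) if t is not None]
    return min(times) if times else None
```

With knockout qualifying, the three segment times for a driver would simply be passed in together, e.g. `fastest_lap(["1:05.619", "1:04.500", ""])`.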

Data Cleaning & Feature Engineering

Finally, I manipulated the data and calculated a few new features.

  • Converted the finishing time and qualifying time of each driver in each race into the interval (gap) to the fastest driver for that feature;
  • Condensed each driver’s finishing status into one of four types: “finished,” “lapped,” “mechanical_issue,” and “accident”;
  • Created two new features: the 3-race rolling average finishing time and 3-race rolling average qualifying time for each driver in any given race;
  • Calculated the drivers’ relative age at each race;
  • Calculated the cumulative finishing ratio, the cumulative lapped ratio, and the cumulative accident ratio for every driver in any given race.
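Three of the features listed above can be sketched in plain Python. The function names are hypothetical, and one detail is my assumption rather than something stated in the article: the rolling average and cumulative ratios here only use races before the current one, so that a race’s own result never leaks into its features.

```python
from typing import Dict, List, Optional, Sequence


def gaps_to_fastest(times: Dict[str, float]) -> Dict[str, float]:
    """Express each driver's time as the gap (seconds) to the fastest driver."""
    best = min(times.values())
    return {driver: t - best for driver, t in times.items()}


def rolling_mean_prior(values: Sequence[float],
                       window: int = 3) -> List[Optional[float]]:
    """Mean of up to `window` previous values; None when no history exists."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        prior = values[max(0, i - window):i]
        out.append(sum(prior) / len(prior) if prior else None)
    return out


def cumulative_ratio_prior(statuses: Sequence[str], target: str) -> List[float]:
    """Share of earlier races whose status equals `target` (0.0 before race 1)."""
    out: List[float] = []
    hits = 0
    for i, status in enumerate(statuses):
        out.append(hits / i if i else 0.0)
        hits += status == target
    return out
```

Each function takes one driver’s chronological history (or one race’s times), so in practice they would be applied per driver after sorting by race date.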

Now, let’s get to the fun stuff.

(2) EXPLORATORY DATA ANALYSIS

So how do we start unpacking the data? I decided to take a first glance at the correlations between the dataset’s numerical features. One interesting relationship is that a driver’s age (a rough proxy for experience) is positively correlated with the likelihood of finishing a race without being lapped. Take a look.

For the past decade, 10 teams and 20 drivers have competed for the Constructors’ and Drivers’ Championships. Ferrari, Renault, Williams, and McLaren carry a legacy of success in F1 dating back to the 1970s. That said, it’s well known that a driver and their car are only as good as their last race. The success of a Formula 1 team seems to ebb and flow every year because the margin for error in building a competitive F1 car, successfully racing the car, and managing a 200+ person team is extremely thin.

Since 2010, the points system has awarded the winner of every race 25 points toward the World Championship, followed by 18 points for second place, 15 points for third place, and so on down to one point for tenth. In competitive seasons, the World Championship often comes down to the wire in the final throes of the last race in Abu Dhabi. While Lewis Hamilton has largely dominated the grid over the last 5 years, I found that the range of points scored by the top 10 drivers at the end of each season has fluctuated from year to year, with 2012 being the most competitive season to date.
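As a rough sketch, the points table and the “range of points” measure could be written as follows. The function names are hypothetical, the table ignores any bonus points, and treating a smaller range as a more competitive season is my reading of the comparison rather than a formula given in the article.

```python
from typing import Dict

# Points for finishing positions 1 through 10 under the system used since 2010.
POINTS: Dict[int, int] = {1: 25, 2: 18, 3: 15, 4: 12, 5: 10,
                          6: 8, 7: 6, 8: 4, 9: 2, 10: 1}


def points_for(position: int) -> int:
    """Points earned for a finishing position; 0 outside the top 10."""
    return POINTS.get(position, 0)


def top10_points_range(season_totals: Dict[str, int]) -> int:
    """Gap between the highest and lowest totals among the top 10 drivers."""
    top10 = sorted(season_totals.values(), reverse=True)[:10]
    return top10[0] - top10[-1]
```

Computing `top10_points_range` once per season gives a single number that can be compared across years.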

While Lewis Hamilton is rightfully considered one of the greatest F1 drivers of all time, one teammate has succeeded in beating him to the World Championship in the same car: Nico Rosberg in 2016. Hamilton and Rosberg battled as Mercedes teammates for 4 years in what is now known as “The Silver War.” It’s fascinating to see that even some of the best drivers of all time, when paired with Lewis Hamilton, simply couldn’t match his success.

Racing wheel-to-wheel inevitably comes at a cost. The families of the 52 drivers who lost their lives in Formula 1, finding the limit in themselves and in their cars, know this all too well. Fear does not seem to be in a driver’s vocabulary, yet the risk of death patiently looms over the track each Sunday. The ability to successfully race a car at breakneck speeds in any weather condition while managing the risk of a fatal mistake is simply something to admire.

The following visualizations point to the extreme nature of the sport. I defined a racing “accident” as any collision, collision damage, driver disqualification, injury, or crash (excluding punctures) that resulted in the driver retiring from the race. All other retirements, including mechanical failures, other DNFs, and punctures, are labelled as “mechanical issues.” As Formula 1 cars improve year over year, I found that mechanical issues have become less prevalent, which suggests that the standard of reliability keeps rising.
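A minimal sketch of that labelling, assuming the raw status strings look like those in Ergast-style result tables (“Finished”, “+1 Lap”, “Collision”, “Engine”, and so on). The exact strings in the set below and the function name are assumptions on my part.

```python
import re

# Hypothetical raw labels treated as accidents; adjust to the dataset in use.
ACCIDENT_STATUSES = {"accident", "collision", "collision damage",
                     "disqualified", "injury", "injured"}


def simplify_status(raw: str) -> str:
    """Map a raw finishing status onto one of the four concise categories."""
    status = raw.strip().lower()
    if status == "finished":
        return "finished"
    if re.fullmatch(r"\+\d+ laps?", status):
        # "+1 Lap", "+2 Laps", ... mean the driver finished but was lapped.
        return "lapped"
    if status in ACCIDENT_STATUSES:
        return "accident"
    # Everything else, punctures included, counts as a mechanical issue.
    return "mechanical_issue"
```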

Drivers have crashed and walked away alive after experiencing over 67 g of force in some cases³. Some race tracks are evidently more dangerous than others, including Monaco with its narrow streets and Japan with its often slippery conditions. While the safety of Formula 1 racing has dramatically improved over the decades, fans and families will never forget the era of the 1960s, when drivers faced death at every corner and were “crushed, burned, and beheaded with unnerving regularity.¹” In the following visualization, the red squares signify the more dangerous race tracks.

Drivers must compete in various weather conditions as well. Wet and rainy conditions call for intermediate or wet tyres and typically accentuate a driver’s skill in controlling a race car. I converted the finishing time of each driver in each race into the interval to the race winner, as presented below.

I then engineered each driver’s average finishing time per season (in seconds) for every season they have driven, and split this feature into the average finishing time in wet conditions and in dry conditions. Finally, for each driver on the 2020 grid, I calculated a “rain control” score by subtracting the average wet/rainy finishing time from the average dry/warm finishing time. The result is a proxy for how well a driver controls the car in wet conditions relative to other drivers. It is a reasonable proxy because a higher “rain control” score means the driver’s average finishing time in wet conditions is lower than their average finishing time in dry conditions, which suggests they are relatively better at racing in the rain than other drivers on the grid.
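The calculation boils down to a difference of two means. The sketch below is a simplification: it averages over all of a driver’s races at once rather than season by season, assumes each result is a (gap to winner in seconds, weather label) pair, and uses a hypothetical function name.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def rain_control(results: Sequence[Tuple[float, str]]) -> Optional[float]:
    """Average dry gap minus average wet gap; higher means stronger in the wet.

    Returns None when the driver lacks either wet or dry races.
    """
    wet = [gap for gap, weather in results if weather == "wet"]
    dry = [gap for gap, weather in results if weather == "dry"]
    if not wet or not dry:
        return None
    return mean(dry) - mean(wet)
```

A driver who finishes 15 seconds off the winner on average in the dry but only 5 seconds off in the wet would score 10.0.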

The results show that Lance Stroll is the driver with the best “rain control” on the 2020 grid. They do say that rainy conditions are the great equalizer in extreme motorsport, as everything you may know about your car, your rivals, and the circuit becomes completely irrelevant. Coincidentally (or expectedly), Stroll clinched his first-ever Formula 1 pole position in a rainy qualifying session at the 2020 Turkish Grand Prix. And in 2019, Stroll just missed out on a podium finish on a rainy day at the German Grand Prix at Hockenheim, stating, “I’d like to see some more wet races in the near future…the rain spiced things up.” This leads me to believe that Lance will one day be a very successful driver, supported by Aston Martin’s deep coffers.

For posterity’s sake, I chose to include Ayrton Senna’s calculated “rain control” score which seems to cast all other drivers as mere mortals in controlling a race car in the rain. Senna completed what is now dubbed the “Lap of the Gods” at the outset of the rainy 1993 European Grand Prix, and he became a legend known for masterclass racing especially during chaotic, unsafe, and heavy rain conditions. Senna’s life came to a tragic end following his fatal impact at the Tamburello corner in May of 1994, yet he remains a hero that inspired the hearts and minds of many inside and outside of Formula 1.

A powerful and inspiring documentary on the life and death of one of the greatest Formula 1 drivers of all time, Ayrton Senna.

I then created a correlation heatmap to see if this “rain control” feature correlates with a driver’s age, qualifying grid position, and finishing position. Interestingly enough, the data shows a strong positive correlation between “rain control” and a driver’s age, suggesting that a lengthier career in F1 could signify a stronger ability to control a race car in the rain. The heatmap also suggests that better “rain control” goes hand in hand with a better qualifying grid position.
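The numbers behind a heatmap like this are pairwise correlation coefficients. A dependency-free sketch, assuming Pearson correlation (the usual default in plotting workflows, though the article does not say which coefficient was used) and hypothetical function names:

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series; NaN if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return cov / (spread_x * spread_y)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Correlate every named column with every other (and with itself)."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns}
            for a in columns}
```

With only about 20 drivers on a grid, each coefficient rests on very few points, so it is worth reading the heatmap as suggestive rather than conclusive.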

It’s safe to say that pushing a car to its limit in wet conditions can be fatally dangerous. The data below shows that, from 2011 to 2020, accidents happened more often in wet/rainy track conditions than in dry/warm track conditions.

(3) PREDICTIVE ML MODELLING

It’s prediction time. For those of you who have little interest in the technicals of machine learning, just note that the Support Vector Machine (SVM) model produced the highest accuracy, ~77.6%, in predicting whether a driver will finish in the top 10 (i.e. a points finish). The preprocessing and model-fitting steps can be found on GitHub. However, the beauty of Formula 1 stems from its unpredictability. Fans will never forget when Sergio Perez rebounded from 18th to take his first win at the 2020 Sakhir Grand Prix, nor will they forget Pierre Gasly clinching his unexpected first victory at the 2020 Italian Grand Prix, just to name a few.

Support Vector Machines

Model-fitting and prediction using Support Vector Machines, with hyperparameters tuned by GridSearchCV.
SVM predictions for the 2020 Austrian GP. Note that “prob_1” signifies the probability of placing in the top 10, while “prob_0” signifies the probability of not placing in the top 10.
Summary of ML model prediction accuracy, precision, recall, and best parameters.
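For readers who want to see how the summary metrics relate to the “prob_1” column, here is a small sketch. It assumes a driver is predicted to score points when prob_1 is at least 0.5; that threshold and the function name are my assumptions, not details taken from the repo.

```python
from typing import Dict, Sequence


def top10_metrics(y_true: Sequence[int], prob_1: Sequence[float],
                  threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, and recall for top-10 (label 1) predictions."""
    y_pred = [1 if p >= threshold else 0 for p in prob_1]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted points, scored
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted points, missed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # scored, not predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly ruled out
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

An alternative design would be to rank the drivers in each race by prob_1 and label the ten highest as points finishers, which guarantees exactly ten predictions per race.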

I distinctly remember the roaring engines coming from the family room every other Sunday in my childhood home. As a kid, I found it easy to sink into the rhythmic twists and turns of each race yet quickly lose attention. But in Formula 1, there is a beautiful struggle at every corner that I did not fully appreciate until quite recently. On one hand, drivers try to find the limit to be as little as one-hundredth of a second faster than the next best driver. Pushing an F1 car to its limit is mentally and physically taxing due to the incredibly high g-forces a driver experiences. A driver will typically lose more than 6 pounds over the course of a race. Kevin Magnussen made the following comment about the scorching temperatures inside the cockpit of an F1 car following the Singapore Grand Prix:

“I just accepted that I might blackout at some point,” he added. “You just try your best. You don’t know if you are going to blackout, so there is no point giving up.⁴”

On the other hand, you have engineers tirelessly building, maintaining, and optimizing what could be deemed a rocket ship on four wheels. Race strategists are ruthlessly leveraging real-time data to determine, say, the optimal strategy that will allow teams to undercut each other during pit stops just to be as little as one-hundredth of a second faster than the next best team. Success and failure are determined at the margins.

“With help from over 300 sensors on each car, McLaren’s F1 electronic control unit (ECU) deals with over 1000 input parameters and transmits more than 1.5GB of live data back to the garage during an average 300km grand prix. During a two-hour race, the ECU will receive and send over 750 million data points. That’s twice as many words as each of us will speak in a lifetime.²”

Formula 1 racing is an art just as much as it is a sport — you simply have to look under the hood to notice…

Photo by Kévin et Laurianne Langlais on Unsplash

If you have any questions regarding the article’s content (or you’re a hiring manager in the F1 community 🤞), please feel free to reach out to me on LinkedIn, and learn more about me here.

[1]: Cannell, M. (2012). The Limit: Life and Death on the 1961 Grand Prix Circuit. Thorndike Press.

[2]: The Brain of an F1 Car. (2018, February 7). McLaren Applied. https://www.mclaren.com/applied/blog/brain-of-an-f1-car-mclaren-ecu/

[3]: Associated Press. (2021, March 6). Report reveals details of impact, fire during Romain Grosjean’s wicked crash at Bahrain. MotorSportsTalk | NBC Sports. https://motorsports.nbcsports.com/2021/03/06/romain-grosjean-crash-report-f1-bahrain/

[4]: Edmondson, L. (2019, September 21). The physical challenge behind F1 drivers’ “love-hate” relationship with the Singapore GP. ESPN.Com. https://www.espn.com/f1/story/_/id/27659392/the-physical-challenge-f1-drivers-love-hate-relationship-singapore-gp

Julian Terenzio

Interested in fintech and product development. Trying to inspire and find ways to build a better future.