Contributor Submission

What I Learned Pulling 100+ Tables Of F1 Data — And Where I Went Wrong

Using Python and SQLite, Pipeline contributor James scraped, stored and analyzed 100+ F1 tables — but later discovered a flaw in his approach and learned a big lesson.

--

James Patalan is pursuing an M.S. in data science at Bellevue University and is a contributor to Pipeline: Your Data Engineering Resource. Inspired by data he scraped for a school project, James examined Formula One (F1) data in an attempt to answer a question that F1 enthusiasts have debated since the sport’s inception. Along the way, the process and outcome challenged his previous assumptions and taught him an unexpected lesson.

100+ URLs And 2.2 Billion Dollars

For those unfamiliar with the thrill and drama of a Formula One (F1) season, two competitions happen at once: the Drivers’ and the Constructors’ championships. The Drivers’ championship is self-explanatory: drivers individually earn points based on their performance in each race. The better they perform, the more they can negotiate for the next time their contract is up.

On the other hand, the Constructors’ championship measures the performance of each team, the organizations that “construct” the cars. The Constructors’ championship is the primary source of income for teams, with the top team earning 14% of the roughly $2.2bn prize pot. Since points are awarded to everyone who finishes in the top ten, even a non-podium finish can make a huge difference for drivers and teams.

Red Bull Formula One car.
Photo by Ahmed.sellami91 Sellami on Unsplash

Lewis Hamilton currently holds the record for the most wins in F1 history at 103, and apart from the 2016 season, when he finished second, Hamilton reigned as world champion from 2014 to 2020. Can Hamilton definitively be called the most skilled driver? Maybe he can, or perhaps his success came more from the superior Mercedes technology he was driving.

In 2014, the governing body of F1, the FIA, implemented a rule change that mandated the use of turbo-hybrid engines, a change that greatly benefited the Mercedes team. Then, after the hotly contested 2021 season, the FIA implemented more car regulation changes in 2022, and in a single season Hamilton went from being the best driver in the world to the second-best driver at Mercedes.

So how do we measure the skill of an F1 driver? Using data that I scraped from over 100 programmatically generated URLs, and after doing far more cleaning than I anticipated, I hope to answer this question. The thrill of racing comes from watching a driver battle their way to the front by overtaking those ahead of them. Since each driver is racing in a different car, if we can find out which driver has improved their position the most over the past five seasons, then they can be declared the most skilled driver, right? Not exactly.
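The metric described above can be sketched in a few lines of Python. This is only an illustration, assuming each race result records a starting grid slot and a finishing position; the sample rows and the function name are made up and are not taken from the scraped dataset.

```python
from collections import defaultdict

# Hypothetical sample rows: (driver, grid position, finishing position).
results = [
    ("Driver A", 1, 2),
    ("Driver A", 2, 1),
    ("Driver B", 18, 14),
    ("Driver B", 20, 15),
]


def average_positions_gained(rows):
    """Return {driver: mean of (grid - finish)}.

    A positive value means the driver finished ahead of where they started.
    """
    gains = defaultdict(list)
    for driver, grid, finish in rows:
        gains[driver].append(grid - finish)
    return {driver: sum(g) / len(g) for driver, g in gains.items()}


averages = average_positions_gained(results)
for driver, avg in averages.items():
    print(f"{driver}: {avg:+.1f} positions per race")
```

Note that a driver who starts at the front has almost no room to gain positions, which is worth keeping in mind when reading the averages.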

Bar graph showing the average positions gained or lost by each team
Graph by James Patalan

Individual Performance

From what the data shows, the team that on average improved their position the most is Williams. The Williams team is infamous for starting at the back of the grid every single race, so even if they don’t overtake anyone, a single driver retiring means they have gained a position by simply finishing the race. The team with the next-most position improvements is, to my surprise, Aston Martin. I was surprised because Aston Martin is home to Lance Stroll, one of the most meme-able drivers, who many consider to have bought his way in rather than earned his seat on talent.

The two teams that lost the most positions on average are, unfortunately, both of my favorite teams, Ferrari and Haas. This could be attributed to both teams’ tendency to start near the front of the grid and then retire from the race. The current dominant team, Red Bull, is shown here to lose 0.5 positions in an average race.

Bar graph showing the number of times Max Verstappen has finished in each place
Graph by James Patalan

Before I discuss the inherent flaw in my experiment design, let’s break down the stats of the current champion, Max Verstappen, and the aforementioned Lewis Hamilton. Not surprisingly, the positively skewed distribution shows that Verstappen finishes 1st more often than in any other position.

Bar graph showing the number of times Lewis Hamilton has finished in each place
Graph by James Patalan

Hamilton’s graph reflects his status as the driver with the greatest number of victories: he has finished 1st nearly twice as often as he has finished 2nd. Hamilton has an average finishing position of 3.4, or around 3rd place.

Now that the two champions have been covered, let’s take a look at some of the up-and-comers on the F1 grid, starting with Hamilton’s teammate, George Russell.

Bar graph showing the number of times George Russell has finished in each place
Graph by James Patalan

Russell previously raced for Williams before being offered a position with Mercedes in 2022. After changing teams, Russell finished in the top five in all but three of his races during his debut Mercedes year, and has been dubbed “Mr. Consistency” by fans. It should be noted that for the purposes of this experiment, a retirement from a race is counted as finishing in last place.
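The retirement rule mentioned above could look something like the sketch below. The function name, its parameters and the 20-car field are assumptions for illustration; in particular, the article does not say how several retirements in the same race are ordered, so this version simply places every retired driver last.

```python
def effective_finish(position, retired, field_size):
    """Return the finishing position used for analysis.

    Assumption: a retired driver is counted as finishing last,
    i.e. in position `field_size`, regardless of when they retired.
    """
    return field_size if retired else position


# Hypothetical 20-car race: one classified finish and one retirement.
classified = effective_finish(4, retired=False, field_size=20)
dnf = effective_finish(None, retired=True, field_size=20)
print(classified, dnf)
```

One consequence of this choice is that a single retirement can pull a driver's average finishing position down sharply, which helps explain why retirements stand out so much in the bar graphs.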

Bar graph showing the number of times Lando Norris has finished in each place
Graph by James Patalan

Fan favorite Lando Norris has yet to win an F1 race, but is routinely at the front of the pack, with only a few outliers apart from race retirements.

Bar graph showing the number of times Charles Leclerc has finished in each place
Graph by James Patalan

My personal favorite driver, Leclerc, has retired from sixteen races over his career, which unfortunately makes retirement his most frequent result. From engine failures, to crashes, to “Ferrari Strategy”, many have wondered whether poor Leclerc is the unluckiest driver in F1.

The Flaw in My Logic

Going back to my original design, let’s find out which driver has gained the most positions on average across all of their races in the data.

Graph showing the average number of positions gained or lost for each driver over the past 5 seasons
Graph by James Patalan

Looking at this very cluttered chart of every driver from the past five seasons, the standout is a driver I had never even heard of, Stoffel Vandoorne, who on average improved his position the most. Since Vandoorne’s last year racing was 2018, only one year of his racing is included in my data. So Vandoorne must have had a particularly good year, right? Not exactly. He never even made it onto the podium, scored only 12 points, and came in 16th place overall that year.

This reveals the flaw in my experiment design. By my original logic, I should be proclaiming Stoffel Vandoorne the most skilled driver of the past five years since he improved his position the most. That, however, is not something I am going to do. Winning the Drivers’ Championship in F1 is about consistency, not wild gains and losses. A driver can improve their position by five places and still finish 15th.
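To make the flaw concrete, here is a small sketch that ranks two invented drivers by average positions gained and by average finishing position. The names and numbers are hypothetical and chosen only to show how the two rankings can disagree; they are not drawn from the real data.

```python
from statistics import mean

# Hypothetical (grid, finish) pairs for two made-up drivers.
races = {
    "Front-runner": [(1, 1), (2, 1), (1, 2)],
    "Backmarker": [(20, 15), (19, 14), (20, 16)],
}

summary = {
    name: {
        # Positive gain means the driver finished ahead of their grid slot.
        "avg_gain": mean(grid - finish for grid, finish in pairs),
        # Lower is better: 1 is a race win.
        "avg_finish": mean(finish for _, finish in pairs),
    }
    for name, pairs in races.items()
}

best_by_gain = max(summary, key=lambda name: summary[name]["avg_gain"])
best_by_finish = min(summary, key=lambda name: summary[name]["avg_finish"])

print("Most positions gained:", best_by_gain)
print("Best average finish:", best_by_finish)
```

The driver who gains the most places is the one who never finishes higher than 14th, while the driver who wins races barely moves at all. Average positions gained rewards starting at the back, not finishing at the front.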

What I Learned

Although I did not answer the question I first set out to answer, this project has been a great learning experience for me. I learned first to be more diligent in accounting for variables and context when designing a hypothesis.

When working with data, it is always imperative that you let the data reveal insights, rather than trying to make something out of trends that don’t exist.

I also learned to ensure I acquire proper domain knowledge before dedicating time and resources to a project. In hindsight, I may have initially bitten off more than I could chew. It seemed like nearly every corner I rounded on this project resulted in a new error to solve and messy formatting that I needed to clean.

With that being said, sometimes sink or swim is the best way to learn, and if you are interested in my process, stay tuned for a technical write-up including how I sourced my data, stored my results and created my visualizations.

Doing these projects is important for refining my skills and practicing on real-world datasets as I approach graduation in the coming year. I am interested in learning about data engineering from working professionals and open to informational interviews and work opportunities.
