The F1 Project — Data Viz

Building a race results visualizer for McLaren

Published in

Slant Projects

10 min read · Jun 6, 2014

Shortly after we started rolling with CityBits, a client approached us with some design concepts that they wanted us to turn into a dynamic app. The comps looked good, and the data was cool: F1 race results. So, we jumped on the project.

The Design

The first things we received were assets:

A set of fairly fleshed-out PSDs.
Two XML data files for one racer.

The comps showed what the app should look like, and it was clear that there were already a lot of visualization solutions apparent in the design. However, what they all meant was unclear.

The first comp looked like an overall lap synopsis for the top ten drivers for a specific race

There were two interesting comps: one that looked like an overall lap synopsis for the top ten finishers in a given race, and another that looked like a graph tracking championship performance. That graph was all over the place, though, and we didn’t really understand it at first glance.

The second comp looked cool, but we weren’t sure how to read it right away

The XML files that came through were clear, specific and well constructed. Each one represented a race for a single driver. They were both for the same circuit, but different seasons. After a quick glance we knew that it would be possible to parse these and pull most of the data we needed to build a synopsis for a single driver’s performance on the circuit.

After receiving the designs and data files we had a lot of questions, so we hopped on a call and got some answers.

Single-Driver Synopsis

There were four main points we needed clarified for the driver synopsis. First of all, it was pretty clear that each line represented the driver’s overall race with their name on the left and their finish time on the right — easy.

There were four important points to clarify for the driver synopsis

The questions we had were:

What do the letter-number combinations mean? They actually represent two separate things: tyre type and the number of laps for a given leg of the driver’s race.
What are the little white bars? Pit stops. Pretty clear after we asked, and their thicknesses represent the length of the stop.
What’s the little green bar? It can mark two things: the driver’s fastest lap (in green) or the overall fastest lap of the race (shown in purple).
What do the colored sections mean? Each one was a leg of the race, color coded based on design (not on data), with the first leg always in red and every subsequent leg in a lighter shade of blue.

Race Ending

We also had a few questions when looking at the whole set of drivers.

There were two main points, about the spaces between drivers and the strange fades

The questions we had were:

What do the fades mean? This was quite simple in the end: DNFs. The fade showed that the drivers in 10th, 11th and 12th positions did not finish the race.
Why were the gaps so big? The answer we got was that each gap represented the amount of time the other drivers took to finish after the winner crossed the finish line.

Championship Standings

The championship standings graph looked good, but we didn’t understand it at first glance. We had two primary questions:

1. If this is tracking championship points, then why do they go down? It was actually straightforward. The graph wasn’t representing the upwards collection of points. Rather, it was showing the performance of drivers compared to a single driver.

2. What does the gray line mean? With the first question answered, this was obvious. The gray line represented the baseline driver against which all other drivers were compared. Each driver receives points for a race (if they finish in the top ten), and the graph represents the difference between the number of points the baseline driver received and the number the other drivers received. The sparkline for anyone who received more points than the baseline driver (i.e. placed better) would go up; otherwise the line would go down (i.e. the baseline driver placed better). Overall, this shows the baseline driver’s performance compared to the rest of the pack.
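A minimal sketch of that comparison, in Python rather than the C4 code used on the project. It assumes the line is a running total of the per-race points difference (the per-race versus cumulative choice is a guess), and the function names and sample finishing positions are hypothetical. The points table is the standard top-ten scoring used in the 2012 and 2013 seasons.

```python
# Points awarded to the top ten finishers under the F1 scoring system in
# use during the 2012 and 2013 seasons.
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def points_for(position):
    """Return championship points for a 1-based finishing position.

    None means the driver did not finish or was not classified.
    """
    if position is None or position > len(POINTS):
        return 0
    return POINTS[position - 1]


def sparkline(driver_positions, baseline_positions):
    """Running points gap between a driver and the baseline driver.

    Positive values mean the driver is ahead of the baseline, so the
    line sits above the flat baseline; negative values sit below it.
    (Assumption: the gap accumulates race by race.)
    """
    gap = 0
    line = []
    for driver_pos, base_pos in zip(driver_positions, baseline_positions):
        gap += points_for(driver_pos) - points_for(base_pos)
        line.append(gap)
    return line


# Hypothetical finishing positions over four races (not real results).
baseline = [2, 4, 1, None]
rival = [1, 5, 3, 2]

print(sparkline(rival, baseline))     # [7, 5, -5, 13]
print(sparkline(baseline, baseline))  # [0, 0, 0, 0]
```

Measured against itself, the baseline driver always comes out as a flat line of zeros, which is why the gray line is straight.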

On to Data

With our questions out of the way we were able to move on to experimenting with the data and quickly realized that the driver synopsis designs had a flaw in them.

Skewed Time

After rendering out a few different sections of a race three things became absolutely clear.

You can’t measure the last lap of the race against all the other laps of a driver’s race. Some of the drivers don’t even finish, which means that they might have a lap that doesn’t have any time data, so it gets tricky to do any kind of generalized mapping of time for their laps. It’s easy to map the winner because we know that they finished in X amount of time, but for the rest of the pack the final time is often unclear, especially if the winner laps EVERYONE.
If we measure each driver’s synopsis against their entire race, then there is a clear distortion between each individual driver and the winner. In the original comps, the first driver finishes about 80% of the way across the bar, with the end of the bar representing the end of the last lap of the entire race. This implies that the winner’s bar represents an average of 71 laps in X amount of time. When we take this as the baseline, and we don’t know the remaining time for other drivers, then the actual timing of their races against the winner’s race will never line up appropriately.
If the first driver’s bar ends at 80%, then the remaining 20% of the space ends up representing approximately 1/4 of the total laps of the race. Think about it this way: if the first 20% of the bar represents 1/4 of the winner’s race (~18 laps), then the last 20% of the bars for all other drivers visually represents the same volume, suggesting that they raced tons of extra laps.

A Data-based Redesign

It’s really difficult to conceptually design the graphics for a data visualization without being able to see the data. At best, a designer has to guess what the data might look like, and their assumptions often end up being totally wrong. In order to get a good sense of how things will look, and what the visual language must be for a design, you need to work with good data.

We solved the design problems listed above by standardizing the graph in the following way:

The first driver represents the fastest total time for the race, so we use this as the baseline for measuring out all other racers. We mark the end of the winner’s race as the 80% mark of the bar.
The remaining 20% of the bar, for each other driver, represents the amount of time between the winner’s finishing time and the last finisher’s finishing time. This is not to be confused with the last driver on the list.

From these two decisions we get a consistent way of measuring the performances of all the drivers in comparison to the winner, and we get a decent way of looking at the last laps of the top eleven drivers after the winner.

This image shows how the segments had to be redesigned to consistently fit the data and compensate for the distortion of the last lap of the race

Looking at the final version of the design we see a couple things:

There is a consistent line across all drivers at the 80% mark.
In the example above the last 20% of each bar represents 88.610 seconds. This is how long after the winner the last finisher, in eleventh place, completed the race. The extra time for every driver after the winner is measured against this last finisher.
We know when each lap starts, so if the fastest lap is a driver’s last lap, we can start it before the end of the race and allow it to continue until the end of that driver’s bar (i.e. as above, the ninth driver’s last lap was the fastest of the entire race).
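The two-part scale described above can be sketched as a small piecewise-linear mapping from elapsed race time to a position along the bar. This is an illustrative Python sketch, not the RaceParser code: the function name and the 5400-second winning time are assumptions, while the 88.610-second gap is the one from the example.

```python
WINNER_MARK = 0.8  # the winner's finish sits at 80% of the bar width


def bar_position(t, winner_time, last_finish_time):
    """Map an elapsed race time in seconds to a 0..1 fraction of the bar.

    Times up to the winner's finish fill the first 80% linearly. Times
    after it fill the remaining 20%, scaled so that the last finisher
    ends exactly at the right edge.
    """
    if t <= winner_time:
        return WINNER_MARK * t / winner_time
    tail = last_finish_time - winner_time
    if tail <= 0:
        # Nobody finished after the winner, so there is no tail to scale.
        return 1.0
    return min(1.0, WINNER_MARK + (1 - WINNER_MARK) * (t - winner_time) / tail)


# Hypothetical winning time; the 88.610 s gap comes from the example above.
winner_time = 5400.0
last_finish_time = winner_time + 88.610

print(round(bar_position(2700.0, winner_time, last_finish_time), 3))            # 0.4
print(round(bar_position(winner_time, winner_time, last_finish_time), 3))       # 0.8
print(round(bar_position(winner_time + 44.305, winner_time, last_finish_time), 3))  # 0.9
print(round(bar_position(last_finish_time, winner_time, last_finish_time), 3))  # 1.0
```

Because lap start times are known, the start and end of any segment (a leg, a pit stop, a fastest lap) can be passed through the same function to get its left and right edges on the bar.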

The Actual Data

For any given race we had to parse through a lot of data to synthesize everything, and the process of measuring and comparing between all drivers was a bit of a task. But, in the end it turned out to be quite reliable.

Each driver’s data came in three parts: laps, pitstops, results. We used the laps and pitstops data to build the race synopsis for a driver, and the results data to sort them (also for the championship graph).

This means a lot of data files. For example, the Malaysia race had 22 starting drivers which gave us 66 total files for that race. On the image above we have 7 races which came out to approximately 450 files, but we had already pre-rendered data for the beginning of the 2013 season as well as all of 2012. So, altogether we were dealing with about 1900 XML files — because that’s how the data came.

We standardized things to make the parsing simple. For instance, file names looked like: year_circuitnumber_drivername_datatype.xml which translated into something like 2012_1_alonso_pitstops, etc. With the proper folder structure it was fairly straightforward to build an XML parser that could gather all the necessary data for a driver, collate it and render an image, then spit it out with the right file name.
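As a rough illustration of why the naming convention makes collation easy, here is a Python sketch that splits file names into their four fields and groups the three data types per driver. The helper names are hypothetical and it does not touch the XML contents, whose schema isn't described here.

```python
from collections import defaultdict
from pathlib import Path

DATA_TYPES = {"laps", "pitstops", "results"}


def parse_name(filename):
    """Split 'year_circuitnumber_drivername_datatype.xml' into fields."""
    stem = Path(filename).stem
    # Split from both ends so a driver name containing underscores survives.
    year, circuit, rest = stem.split("_", 2)
    driver, data_type = rest.rsplit("_", 1)
    if data_type not in DATA_TYPES:
        raise ValueError(f"unexpected data type in {filename!r}")
    return int(year), int(circuit), driver, data_type


def group_by_driver(filenames):
    """Collect the files for each (year, circuit, driver) combination."""
    groups = defaultdict(dict)
    for name in filenames:
        year, circuit, driver, data_type = parse_name(name)
        groups[(year, circuit, driver)][data_type] = name
    return dict(groups)


def complete_sets(groups):
    """Keep only drivers that have laps, pitstops and results files."""
    return {key: files for key, files in groups.items()
            if set(files) == DATA_TYPES}


# Example file names following the convention (the list itself is made up).
files = [
    "2012_1_alonso_laps.xml",
    "2012_1_alonso_pitstops.xml",
    "2012_1_alonso_results.xml",
    "2012_1_webber_laps.xml",
]
groups = group_by_driver(files)
print(sorted(complete_sets(groups)))  # [(2012, 1, 'alonso')]
```

Checking for complete sets before rendering is one way to catch a missing download early, since each driver needs all three files for a race.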

After rendering out all the individual components as images, we simply tossed them into the app and sorted them out there.

Championship Graph

The championship graph didn’t take much effort. We just iterated the circuit-by-circuit results to build up the data for an individual’s overall standings. The app was built from the perspective of McLaren, so Webber’s performance became the baseline for the graph.

Webber’s performance is the straight grey line

This is fairly straightforward to read. With Webber’s performance being the baseline, if he places well then the majority of the other drivers’ sparklines will go down, otherwise they go up. Oh yeah, all the data up to the white line was real, anything past it was supposed to represent prediction.

The Deliverable

Total production time was about forty hours of work, over the course of about twenty-five days — there was some down-time as questions needed to get answered, designs needed to be updated, and just the general pace of a project. The majority of the work was in creating the initial visualizations where we had to compare real data against the designs, as well as in writing some scripts to download, sort and rename the data files. We also spent a lot of time initially rendering out the data from 2012, after which we were able to put together the rest of the app, and do some basic updates.

All in all, the budget was small enough to warrant a quick and dirty execution of the app. This wasn’t one for the App Store, so we weren’t worried about crossing all the t’s and dotting all the i’s. Instead, we focused on making sure the app was light, responsive and animated. So, rather than having the app handle the data directly, we built a few extra apps with C4 to parse the data and render all the images for races, results and so on.

The apps we built were:

DataDownloader (a simple app for grabbing XML from the web)
RaceParser (a C4 app for rendering out driver results for each circuit)
ChampionshipResults (a C4 app for rendering the championship graph)
Insights (the deliverable, a C4 app with three interactive views that allow users to explore the most current races for the 2013 season, for iPad)

Reflection

This was a good small development project that came with some tight constraints and a lot of great energy. We like to sink our teeth into these kinds of short timelines because they stir things up and get us ultra-focused on the task, and at the end of the day they come out looking great.

There was one hitch, though, that ended up being pretty stressful. A rush deadline popped up at the end when we were updating the app, and our client couldn’t get it installed on his device in time for a huge presentation to hundreds of people. In the end, the client was happy with the work and understood that it wasn’t our fault: there were timezones, other impending deadlines, poor internet and delays in response, and they were traveling at high speeds around the world.

We would have loved for this to go off without a hitch, but we learned a lot from handling it and continue to do a lot of work with the client.