Data Vis Sandbox #2: Track Pursuits

Matthew Montesano
Oct 18, 2017


A while back I wrote about track cycling’s potential to have great data visualizations that really help tell the story of what’s happening in a head-to-head race. I’ve also published a data vis sandbox as I play around with simple techniques for improving how I communicate data in my day job.

Recently, the Master’s Track Cycling World Championships were in LA — and many people I’ve trained with and raced against went there to compete against the world’s best amateurs over age 35. Since some of the sessions were livestreamed, I got to watch from afar — and collect some data.

Bike races are beautiful stories — and I wanted to see if I could use data to tell those stories.

The individual pursuit

In the men’s age 50–54 2 kilometer individual pursuit, two riders faced off for the gold medal: Daniel Casper and Jesper Offersgaard. They start at opposite sides of the 250-meter velodrome, and race 2 kilometers to set the fastest time.

https://www.youtube.com/watch?v=r_7tAGKinjg&index=5&list=PLijx1KWvZJYI_kwnQ_Kl7h1wNsydPUY1x

Two timing strips are laid down on the track, so every 125 meters, the riders’ time is taken and displayed. We see each rider’s cumulative time for the distance, and their lead or deficit to the other rider.

Find the story

I harp on this in my day job: the purpose of visualizing data is to help you tell a story. As I was watching this race, Offersgaard jumped to an early lead, getting a 2.8 second advantage at the halfway point. And, in the final kilometer, I watched Casper — a friendly rival of mine and an athlete I admire — slowly reduce Offersgaard’s lead until, in the final half a lap, he flipped his deficit into an advantage and won the gold medal.

What a great race! I wanted to see what the race looked like by understanding how the lead and the size of the lead changed throughout the race. Point data are nice but longitudinal data are better:

First, I graphed the time differences and saw a beautiful-looking bell curve: Offersgaard steadily built up a big lead in the first four laps, from .593 seconds after the first half lap to 2.833 seconds by the end of the first kilometer. Then his lead steadily decreased until Casper took the lead at the last possible moment, in the final half lap, crossing the finish line .652 seconds ahead of his opponent.

Because I was rooting for Casper, I charted this with Offersgaard’s lead in red and Casper’s lead in green. It’s simple stoplight symbolism: red for danger, green for success, and apologies to colorblind people because I haven’t run this through a cb-check.
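The lead/deficit series behind a chart like this is one rider's cumulative time subtracted from the other's at each timing strip, with the sign deciding the color. Here is a minimal Python sketch of that idea. The function names and the cumulative times are made up for illustration and are not the actual race data.

```python
def lead_series(rider_times, rival_times):
    """Lead of the rider over the rival at each timing strip, in seconds.

    Positive means the rider reached that strip sooner (is ahead);
    negative means the rider is behind.
    """
    return [round(rival - rider, 3) for rider, rival in zip(rider_times, rival_times)]


def stoplight_color(lead):
    """Green when the rider we are rooting for leads, red when they trail."""
    if lead > 0:
        return "green"
    if lead < 0:
        return "red"
    return "grey"  # dead heat at this strip (an assumption; ties are not covered in the article)


# Hypothetical cumulative times (seconds) at four consecutive timing strips.
rider = [12.0, 20.5, 29.0, 37.4]
rival = [11.5, 20.0, 29.1, 37.9]

leads = lead_series(rider, rival)
colors = [stoplight_color(lead) for lead in leads]
```

With these invented numbers the rider trails at the first two strips and leads at the last two, so the bars would flip from red to green partway through.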

But this chart made me ask some new questions — did Casper speed up throughout his pursuit, or did Offersgaard slow down? Both? By how much?

My first instinct was to graph their average speed at each half-lap interval:

After sharing this on Facebook, I learned that some people were confused by my legend, and realized that I could just use colors to tie together the rider, their line, and their final average speed.

Once I saw these data, too, I realized that they weren’t too helpful. Each rider’s cumulative average speed essentially increases throughout the race as the effect of the slow first lap fades. That lap, done from a standing start, drags their average speed below the speed they try to maintain throughout the event.

So I went back to my spreadsheet. I had cumulative time, and it was easy to turn time into speed (knowing that the time was registered every 125 meters), but what I didn’t have was each rider’s “split”— the amount of time from one timing strip to the next. But it was easy to make a new column in my spreadsheet and have it make that calculation.
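For anyone who would rather script that spreadsheet step, here is a rough Python sketch of both calculations: the split between consecutive timing strips, and the two kinds of speed (per split versus cumulative average). The function names and sample times are hypothetical; the only number taken from the race setup is the 125 meters between strips.

```python
STRIP_SPACING_M = 125  # timing strips sit every 125 meters


def half_lap_splits(cumulative_times):
    """Time from one timing strip to the next, in seconds."""
    splits = []
    previous = 0.0
    for t in cumulative_times:
        splits.append(round(t - previous, 3))
        previous = t
    return splits


def split_speeds_kmh(cumulative_times):
    """Speed over each individual half lap, in km/h."""
    return [round(STRIP_SPACING_M / s * 3.6, 2) for s in half_lap_splits(cumulative_times)]


def average_speeds_kmh(cumulative_times):
    """Cumulative average speed from the start to each strip, in km/h."""
    return [
        round((i + 1) * STRIP_SPACING_M / t * 3.6, 2)
        for i, t in enumerate(cumulative_times)
    ]


# Made-up cumulative times: a slow standing start, then two identical half laps.
cumulative = [12.5, 21.5, 30.5]

splits = half_lap_splits(cumulative)
per_split = split_speeds_kmh(cumulative)
running_average = average_speeds_kmh(cumulative)
```

In this invented example the rider holds the same pace over the second and third half laps, yet the cumulative average keeps climbing as the standing start is diluted. That is why the split view is the more revealing one.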

This graph of half-lap splits is basically another format of their speed for each half lap (if I charted speed, it would look exactly like this chart, but upside down and with a different y-axis).

You can see the first data point is high — riders start from a standing stop, so the first half-lap takes longer to cover. And, you can see that Offersgaard starts out riding a really fast first half, getting his half-lap split as low as 7.842 seconds. But his half-lap split times start to creep up almost immediately. Meanwhile, Casper is riding methodically, incredibly consistent. As Offersgaard slows and slows, Casper is steady.

The man must have a metronome for a heart. Or legs.

This chart doesn’t show who won — and for that reason, this chart pairs well with one of the first two charts, which show Casper take over the lead right at the last moment.

I did a couple things with this that I want to point out:

  • Eliminate gridlines: a simplification strategy I use throughout — it helps underscore that the purpose of a visualization isn’t as a reference. I’m not trying to let you figure out the x,y coordinates of each data point — I’m trying to simplify and show you a general picture of what happened
  • Label the outliers: since there are no gridlines and the axis is discrete, labeling the outliers helps communicate the range of data — minimally
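Choosing which points to label can be automated: find the lowest and highest values and attach text to only those. Below is a small Python sketch with a hypothetical helper name and made-up split times. The resulting labels could then be handed to whatever charting tool draws the line.

```python
def outlier_labels(values, unit="s"):
    """Return {index: label} for only the lowest and highest values."""
    low = min(range(len(values)), key=values.__getitem__)
    high = max(range(len(values)), key=values.__getitem__)
    return {i: f"{values[i]:.3f} {unit}" for i in (low, high)}


# Hypothetical half-lap splits in seconds (slow standing start first).
example_splits = [12.4, 8.1, 7.9, 8.3, 8.2]
labels = outlier_labels(example_splits)
```

Every other point stays unlabeled, which keeps the chart minimal while still communicating the range.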

The team pursuit

Casper and 3 teammates joined forces to compete in the 3 kilometer team pursuit, which is like the individual pursuit but features 4 people per team collaborating to set the fastest time. Casper and his teammates won the World Championship in this event for four consecutive years.

A few things make the team pursuit a bit more involved than the individual pursuit. Four different riders (ideally evenly-matched) share the work of setting the pace at the front. Each time the front rider pulls off, the front of the team moves back a bike-length. So, teams strategize how long each rider can “pull” at the front before they fatigue and let the pace drop.

Additionally, the time is taken when a team’s third rider crosses the finish line. So, teams often plan to have one rider take a “death pull,” in which they give their last bit of energy to pick the speed up late in the race before pulling the plug entirely and detaching from their team. It’s sort of the WITNESS ME of bike racing.
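The timing rule is easy to state in code: the team's time is the third-fastest finishing time among its riders. Here is a minimal Python sketch with hypothetical names and invented times. Representing a detached rider as `None`, and raising an error when fewer than three riders finish, are assumptions made for the example.

```python
def team_time(finish_times):
    """Team pursuit time: the moment the third rider crosses the line.

    `finish_times` holds one entry per rider, in seconds; a rider who
    detached and never finished is represented as None (an assumption).
    """
    finished = sorted(t for t in finish_times if t is not None)
    if len(finished) < 3:
        raise ValueError("no third rider crossed the line, so there is no team time")
    return finished[2]


# Invented times: three riders finish together, the fourth gave a death pull and dropped off.
with_death_pull = team_time([215.10, 215.25, 215.40, None])

# Invented times: all four finish, but the fourth rider's time does not count.
all_four = team_time([215.10, 215.25, 215.40, 216.00])
```

Both calls return the same value, which shows why sacrificing one rider late in the race costs the team nothing on the clock.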

After looking at my charts for Casper-vs-Offersgaard, I wondered if there was a way to combine some of the information into a single chart. I started with a similar bar chart of the lead/deficit, but I graphed it horizontally instead of vertically so that I could also indicate which of the team’s riders was pulling at the front, and for how long: Casper starts out with two half laps, Carlson takes a lap and a half, and so on.

As with the first chart in this article, this format only shows the lead or deficit, which made me wonder if changes in the lead were due to the team speeding up or slowing down. So, I overlaid the team’s half-lap splits — the grey line. This shows me times that they sped up or slowed down slightly, in an event where fractions of a second make the difference between gold and silver.

In this chart, I even ditched the y-axes entirely (scientists everywhere gasp). I’ll graph two data series on one chart, but I won’t put two y-axes on the chart. It causes a lot more trouble than it solves: which axis applies to which series? Why are there two sets of numbers? Why aren’t the two zeroes at the same point?

But as I pointed out before, the purpose of data vis isn’t to be a reference guide for the data points. Labeling the high and low values helps show the range, and getting rid of all that peripheral junk helps focus on the good stuff.

Edit, because this is a sandbox: somebody pointed out that having data series with the same unit (seconds) on different scales is pretty dang confusing, and they’re totally right. I hate having two axes, and if I put them on the same scale it would really blow away the detail I’m looking for:

So, the best bet is probably to separate them out, but still place them near enough to each other that somebody can compare the two in tandem. Here, they share an x-axis:

This required a re-labeling and re-titling of the charts. There’s definitely some room for improvement with how the items are spaced, but it’s a good start.

However, I don’t think the rider labels are as clear here — they’re placed to really apply to the lead/deficit bars, and don’t really look like they also apply to the lap splits.

Back to my conclusions

Overall, I wanted to include design decisions that help tell the story. I used this data vis sandbox to emphasize the following techniques:

  • Using color to identify key changes in the story
  • Picking colors that show my biases — the story that I want to tell. I used red for when my rider is down, and green for when they’re ahead.
  • Simplification, to help focus on the story instead of the numbers. I reduced axes and legends, used color to tie together different pieces of information (like data series and labels), and relied on the title and subtitle, not extra labels, to explain the data details
  • Adding context with simple labels: showing the range of data by labeling high and low values, or labeling a key event like the team pursuit death pull

I like how these turned out — and think that these approaches could be useful ways for pursuiters to analyze their performance.

There’s room for further improvement, though — the titles and subtitles could be written more clearly so that they explain the charts, and I got some other feedback that helped inform some iterations on these. Some of these are captured in my edits, and some of the edits open up whole new issues that could be addressed on further refinement.

All of that is a good reminder that even if you think you’re on to something, it’s good to test it with other people.
