20km of Lausanne: The Data Behind It

Xavier Penya
May 6, 2017


A data analysis of the last 9 editions of this race

Tools: OpenOffice for visualization / C# (.NET Core) for processing. The source code can be found on GitHub.

Two weeks ago, I took part in the “20km of Lausanne” race in Switzerland for the first time. It was the longest distance I had ever run, whether in a race or in training.

For those who are not familiar with this race: it consists of exhausting climbs during the first 12km, followed by a leg-breaking descent over the last 8km, with a total elevation gain of 212m (as measured by my Garmin watch).

I managed to finish it with an unimpressive mark of 1h 45min 28sec (Alex Kibarus won the race in 59min 12sec, setting a new record). But for several reasons, it went far better than I expected.

Elevation profile of the race, as measured by my Garmin running watch

It was quite a challenge for me. So when I saw that the data was publicly available in the “results” section of the race website, I felt an itch to analyze it.

I wanted to answer questions like “can I do better next time?”, “if so: how much better?”, “would age be an important factor?”. So I acquired the data containing the results of the last 9 editions of this race and started playing with it.

Who participates in this race?

This is a breakdown by age and gender of everyone who finished the race during the last 9 years:

First of all, it’s impressive and humbling to know that there are people over 80 years old who are able to finish this challenging race.

The graph shows the huge difference in participation between men and women. For both, participation is relatively flat between the ages of 25 and 45, although if you look closely, it is noticeably less flat for women. Interestingly, the drop in women’s participation coincides with the average age at which Swiss women have their first child (30.6 years).
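A breakdown like this boils down to counting finishers per (gender, age) pair. Below is a minimal Python sketch of that step (the original processing was done in C#). It assumes each finisher is a record with hypothetical `gender`, `age` and `seconds` fields; the real layout of the results pages may differ.

```python
from collections import Counter
from typing import NamedTuple


class Finisher(NamedTuple):
    # Hypothetical record layout, not the format of the official results
    gender: str   # "M" or "F"
    age: int      # age at the time of the race
    seconds: int  # finishing time in seconds


def participation_by_age(finishers):
    """Count finishers for every (gender, age) pair."""
    return Counter((f.gender, f.age) for f in finishers)


# Made-up example rows, not real race data
rows = [Finisher("M", 30, 6300), Finisher("F", 30, 6900), Finisher("M", 30, 5400)]
counts = participation_by_age(rows)
print(counts[("M", 30)])  # 2
```

A `Counter` returns 0 for pairs that never occur, so ages with no finishers can be plotted without special-casing.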

Does younger mean faster?

This is the average finishing time per age:

Only ages with at least 30 finishers were selected, so that each average is meaningful. The average time is relatively flat between ages 20 and 40 for both men and women.
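The filtering rule described above (an average is kept only if at least 30 runners of that age finished) could be sketched in Python as follows. The input is assumed to be a list of hypothetical `(age, seconds)` pairs for one gender. This only illustrates the rule and is not the author’s C# code.

```python
from collections import defaultdict
from statistics import mean

MIN_FINISHERS = 30  # threshold quoted in the article


def average_time_by_age(results, min_finishers=MIN_FINISHERS):
    """results: iterable of (age, seconds) pairs for one gender.

    Returns {age: mean finishing time in seconds}, keeping only the ages
    with at least `min_finishers` finishers.
    """
    by_age = defaultdict(list)
    for age, seconds in results:
        by_age[age].append(seconds)
    return {age: mean(times) for age, times in by_age.items()
            if len(times) >= min_finishers}


# Toy data: 30 runners aged 25 are kept, 29 runners aged 26 are dropped
averages = average_time_by_age([(25, 6000)] * 30 + [(26, 6000)] * 29)
print(averages)  # {25: 6000}
```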

After that plateau there is a clear increase. But looking more closely at the men’s average, the difference between the lowest average time (1h 37min at age 22) and the highest average time (1h 53min at age 66) is only 16 minutes. Not much for an age difference of 44 years.
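To put the 16-minute figure in perspective, the Python sketch below finds the fastest and slowest average in a hypothetical `{age: seconds}` dictionary. It feeds in only the two endpoint values quoted above and leaves out all other ages. The result works out to roughly 22 seconds lost per year of age between 22 and 66.

```python
def spread(averages):
    """averages: {age: mean time in seconds}.

    Returns (fastest_age, slowest_age, difference in seconds).
    """
    fastest = min(averages, key=averages.get)
    slowest = max(averages, key=averages.get)
    return fastest, slowest, averages[slowest] - averages[fastest]


# Only the two endpoint values quoted above; all other ages are omitted
men = {22: 1 * 3600 + 37 * 60, 66: 1 * 3600 + 53 * 60}
fast_age, slow_age, diff = spread(men)
print(fast_age, slow_age, diff // 60)          # 22 66 16
print(round(diff / (slow_age - fast_age), 1))  # 21.8 seconds per year of age
```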

We should be aware, though, that there could be a significant selection bias for ages 45 and above: people over 60 who run this race may be “less average” than people in their 20s or 30s.

Let’s take a closer look and split the performance into percentile bands:

This data covers men only, because there is more of it and the results are therefore more statistically significant. I required a minimum sample of 100 runners per age, which simply wasn’t available for older women.

Looking at the top 5% of performers, peak form seems to come at 25 years old. It’s also remarkable that the top performers in their 60s don’t seem to follow the trend of average and slower runners. I think this is a sign of selection bias among older runners.
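Here is one possible way to compute these bands per age while applying the 100-runner minimum. The article does not say which percentile definition was used, so this Python sketch assumes the simple nearest-rank method. Under this assumption, the “top 5%” boundary is the 5th percentile of finishing times, because lower times are better. The input is a list of hypothetical `(age, seconds)` pairs.

```python
import math
from collections import defaultdict

MIN_RUNNERS = 100  # minimum sample per age quoted in the article


def nearest_rank(sorted_times, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    rank = math.ceil(pct * len(sorted_times) / 100)
    return sorted_times[max(rank, 1) - 1]


def percentile_bands(results, pcts=(5, 25, 50, 75, 95), min_runners=MIN_RUNNERS):
    """results: iterable of (age, seconds). Returns {age: {pct: seconds}}."""
    by_age = defaultdict(list)
    for age, seconds in results:
        by_age[age].append(seconds)
    bands = {}
    for age, times in by_age.items():
        if len(times) < min_runners:
            continue  # too few runners for stable percentiles
        times.sort()
        bands[age] = {p: nearest_rank(times, p) for p in pcts}
    return bands


# Toy data: 100 runners aged 30 (kept) and 99 runners aged 31 (dropped)
toy = [(30, s) for s in range(1, 101)] + [(31, s) for s in range(99)]
bands = percentile_bands(toy)
print(bands[30][5])  # 5
```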

In any case, the graph indicates that if you are a man younger than 45 and you want to be in the top 5%, you have to run the 20km in under 1h 20min. In my case, that means running 25 minutes faster than I did (*gasps*). Is that feasible?
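The gap between my mark and the top-5% threshold is simple time arithmetic. The Python sketch below handles times written the way they appear in this article (“1h 45min 28sec”). It assumes that textual format, which may not match the official results pages.

```python
import re


def to_seconds(mark):
    """Parse marks written like '1h 45min 28sec' or '59min 12sec' into seconds."""
    units = {"h": 3600, "min": 60, "sec": 1}
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(h|min|sec)", mark):
        total += int(value) * units[unit]
    return total


def to_mark(seconds):
    """Format a number of seconds as 'Xh YYmin ZZsec'."""
    h, rest = divmod(seconds, 3600)
    m, s = divmod(rest, 60)
    return f"{h}h {m:02d}min {s:02d}sec"


gap = to_seconds("1h 45min 28sec") - to_seconds("1h 20min")
print(to_mark(gap))  # 0h 25min 28sec
```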

Runners’ progression between different editions

To see how runners progress, here is a plot showing the correlation between a participant’s worst time across all editions and his/her progression (defined as the time difference between his/her best and worst times):

Time difference between best and worst result versus participant’s worst result across all editions. One point = one participant (1127 data points). In orange: linear regression (R²=40.02%).

Only participants who completed at least 5 editions of the race (1127 people) are shown in this graph. This time we are not considering age as a parameter: we have already seen that its influence on performance is limited.
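Each point in the plot needs a participant’s worst time and his/her worst-minus-best difference. The Python sketch below computes these for runners with at least 5 editions. The article does not describe how runners were matched across editions, so the sketch simply assumes every result row carries a hypothetical `runner_id`.

```python
from collections import defaultdict

MIN_EDITIONS = 5  # threshold quoted in the article


def progression_points(results, min_editions=MIN_EDITIONS):
    """results: iterable of (runner_id, year, seconds).

    Returns a list of (worst_time, worst_minus_best) pairs, one per runner
    who finished at least `min_editions` distinct editions.
    """
    times = defaultdict(dict)
    for runner_id, year, seconds in results:
        times[runner_id][year] = seconds  # one result per runner per edition
    points = []
    for per_year in times.values():
        if len(per_year) < min_editions:
            continue  # not enough editions to measure progression
        worst, best = max(per_year.values()), min(per_year.values())
        points.append((worst, worst - best))
    return points


# Toy data: runner "a" has 5 editions (kept), runner "b" only 4 (dropped)
toy = [("a", y, 6000 + (y - 2010) * 60) for y in range(2010, 2015)]
toy += [("b", y, 5000) for y in range(2010, 2014)]
points = progression_points(toy)
print(points)  # [(6240, 240)]
```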

The margin of progression is pretty big: from an average of 4min if your mark is 1h 20min, to an average of 17min if you are currently at 2h 00min. Progressing twice as much as those averages is also feasible.
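The orange trend line is a linear regression. The Python sketch below shows a plain ordinary-least-squares fit together with its R², written without external libraries. In this plot, `x` would be the worst time and `y` the progression. The values in the example are toy numbers: the article reports R²=40.02% on the real 1127 points, but its fitted coefficients are not given here.

```python
def linear_fit(points):
    """Ordinary least squares fit of y = a + b*x. Returns (a, b, r_squared)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return a, b, 1 - ss_res / ss_tot


# Toy data only: a noisy upward trend
a, b, r2 = linear_fit([(0, 0), (1, 1), (2, 0), (3, 1)])
print(round(a, 2), round(b, 2), round(r2, 2))  # 0.2 0.2 0.2
```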

So according to the graph, and assuming that my only race time (1h 45min) is my “worst time” (a big IF…), is it possible to run 25min faster and be in the top 5%?

The good news is: it’s been done before. But turning a 1h 45min mark into 1h 20min is at the limit of what has been achieved. A more realistic goal would be to aim for the average progression, which means running about 12 minutes faster than 1h 45min. I’m not going to lie: I would be pretty happy with that.
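The 12-minute figure is consistent with a straight line through the two rounded averages quoted earlier (4min at 1h 20min and 17min at 2h 00min). The Python sketch below interpolates between those two points. It is a reconstruction from the rounded values, not the regression’s actual coefficients.

```python
def expected_progression(worst_minutes,
                         p1=(80, 4.0),     # (worst time, progression) at 1h 20min, from the text
                         p2=(120, 17.0)):  # (worst time, progression) at 2h 00min, from the text
    """Linear interpolation between two points read off the regression line."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)  # minutes of progression per minute of worst time
    return y1 + slope * (worst_minutes - x1)


print(round(expected_progression(105), 1))  # 12.1 minutes for a 1h 45min worst time
```

The 25 minutes needed to reach the top 5% is roughly twice this average, which matches the earlier remark that doubling the average progression is at the edge of what runners have achieved.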

Conclusions

First of all, there are several reasons to think that age is not an excuse:

  • Some people over 80 years old have been able to finish this race.
  • The average time per age is roughly the same for ages 20 to 40.
  • There isn’t a huge difference in average time (16 minutes) between men aged 22 and men aged 66.

Also, the margin of progression is pretty big for runners of all levels. Unsurprisingly, the slower your current time, the more room you have to progress.

Finally, always remember that no matter how slow you go, you are still lapping people sat on the couch.


Xavier Penya

Developer & data analyst | Loves playing around with side projects