Re: Every Olympic event should include one average person competing for reference.

American gymnast Simone Biles enchants with her sublime technique and strength, fastest man on Earth Usain Bolt has won the 100 m sprint again, and swimmer Michael Phelps has collected the 23rd gold medal of his career (or, put differently, Michael Phelps’s COUNTRY has just reached 36th position in all-time gold medals).

As the Olympic Games draw to a close, and after watching loads of athletes do extraordinary things on TV (many of which I never thought possible or desirable), a recent viral quote from the web comes to mind:

Every Olympic event should include one average person competing for reference.

What Bolt did right there looked amazing, but just how precise his arm swinging was remains somewhat unclear.

If Dave, a 27-year-old barber from Essex, were there racing shoulder-to-shoulder with the gold medallist, we’d finally get it.

Powered by a passion for data science that gets inexplicably stronger with the futility of the problem, I started asking Google about average sprinters, swimmers, jumpers and whatnot. I hoped to find data from, say, a fitness app like Nike+ that I could compare with scores from pros, but an evening of searching left me empty-handed and I considered abandoning the project.

Then, the average but Olympic Ethiopian swimmer Robel Habte started racing, and I knew I had to pursue my quest for comparison. Ultimately, I focused my efforts on one discipline, the 100 metre sprint, and opted for an alternative approach to compare Bolt’s and Dave’s performance using nothing more than a set of assumptions and some questionable math.

Let’s start from Bolt’s most recent result: completing the 100 m dash in 9.81 seconds.

How quickly can the average man run 100 metres?

I needed to make an assumption about the type of distribution underlying population performance. A common choice is the ‘normal distribution’: the bell curve reflecting the assumption that most people score in the middle, while some are very good and some very bad.

To characterise this curve one needs a mean (the centre of the bell curve) and a standard deviation (sd, the width). The mean is Dave’s performance, which is unknown, but I will try to infer it using as much as ONE point on this curve (and a *cough* arbitrary *cough* choice of the sd).
The one point being Bolt’s 9.81 seconds.

This is how.

First, we need to know the probability of Usain being who he is in this competition: the number one in the world. Back to Google — how many average runners exist that Usain would beat in a competition?

Roughly, there are 945 million men aged between 20 and 35. Anyone outside this range wouldn’t be the “average Dave” anymore. From this number we’ll drop half, assuming that only every other individual is ‘run-able’ and could have had an opportunity to compete with (and maybe beat) Usain. That leaves about 472.5 million potential competitors.

The probability of Bolt being Bolt is then

P(x=Bolt) = 1/472,500,000 ≈ 0.0000002%.
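
For anyone who wants to check the arithmetic, here is a minimal sketch of the head count behind that probability. The 945 million figure and the "every other one" cut come from the text above; the variable names are just mine for illustration.

```python
# Rough head count behind the probability (figures taken from the text).
males_20_to_35 = 945_000_000   # approximate number of men aged 20 to 35
runnable_share = 0.5           # assumption from the text: every other one is "run-able"

competitors = males_20_to_35 * runnable_share

# Chance that a randomly picked competitor is the single fastest of them all.
p_bolt = 1 / competitors

print(f"potential competitors: {competitors:,.0f}")
print(f"P(Bolt) as a fraction: {p_bolt:.2e}")
print(f"P(Bolt) as a percentage: {p_bolt * 100:.7f}%")
```

Note how easy it is to slip a zero when writing this number out as a percentage, which is a good argument for keeping it in scientific notation.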

I now have one point on this curve, and I can estimate how far Bolt is from the mean using the inverse cdf. That distance turns out to be 5.87 sd.
A lot.
For the sake of comparison, in medicine a distance of 2 sd from the mean is often used to separate pathological cases from just-a-little-weird-but-OK cases.
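
The inverse cdf step can be sketched with Python's standard library. This is only one way to do it, assuming a one-in-472.5-million probability (945 million halved) and a standard normal curve; `z_bolt` is a name I made up for the result.

```python
from statistics import NormalDist

# One-in-472.5-million: half of the 945 million men aged 20 to 35.
p_bolt = 1 / 472_500_000

# inv_cdf maps a left-tail probability to a z-score on the standard normal
# curve. Faster runners have smaller times, so Bolt sits in the left tail
# and his z-score comes out negative.
z_bolt = NormalDist(mu=0.0, sigma=1.0).inv_cdf(p_bolt)

print(f"Bolt's z-score: {z_bolt:.3f}")
print(f"distance from the mean: {abs(z_bolt):.3f} sd")
```

The sign only tells us which side of the mean Bolt is on; the distance is the absolute value.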

The mean of the bell curve can then be estimated as lying 5.87 sd above Mr. Bolt’s time. With an sd of 1 second (the value implied by the result), this gives 9.81 s + 5.87 × 1 s = 15.68 s.
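
Since the whole estimate hangs on the choice of sd, here is a small sketch of how the mean moves with it. The 15.68 s figure implies an sd of 1 second (15.68 - 9.81 = 5.87); the other two values are purely illustrative and not taken from any data.

```python
BOLT_TIME_S = 9.81   # Bolt's time, from the text
Z_BOLT = 5.87        # Bolt's distance from the mean in sd units, from the text


def mean_time(sd_seconds: float) -> float:
    """Mean 100 m time if Bolt sits Z_BOLT standard deviations below it."""
    return BOLT_TIME_S + Z_BOLT * sd_seconds


# sd = 1.0 s is inferred from the numbers in the text; 0.5 s and 1.5 s are
# hypothetical alternatives, shown only to illustrate the sensitivity.
for sd in (0.5, 1.0, 1.5):
    print(f"sd = {sd:.1f} s -> Dave's estimated time: {mean_time(sd):.2f} s")
```

Half a second more or less on the sd shifts Dave's time by almost three seconds, so the result should be read with that in mind.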

Conclusion: the average man racing with the big guys would have arrived about 5.5 seconds after most professional runners in Rio.

And here you go, the plot I was after all along:

(a) Inferred distribution for male sprinters (100 m). (b) Inference for the cumulative distribution function.

Are you curious to know how you would compare to either Bolt or Dave?
I made a thing for it.
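
If you would rather do the comparison by hand, a minimal sketch could look like the following. It assumes the inferred curve has a mean of 15.68 s and an sd of 1 s (the sd is implied by the numbers above, not measured), and the function name and the 13.0 s example runner are hypothetical.

```python
from statistics import NormalDist

# Inferred curve: mean 15.68 s from the text, sd = 1 s as an assumption.
sprinters = NormalDist(mu=15.68, sigma=1.0)


def share_slower_than(time_s: float) -> float:
    """Fraction of the modelled population that would finish after time_s."""
    return 1.0 - sprinters.cdf(time_s)


# Dave runs the mean time; the 13.0 s runner is a made-up example.
for name, time_s in [("Dave", 15.68), ("a 13.0 s runner", 13.0)]:
    share = share_slower_than(time_s)
    print(f"{name} ({time_s:.2f} s) beats {share:.1%} of the field")
```

Swap in your own 100 m time to see where you would land on the curve.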