Gaussian is everywhere

In 2015, I ran in an event called the “Hiranandani Thane Half Marathon”. It was a great event full of high-spirited individuals: the first-timers, the healthy-looking but unfit people, the morning crybabies who wished they hadn't signed up, and lastly the weirdos. We were given bibs that recorded our start and finish times. I was lucky to complete the 10K race in 1 hr 10 mins.

A week later, the results came out. I had a 3-digit rank in the race, which pretty much sums up my story. The interesting part was the data released by the organizers. You can download the race results and get data with rank, age, and detailed race timings. The insights from the data were amazing.

What I expected:

  1. Younger people to finish first
  2. Most runners to be in the 25–30 age group, with the numbers decreasing in subsequent age groups
  3. A uniform distribution of race timings. Since the group was a mix of all ages and fitness levels, I expected people to finish with all kinds of timings, spread evenly.

But here’s a glimpse of what I found

Age is just a number

The running spirit cannot be explained better than by the graph below. There is virtually no linear correlation between your age and your finishing rank. I have also zoomed in on the start of the graph, showing the first 50 finishers. Here again you will find a mix of all ages.

Correlation coefficient for rank vs. age = 0.01735
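
For anyone who wants to reproduce a number like this, here is a minimal Python sketch of a Pearson correlation, assuming that is the kind of correlation reported above. The function name pearson_correlation and the sample ranks and ages are made up for illustration; they are not the race data.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / (spread_x * spread_y)


# Hypothetical sample, NOT the actual race results.
ranks = [1, 2, 3, 4, 5, 6]
ages = [34, 27, 45, 31, 52, 29]
print(round(pearson_correlation(ranks, ages), 4))
```

A value close to 0, like the 0.01735 above, means rank and age have almost no linear relationship, while values near +1 or -1 would mean age strongly predicts rank.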

Age vs. rank trend for the running event
Age trend for the top 50 finishers

Bell Curve is finally seen

When I plotted participation against age, a beautiful bell curve appeared. All ages participated, and the peak of the bell curve was at age 38! An unbelievable finding.

Using the Gaussian function, we can easily fit the age-wise participation data. Except for a few bumps, the curve fits the data neatly.

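Mu = 39 (mode age)
Sigma = 9.49 (standard deviation of age)
Coefficient = 102, i.e. the peak of the participation curve (at age 39)
Note: replace the normalization factor 1 / (sigma * sqrt(2 * pi)) of the standard Gaussian with Coefficient, so the fitted curve is f(x) = Coefficient * exp(-(x - Mu)^2 / (2 * Sigma^2)).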
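
To make the note concrete, here is a small Python sketch of a Gaussian whose usual normalization factor is swapped for a peak coefficient, evaluated with the age parameters listed above. The function name gaussian and its argument names are my own choices for illustration.

```python
import math


def gaussian(x, coefficient, mu, sigma):
    """Gaussian curve scaled so that its peak value at x == mu is `coefficient`."""
    return coefficient * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


# Parameters quoted in the article for participation vs. age.
MU_AGE = 39
SIGMA_AGE = 9.49
PEAK_PARTICIPANTS = 102

# Expected number of participants at a few ages, according to the fitted curve.
for age in (20, 30, 39, 50, 60):
    print(age, round(gaussian(age, PEAK_PARTICIPANTS, MU_AGE, SIGMA_AGE), 1))
```

At age 39 the curve returns exactly the coefficient, 102, and one standard deviation away (about age 48.5) it drops to roughly 61.9, which is 102 times exp(-0.5).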

Can we see the Gaussian somewhere else?

A Gaussian distribution may explain the age mix, but if we plot finishing time against the number of athletes, what would the graph look like? You always see very few people competing to win, maybe 2–3 runners sprinting at superhuman speed. The story is similar for the last finishers: they are few in number, crawling their way to the finish line. What about the athletes between these two extremes? What will the distribution look like?

You guessed it right! Another Gaussian!
The time required to finish the race also follows a Gaussian distribution. We can use the same technique and fit a Gaussian to the actual data.

The parameters that produce the above Gaussian are as follows:
Mu = 140 (mode time for race completion, in minutes)
Sigma = 25.1 (standard deviation of completion time, in minutes)
Coefficient = 400, i.e. the peak of the finishing curve (at 140 mins)
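
If you want to estimate parameters like these from a histogram yourself, one simple approach is to take the weighted mean, the weighted standard deviation, and the tallest bin. This is only a sketch of that moment-based approach, not necessarily how the fit above was produced. The helper names are hypothetical, and the histogram below is synthetic, generated from the quoted parameters rather than taken from the race results.

```python
import math


def gaussian(x, coefficient, mu, sigma):
    """Gaussian curve scaled so that its peak value at x == mu is `coefficient`."""
    return coefficient * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def estimate_gaussian_params(centers, counts):
    """Estimate (mu, sigma, coefficient) from histogram bin centers and counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram has no counts")
    # Weighted mean and standard deviation, using the counts as weights.
    mu = sum(c * n for c, n in zip(centers, counts)) / total
    variance = sum(n * (c - mu) ** 2 for c, n in zip(centers, counts)) / total
    # The tallest bin stands in for the peak coefficient.
    return mu, math.sqrt(variance), max(counts)


# Synthetic histogram (NOT real race data): finishing times in 5-minute bins.
centers = list(range(20, 261, 5))
counts = [gaussian(t, 400, 140, 25.1) for t in centers]

mu, sigma, coefficient = estimate_gaussian_params(centers, counts)
print(round(mu, 2), round(sigma, 2), round(coefficient, 1))
```

On real, bumpy data the tallest bin can overshoot the smooth peak, so a least-squares fit would usually be a better choice than this quick estimate.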

No matter how old you are, 
Or how fast or slow you run
You will always be trapped inside a Gaussian!

Gaussian is everywhere!