Analyse yourself, or how could Python help you achieve your goal?

Vladimir Semenov
Apr 8, 2018


Because I have a technical mind, I believe that “everything around us is numbers” © Numb3rs.

So I decided to investigate my marathon preparation performance and try to estimate my finish time for the full marathon before the race actually happens, using the data I tracked in Strava.

To achieve that, I wrote a Python script to calculate the estimate. Python is a programming language that is easy to learn and really good for mathematical calculations.

Linear prediction

First of all, I used basic human logic: a very rough estimate can be calculated with just a linear function. No sooner said than done.

The logic is pretty simple. Let’s assume that I hold a constant pace for the whole race, the same as in one of my previous runs. In this case, the formula is a basic multiplication of the pace by the desired distance.

Predicted time = Desired distance * Pace

Exploring my running statistics, I found that Strava calculates not only a pace but also a GAP, which stands for Grade Adjusted Pace. Taking this into account, the linear formula applied to both Pace and GAP gives a rough estimate of the fastest and probably the slowest time, assuming that the actual race is flatter than my usual runs and that my pace is a bit faster as well.

Well, the linear formula gives us boundary values for the estimate. Not bad, but still not good enough. When I tried it on some of my first preparation runs, it gave me values that were very far apart, between 3.5 and 4.5 hours.

I expected more precise values, so I started to explore other formulas for time prediction. After some time, I found a better one, called Pete Riegel's formula.
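The linear estimate described above can be sketched in a few lines of Python. This is only a minimal illustration, not the original script: the function name `linear_prediction`, the choice of seconds per kilometre as the pace unit, and the sample pace and GAP values are all assumptions made for the example.

```python
MARATHON_KM = 42.195  # full marathon distance in kilometres


def linear_prediction(pace_sec_per_km: float, distance_km: float) -> float:
    """Predicted time in seconds, assuming a constant pace over the whole distance."""
    return pace_sec_per_km * distance_km


# Hypothetical values for a single training run (not taken from real Strava data).
pace = 6 * 60 + 10  # average pace of 6:10 per km, in seconds
gap = 5 * 60 + 50   # Grade Adjusted Pace of 5:50 per km, in seconds

slow_estimate = linear_prediction(pace, MARATHON_KM)
fast_estimate = linear_prediction(gap, MARATHON_KM)

print(f"Linear Pace: {slow_estimate / 3600:.2f} h")
print(f"Linear GAP:  {fast_estimate / 3600:.2f} h")
```

When the GAP is faster than the raw pace, as in this made-up run, the GAP-based value is the optimistic bound and the pace-based value is the pessimistic one.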

Pete Riegel formula prediction

In a 1977 article for Runner’s World Magazine, Riegel proposed a simple formula for comparing relative performances at different distances. The formula is most commonly quoted as:

Predicted time = T1 * (D2 / D1)^C

  • T1 is the time achieved for D1
  • D1 is the distance over which the initial time is achieved
  • D2 is the distance for which the time is to be predicted
  • C is the pace degradation coefficient, from 1.06 to 1.10

This formula gives more precise estimates. However, it still produces two boundary values: the degradation coefficient 1.06 gives the fastest time and 1.10 the slowest one.

Exploring my running statistics again, I found that Strava also provides elevation information. So I assumed that the elevation of the Rotorua Marathon course might help me calculate a more precise pace degradation coefficient for the race.

To achieve that, I wrote code to calculate a grade from elevation and distance, and code to calculate the coefficient from the grade. I assumed that a 0% grade corresponds to the lowest value of the coefficient and a 3% grade to the highest one.

As a result, I got a coefficient of around 1.077, which represents low-to-medium difficulty for the Rotorua Marathon race.
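As a rough sketch, Riegel's formula translates to Python almost literally. The function name `riegel_prediction` and the sample 10 km run are assumptions for illustration only; the two coefficients are the boundary values mentioned above.

```python
MARATHON_KM = 42.195  # full marathon distance in kilometres


def riegel_prediction(t1_sec: float, d1_km: float, d2_km: float, c: float = 1.06) -> float:
    """Riegel's formula: predicted time = T1 * (D2 / D1) ** C, in seconds."""
    if d1_km <= 0 or d2_km <= 0:
        raise ValueError("distances must be positive")
    return t1_sec * (d2_km / d1_km) ** c


# Hypothetical training run: 10 km in 55 minutes.
t1, d1 = 55 * 60, 10.0

fastest = riegel_prediction(t1, d1, MARATHON_KM, c=1.06)  # lower boundary
slowest = riegel_prediction(t1, d1, MARATHON_KM, c=1.10)  # upper boundary

print(f"Riegel 1.06: {fastest / 3600:.2f} h")
print(f"Riegel 1.10: {slowest / 3600:.2f} h")
```

Because the target distance is longer than the training run, a larger coefficient always yields a slower prediction, which is why 1.06 and 1.10 act as the two boundaries.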
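One possible way to map a grade to a coefficient is a plain linear interpolation between the two boundaries. The sketch below assumes exactly that; the function names, the clamping of grades outside 0% to 3%, and the sample elevation are all hypothetical and are not taken from the original script or from the actual Rotorua course profile.

```python
def grade_percent(elevation_gain_m: float, distance_km: float) -> float:
    """Average grade in percent: metres climbed divided by metres covered."""
    return elevation_gain_m / (distance_km * 1000) * 100


def coefficient_for_grade(grade: float, c_min: float = 1.06, c_max: float = 1.10,
                          max_grade: float = 3.0) -> float:
    """Linearly interpolate the coefficient between c_min (0%) and c_max (max_grade).

    Grades outside the 0..max_grade range are clamped (an assumption of this sketch).
    """
    clamped = min(max(grade, 0.0), max_grade)
    return c_min + (c_max - c_min) * clamped / max_grade


# Hypothetical course: 500 m of climbing over a full marathon distance.
grade = grade_percent(500, 42.195)
coefficient = coefficient_for_grade(grade)
print(f"grade = {grade:.2f}%, coefficient = {coefficient:.3f}")
```

Under this linear mapping, a coefficient of about 1.077 would correspond to an average grade of roughly 1.3%, since 1.06 + 0.04 * 1.275 / 3 = 1.077.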

In a nutshell, by combining Pace, GAP and the degradation coefficients, I now have estimates with different confidence levels. I created a simple web page (using Google Charts) with a chart that visualises the results of the script's estimates. It looks like the image below.

Well, if you check the chart above, you can see a trend towards running faster. I used data from my first 25 preparation runs.

Let us take a closer look. In the beginning, the fastest predicted time is the Linear GAP time of 03:53:28, which is sub 4 hours, yay! However, all other predictions are over 4 hours, and the slowest one, the Riegel prediction with the highest coefficient of 1.10, is 04:49:24, ooh. The two main lines, which I believe give a realistic time, show a time between 04:19:11 and 04:32:47. This is still satisfactory, but far from what I expect from my actual marathon race.

The 25th run shows faster times, from 03:33:47 to 03:49:27, which is a really great prediction for me. However, this prediction may be less accurate, because that run was on a treadmill: it was almost flat and fast-paced.

Meanwhile, if you look closer at the 16th run, you can see that the fastest time is 03:22:32 and the slowest is 04:04:40. That was a 40-minute morning run on the street at a really fast pace of 04:56.
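For each run, the set of estimates behind such a chart could be assembled roughly like this. It is a sketch under assumptions: the exact series plotted in the original chart are not known, so the labels, the `estimates_for_run` and `format_hms` helpers, and the sample run are hypothetical. Only the coefficients 1.06, 1.077 and 1.10 come from the text.

```python
def format_hms(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def estimates_for_run(distance_km: float, time_sec: float, gap_sec_per_km: float,
                      race_km: float = 42.195, race_c: float = 1.077) -> dict:
    """Collect linear and Riegel estimates (in seconds) for one training run."""
    pace = time_sec / distance_km  # average pace in seconds per km
    ratio = race_km / distance_km
    return {
        "Linear Pace": pace * race_km,
        "Linear GAP": gap_sec_per_km * race_km,
        "Riegel 1.06": time_sec * ratio ** 1.06,
        "Riegel race coefficient": time_sec * ratio ** race_c,
        "Riegel 1.10": time_sec * ratio ** 1.10,
    }


# Hypothetical run: 12 km in 1 h 12 min, with a GAP of 5:45 per km.
run_estimates = estimates_for_run(12.0, 72 * 60, 5 * 60 + 45)
for label, seconds in run_estimates.items():
    print(f"{label:<24} {format_hms(seconds)}")
```

Running this over every tracked run, in date order, gives one data point per series per run, which is the shape of data a line chart needs to show a trend.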

All in all, I believe I managed to find some fun in the marathon preparation as well as create a helpful tool to research my performance data. Moreover, I showed that it can be interesting to treat yourself as a source of data for analysis.

At the end of this retrospective session, I concluded that my pace is steadily improving, and I expect to achieve my second goal: to run a sub-4-hour marathon.

You can find the source code on my GitHub via the link below.

The next retrospective session is scheduled for the end of the project, which literally means after I run the marathon. I believe it will be interesting and fun. See you then!

If this article was helpful or interesting please hit the clap button and feel free to share it. I’ll be sure to deliver you more articles in the weeks to come.
