Strava KoM Analysis

I always thought that the following chart would be an interesting visualization…

The plot shows King of the Mountain (KoM) achievements from the Strava fitness tracking app on a particular segment. This segment, Whirlwind Hill, is one of the toughest hills near me, in Wallingford, CT.

Alas, I am not the KoM (but my good friend and fellow Choate teacher Will Morris is).

The graph above shows exactly what I imagined it would. Strava’s initial release was in 2009, and the graph shows that the segment was first ridden (and subsequently KoMed) in ~2010. After this, as Strava subscriptions increased and competition grew with them, the KoM time fell precipitously, and often.

Now, though, Will Morris’s time of 3.31 (18.9 mph) is quite quick. Even the preceding KoM times were quite fast and hard to reach, as seen by the long time between new best efforts.

Alas, though I thought my work might add something to the world, Jonathan O’Keefe has already done me one (or two or three) better. See his awesome work here.

Nevertheless, I’ll walk through the steps below to outline my elementary data scraping, scrubbing, and analyzing process for anyone who might be interested.

I used Postman to play around with the Strava API. Its “Params” feature is super helpful.

After getting the access token figured out (and mistakenly posting the work to GitHub with said private token in the code and then having my Strava account hacked by GitHub-trawling bots…a story for another time), I took to Python.

The following code produced the data for the chart:

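import requests
import secrets  # local secrets.py holding apiToken (shadows the standard-library module of the same name)
import numpy as np

# base URL of the Strava segments endpoint; the string is empty in the listing as posted
segmentsUrl = ''

class stravaSegment(object):
    def __init__(self, id):
        self.id = id
        # GET the segment and keep some basic info
        info = requests.get(segmentsUrl+str(id)+'?access_token='+secrets.apiToken).json()
        self.name = info['name']
        self.effortCount = info['effort_count']
        self.athleteCount = info['athlete_count']

    def getKOMs(self):
        # set the number of efforts printed 'per page'
        perPage = '100'
        # set start and end dates for efforts investigated
        # (placeholder values; the dates actually used are not shown in the post)
        startDate = '2009-01-01T00:00:00Z'
        endDate = '2016-01-01T00:00:00Z'
        efforts = []
        # go through all of the pages.
        for i in range(100):
            # build url call
            url = segmentsUrl+str(self.id)+'/all_efforts?access_token='+secrets.apiToken+'&start_date_local='+startDate+'&end_date_local='+endDate+'&per_page='+perPage+'&page='+str(i+1)
            # get
            r = requests.get(url)
            # convert to json
            page = r.json()
            # add efforts to python list
            efforts.append(page)
        # make empty list to put times in
        times = []
        # an initial, very slow fastest time
        fastestTime = 10000
        # go through all pages (assumes efforts arrive in chronological order)
        for a in range(len(efforts)):
            # go through the number of efforts per page
            for b in range(len(efforts[a])):
                # if time is fastest yet, then add to our list
                if efforts[a][b]['elapsed_time'] < fastestTime:
                    # update the fact that the fastest time has changed
                    fastestTime = efforts[a][b]['elapsed_time']
                    # the fields kept per KoM are an assumption: time, then date
                    oneRun = [fastestTime, efforts[a][b]['start_date_local']]
                    times.append(oneRun)
        return times

# test
whirl = stravaSegment(673849)
koms = whirl.getKOMs()
b = np.asarray(koms)
np.savetxt('test.csv', b, delimiter=",", fmt='%s ')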

A brief walk through the code: I created a class whose instances each represent one segment. In the init function, basic info like the name, effort count (how many times the segment has been ridden), and athlete count are added to the segment object. The requests library is used to perform a GET. Also, I stored my secret token in a local Python file; I then imported this module (import secrets) and referenced the variable with secrets.apiToken. I used .json() to convert the requests output to a JSON object, and then used keys such as ['athlete_count'] to access the desired data.
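Given how a committed token ends up exposed, one alternative to a tracked Python file is to read the token from an environment variable. This is only a minimal sketch: the variable name STRAVA_ACCESS_TOKEN and the helper load_api_token are made up for illustration and are not something Strava defines.

```python
import os

def load_api_token(env=os.environ, var_name="STRAVA_ACCESS_TOKEN"):
    """Return the access token stored in an environment variable.

    STRAVA_ACCESS_TOKEN is a hypothetical name chosen for this sketch.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(var_name + " is not set; export it before running the script")
    return token
```

If the token does stay in a local secrets.py, listing that file in .gitignore keeps it out of the repository. Python 3.6 and later also ship a standard-library module named secrets, so a different file name avoids confusion.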

The function of interest is getKOMs. I first set the start and end dates within which the function searches. I then used a for loop to issue a GET request for each of several URLs. Strava’s API takes a per_page parameter and a page parameter. I set per_page to 100 results per page and looped over 100 pages, giving me up to 10,000 efforts. This could have been done in other ways, and I used trial and error to make sure that I was getting efforts up to the end of my date range. I am sure there is a more efficient way to do this date range selection.
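One way to avoid guessing the number of pages is to keep requesting pages until one comes back empty. The sketch below assumes that the API returns an empty list past the last page, which I have not verified. The names build_efforts_url, fetch_all_pages, and fetch_page are hypothetical; fetch_page stands in for whatever performs the GET, so the loop can be tried without network access.

```python
from urllib.parse import urlencode

def build_efforts_url(base_url, segment_id, token, start_date, end_date, per_page, page):
    """Build an all_efforts URL with the query parameters used in the post."""
    query = urlencode({
        "access_token": token,
        "start_date_local": start_date,
        "end_date_local": end_date,
        "per_page": per_page,
        "page": page,
    })
    return base_url + str(segment_id) + "/all_efforts?" + query

def fetch_all_pages(fetch_page, max_pages=100):
    """Collect efforts page by page, stopping at the first empty page.

    fetch_page is a hypothetical callable that takes a 1-based page number
    and returns a list of efforts (for example, a wrapper around requests.get).
    """
    efforts = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # assumed signal that there are no more results
        efforts.extend(batch)
    return efforts
```

The max_pages cap keeps the loop bounded if the empty-page assumption turns out to be wrong.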

I then set an artificial initial fastest time of 10,000 seconds. This should be adjusted for the Strava segment at hand: 10,000 seconds is just under 3 hours, and some segments take over 3 hours, so for those it would be a poor stand-in for what is meant to be, at the start, a time slower than every recorded effort. I then looked at each effort and its time: if the time was below the current fastestTime, I added it to the times list and updated fastestTime to reflect this new KoM.
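The same running-minimum idea can be written without picking a segment-specific starting value by beginning at infinity. This is a sketch rather than the code used for the chart: kom_progression is a hypothetical helper, and it assumes each effort is a dict with elapsed_time and start_date_local keys, where the timestamps are ISO 8601 strings in one consistent format.

```python
def kom_progression(efforts):
    """Return [elapsed_time, start_date_local] for each effort that set a new best.

    Efforts are sorted by start_date_local first; ISO 8601 timestamps in a
    single format sort chronologically when compared as strings.
    """
    fastest = float("inf")  # slower than any real effort, whatever the segment length
    koms = []
    for effort in sorted(efforts, key=lambda e: e["start_date_local"]):
        if effort["elapsed_time"] < fastest:
            fastest = effort["elapsed_time"]
            koms.append([fastest, effort["start_date_local"]])
    return koms
```

Sorting first means the result does not depend on the order in which the pages arrived.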

To finish, I executed the code for my favorite climb. The output of the function call was a list of lists, and I used numpy’s asarray and savetxt to smoothly export the data into a clean csv.

This results in a csv file that takes minimal processing.
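If the timestamps are ISO 8601 strings such as 2010-05-01T09:00:00Z, the date can also be separated from the time of day before the file is written. The sketch below uses the standard csv module; write_koms_csv is a hypothetical helper, and the row layout (elapsed seconds, then timestamp) is an assumption about what each KoM record holds.

```python
import csv

def write_koms_csv(koms, handle):
    """Write rows of (elapsed seconds, date, time of day) to an open text handle.

    Each item in koms is assumed to be [elapsed_seconds, iso_timestamp].
    """
    writer = csv.writer(handle, lineterminator="\n")
    for elapsed, timestamp in koms:
        # split "2010-05-01T09:00:00Z" into "2010-05-01" and "09:00:00Z"
        date_part, _, time_part = timestamp.partition("T")
        writer.writerow([elapsed, date_part, time_part])
```

To write a file, the function can be called inside with open('test.csv', 'w', newline='') as handle, which is the file mode the csv module documentation recommends.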

A bit of data cleaning in Excel (“Data” > “Text to columns” > delimit on the “T”) and I was able to import into R. This could have been done in R, but I find Excel’s UI simpler for this. The following R code, with comments, produces the chart in question.

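# load csv
test <- read.csv("...test.csv", header=FALSE)
# load ggplot lib
library(ggplot2)
# create variable names
# (assumed column order: effort time, then date; adjust to match your csv)
names(test)[1:2] <- c("effortTime", "dateOfEffort")
# change dates from factor to character
test$dateOfEffort <- as.character(test$dateOfEffort)
# convert from character to R date object
test$dateOfEffort <- as.Date(test$dateOfEffort, format="%Y-%m-%d")
# make plot using ggplot geom_line
ggplot(test, aes(dateOfEffort, effortTime)) + geom_line(color="orange") + geom_point(color="dark orange") + ggtitle("Whirlwind Koms over Time") +
  labs(x="Date", y="Segment time")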