Analysing the 2019 Crossfit Games using Python

The Crossfit Games are a yearly competition that sets out to find the Fittest on Earth. This year’s season saw a huge overhaul of the qualification process, changing how athletes qualified for the Games. These changes more than tripled the number of athletes compared with the year before, taking the male field from 39 to 143 competitors and the female field from 40 to 130 competitors.

With so many more athletes qualifying for the Games, a new system was introduced that cut the bottom x competitors after certain events. There were six cuts in total during the first six events at the 2019 Crossfit Games, which earned this year’s Games the nickname the Hunger Games.

The First Cut, as the first event was named, cut both the male and female fields down to 75 competitors. Event two cut both fields down to 50 and event three cut them down to 40. This process continued until after event six, when the final 10 competitors went on to compete in an additional six events. The male and female with the highest overall points were then crowned the Fittest on Earth.

At the Crossfit Games an athlete’s ranking is determined by their overall points score.

A competition like the Crossfit Games gives us a lot of cool data every year, but the addition of the cuts this year opened up the possibility of looking at some different scenarios. For example, if the scoring had been reset after every cut, which competitors would have made it through a cut that their overall score was too low for, and which competitors would have been cut instead?

Getting and preparing the data

The Crossfit Games web team has been kind enough to provide an API that gives access to all event data from every Crossfit Games since 2013. This section is mostly a description of the Python code I used to get the data from the API and put it into a structure that I could work with. If you just want to see some analysis and plots, you can scroll down to the next section.

Libraries used

import pandas as pd
import numpy as np
import requests
import re

Get the data

I used the requests library to send a request to the Crossfit Games API and load the JSON

# urls to the API
urlMales = 'https://games.crossfit.com/competitions/api/v1/competitions/games/2019/leaderboards?division=1&sort=0&page=1'
urlFemales = 'https://games.crossfit.com/competitions/api/v1/competitions/games/2019/leaderboards?division=2&sort=0&page=1'
#Sends the request and reads the response.
responseMales = requests.get(urlMales)
responseFemales = requests.get(urlFemales)
#Loads response as JSON
responseMales = responseMales.json()
responseFemales = responseFemales.json()
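
The two URLs above differ only in the division parameter, so they could be built from one helper. This is a minimal sketch: the function name leaderboard_url is hypothetical, and the meaning of the query parameters is taken only from the URLs used in this article (division 1 for males, division 2 for females).

```python
from urllib.parse import urlencode

# Base path copied from the URLs used in the article.
BASE_URL = "https://games.crossfit.com/competitions/api/v1/competitions/games"


def leaderboard_url(year, division, page=1, sort=0):
    """Build a leaderboard URL (hypothetical helper, not part of any library)."""
    query = urlencode({"division": division, "sort": sort, "page": page})
    return f"{BASE_URL}/{year}/leaderboards?{query}"


url_males = leaderboard_url(2019, division=1)
url_females = leaderboard_url(2019, division=2)
print(url_males)
```

The resulting strings can be passed to requests.get as before. Passing a timeout and calling raise_for_status() on the response before .json() would make a failed request show up early instead of as a confusing parsing error.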

Inspect the JSON we got back

# select a competitor to see what data we have
leaderboardRows = responseMales['leaderboardRows']
responseMales['leaderboardRows'][20].keys()
eventData = responseFemales['leaderboardRows'][2]['scores']
# each event contains the following data, this is event 5
eventData[4]
Inspecting the JSON for male competitors

Creating some variables used for looping over results

# how many competitors, used for for loops
totalMaleCompetitors = responseMales['pagination']['totalCompetitors'] - 1 # index starts at 0 in Python
totalFemaleCompetitors = responseFemales['pagination']['totalCompetitors'] - 1 # index starts at 0 in Python
# how many events, used for for loops
totalEvents = len(responseMales['ordinals'])

Getting male information into a dataframe

I am only showing how I got the male competitor information as the female is the same code except for other variable names and the url used for the API is different. For the entire code take a look at the Jupyter notebook linked at the end of the article.

# create a dataframe that contains one line for every male competitor
df_cfgMales = pd.DataFrame(np.zeros((totalMaleCompetitors,11)),columns=['competitorId','competitorName','overallRank', 'overallScore',
'height', 'heightInInches', 'weight', 'weightInKg',
'age',
'countryOfOriginName','affiliateName'])
# just filling the dataframe with nan data
df_cfgMales = df_cfgMales.replace(0,np.nan)
# loop over all competitors
for i in range(0,totalMaleCompetitors,1):
    competitorData = responseMales['leaderboardRows'][i]

    # competition data
    df_cfgMales.loc[i,'overallRank'] = competitorData['overallRank']
    # some athletes got a DF score due to withdrawing from competition before the first event
    if len(competitorData['overallScore']) < 1:
        df_cfgMales.loc[i,'overallScore'] = "0"
    else:
        df_cfgMales.loc[i,'overallScore'] = competitorData['overallScore']
    # get some personal data!
    df_cfgMales.loc[i,'competitorId'] = str(competitorData['entrant']['competitorId'])
    df_cfgMales.loc[i,'competitorName'] = competitorData['entrant']['competitorName']
    df_cfgMales.loc[i,'countryOfOriginName'] = competitorData['entrant']['countryOfOriginName']
    df_cfgMales.loc[i,'height'] = competitorData['entrant']['height']

    # converting the height string to total inches so I can calculate height in cm later
    heightInFeet = int(competitorData['entrant']['height'][:1])
    # to account for if the inches are 10 or over
    if len(competitorData['entrant']['height']) > 4:
        heightInInches = int(competitorData['entrant']['height'][2:4])
    else:
        heightInInches = int(competitorData['entrant']['height'][2:3])
    df_cfgMales.loc[i,'heightInInches'] = heightInFeet*12 + heightInInches

    df_cfgMales.loc[i,'age'] = competitorData['entrant']['age']
    # clean lbs from weight
    df_cfgMales.loc[i,'weight'] = re.sub("lbs", "", competitorData['entrant']['weight'])
    try:
        df_cfgMales.loc[i,'affiliateName'] = competitorData['entrant']['affiliateName']
    except KeyError:
        df_cfgMales.loc[i,'affiliateName'] = 'Unaffiliated'
    # clean the "T" from overallRank, this is when athletes are tied
    try:
        df_cfgMales.loc[i,'overallRank'] = re.sub(r"\D", "", df_cfgMales.loc[i,'overallRank'])
    except TypeError:
        # the rank was not a string, so there is nothing to clean
        pass
    df_cfgMales.loc[i,'overallRank'] = int(df_cfgMales.loc[i,'overallRank'])
# change overallRank, overallScore, age and weight to numeric values so we can do some calculations
df_cfgMales['overallRank'] = df_cfgMales['overallRank'].astype('int')
df_cfgMales['overallScore'] = df_cfgMales['overallScore'].astype('int')
df_cfgMales['age'] = df_cfgMales['age'].astype('int')
df_cfgMales['weight'] = df_cfgMales['weight'].astype('int')
# convert height to cm and weight to kg
df_cfgMales['heightInCm'] = round(df_cfgMales['heightInInches'].apply(lambda x: x*2.54),1)
df_cfgMales['weightInKg'] = round(df_cfgMales['weight'].apply(lambda x: x/2.205),1)
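
The height conversion above relies on fixed string positions. A regular expression is one alternative that returns nothing when a height does not look as expected, rather than silently reading the wrong characters. This is a sketch only: the helper names are hypothetical, and the format (feet, an apostrophe, inches, an optional double quote) is an assumption based on the slicing used above.

```python
import re

# Assumed format: feet, an apostrophe, inches, then an optional double quote, e.g. 5'10"
HEIGHT_PATTERN = re.compile(r"^\s*(\d+)'\s*(\d{1,2})\"?\s*$")


def height_to_inches(height):
    """Return total inches, or None when the string does not match the assumed format."""
    match = HEIGHT_PATTERN.match(height)
    if match is None:
        return None
    feet, inches = int(match.group(1)), int(match.group(2))
    return feet * 12 + inches


def inches_to_cm(inches):
    """Convert inches to centimetres, rounded to one decimal like the article does."""
    return round(inches * 2.54, 1)


print(height_to_inches("5'10\""))
print(height_to_inches("170 cm"))
```

Rows where the helper returns None can then be inspected by hand before any averages are calculated.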

Testing the results

df_cfgMales.head()
df_cfgMales.info()

Looks good, we have no null values and the datatypes are all correct.

Getting male event information into a dataframe

Finding the number of lines needed by using how the cuts were made

event1 = totalMaleCompetitors
event2 = 75 + 1 # for some reason 76 men did event 2
event3 = 50
event4 = 40
event5 = 30
event6 = 20
event7To12 = 10*6
numberOfLinesMales = event1 + event2 + event3 + event4 + event5 + event6 + event7To12
numberOfLinesMales # 419
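
The same arithmetic can be written as one list of field sizes, which makes the link to the cuts easier to see. This sketch assumes totalMaleCompetitors is 143, the male total quoted in the introduction and the value consistent with the 419 in the comment above.

```python
# Field size entering each event: 143 is an assumed value for totalMaleCompetitors,
# 76 reflects the extra athlete noted for event 2, and the last six events had 10 each.
field_sizes = [143, 76, 50, 40, 30, 20] + [10] * 6

rows_needed = sum(field_sizes)
print(len(field_sizes), rows_needed)
```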

Male event information

# create a dataframe that contains one line for every event for every competitor
df_cfgMalesEvents = pd.DataFrame(np.zeros((numberOfLinesMales,8)),columns=['competitorId','event','breakdown','lane',
'heat', 'points', 'time/score', 'workoutRank'])
# just filling the dataframe with nan data
df_cfgMalesEvents = df_cfgMalesEvents.replace(0,np.nan)
# loop over competitors
for i in range(0,totalMaleCompetitors,1):
    competitorData = responseMales['leaderboardRows'][i]
    # loop over all events for athlete i
    for j in range(0,totalEvents):
        # if an athlete has been cut or withdraws we do not write more event data for him;
        # the inner loop breaks and we move on to the next athlete
        if (competitorData['scores'][j]['workoutrank'] == 'CUT' or
                competitorData['scores'][j]['workoutrank'] == 'WD'):
            break
        # j+(i*totalEvents) so for athlete 0 we get lines 0 - 11, for athlete 10 we get lines 120 - 131 etc.
        # get the competitorId so we can join later to the df_cfgMales dataframe
        df_cfgMalesEvents.loc[j+(i*totalEvents),'competitorId'] = str(competitorData['entrant']['competitorId'])
        # event results
        df_cfgMalesEvents.loc[j+(i*totalEvents),'event'] = competitorData['scores'][j]['ordinal']
        df_cfgMalesEvents.loc[j+(i*totalEvents),'breakdown'] = competitorData['scores'][j]['breakdown']
        df_cfgMalesEvents.loc[j+(i*totalEvents),'lane'] = competitorData['scores'][j]['lane']
        df_cfgMalesEvents.loc[j+(i*totalEvents),'heat'] = competitorData['scores'][j]['heat']
        df_cfgMalesEvents.loc[j+(i*totalEvents),'points'] = competitorData['scores'][j]['points']
        df_cfgMalesEvents.loc[j+(i*totalEvents),'time/score'] = competitorData['scores'][j]['time']
        df_cfgMalesEvents.loc[j+(i*totalEvents),'workoutRank'] = competitorData['scores'][j]['workoutrank']
        # clean the "T" from workoutRank, this is when athletes are tied in a workout
        try:
            df_cfgMalesEvents.loc[j+(i*totalEvents),'workoutRank'] = re.sub(r"\D", "",
                df_cfgMalesEvents.loc[j+(i*totalEvents),'workoutRank'])
        except TypeError:
            # the rank was not a string, so there is nothing to clean
            pass
# drop the NaN rows that came because of the break statement if an athlete is cut
df_cfgMalesEvents = df_cfgMalesEvents.dropna()
# and then reset the index since it is all out of sync after that
df_cfgMalesEvents = df_cfgMalesEvents.reset_index()
# just naming the index column to oldIndex
df_cfgMalesEvents.columns = ['oldIndex'
,'competitorId','event','breakdown','lane',
'heat', 'points', 'time/score', 'workoutRank']
# change non-numeric columns to numeric so we can do some calculations on them
df_cfgMalesEvents['points'] = df_cfgMalesEvents['points'].astype('int')
df_cfgMalesEvents['event'] = df_cfgMalesEvents['event'].astype('int')
df_cfgMalesEvents['workoutRank'] = df_cfgMalesEvents['workoutRank'].astype('int')
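
An alternative to filling a pre-sized dataframe cell by cell is to collect one dictionary per event result and build the dataframe at the end, which avoids the NaN rows and the index reset. The sketch below uses made-up rows that only mimic the keys read from the API above. The value types in the real response may differ.

```python
import re
import pandas as pd

# Made-up rows that mimic the keys the article reads from the API response.
leaderboard_rows = [
    {"entrant": {"competitorId": "1"},
     "scores": [{"ordinal": 1, "points": "100", "workoutrank": "1"},
                {"ordinal": 2, "points": "94", "workoutrank": "T3"}]},
    {"entrant": {"competitorId": "2"},
     "scores": [{"ordinal": 1, "points": "20", "workoutrank": "80"},
                {"ordinal": 2, "points": "0", "workoutrank": "CUT"}]},
]

records = []
for row in leaderboard_rows:
    for score in row["scores"]:
        # Stop at the first event the athlete did not take part in.
        if score["workoutrank"] in ("CUT", "WD"):
            break
        records.append({
            "competitorId": str(row["entrant"]["competitorId"]),
            "event": int(score["ordinal"]),
            "points": int(score["points"]),
            # Drop the "T" prefix used for ties, e.g. "T3" becomes 3.
            "workoutRank": int(re.sub(r"\D", "", score["workoutrank"])),
        })

events = pd.DataFrame.from_records(records)
print(events)
```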

Testing the results

df_cfgMalesEvents.head()
df_cfgMalesEvents.head()
df_cfgMalesEvents.describe()

Looks good, we have no null values and the datatypes are all correct. Let’s do a bit more testing.

Summing competitor points to see if it matches with the df_cfgMales from before

totalPoints = df_cfgMalesEvents[['competitorId','points']].groupby('competitorId').sum()
totalPoints = totalPoints.sort_values(by=['points'],ascending=False)
df_totalPointsCheck = pd.merge(totalPoints, df_cfgMales, how="inner", on="competitorId")
df_totalPointsCheck[df_totalPointsCheck['points'] != df_totalPointsCheck['overallScore']] # empty

What about checking the number of competitors in df_cfgMalesEvents against df_cfgMales?

len(df_cfgMalesEvents['competitorId'].unique()) == len(df_cfgMales)

This yields False, meaning that some competitor is missing from the df_cfgMalesEvents dataframe. Let’s try to find him:

df = pd.merge(df_cfgMalesEvents,df_cfgMales, how="outer", on="competitorId")
df['competitorName'][df['event'].isnull()]

This makes sense, as Fredrik withdrew before the competition began but was still a registered athlete at the Games. Fredrik is therefore not in the events dataframe, as there is no event information for him.
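
The same check can be done with the indicator option of a pandas merge, which labels every row with where it was found. The sketch below uses invented ids and names as stand-ins for df_cfgMales and df_cfgMalesEvents.

```python
import pandas as pd

# Toy stand-ins for df_cfgMales and df_cfgMalesEvents; ids and names are invented.
competitors = pd.DataFrame({"competitorId": ["1", "2", "3"],
                            "competitorName": ["Athlete A", "Athlete B", "Athlete C"]})
events = pd.DataFrame({"competitorId": ["1", "1", "2"],
                       "event": [1, 2, 1]})

# One row per competitor id that appears in the events table.
seen = events[["competitorId"]].drop_duplicates()

# indicator=True adds a "_merge" column saying where each row was found.
merged = competitors.merge(seen, on="competitorId", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "competitorName"].tolist()
print(missing)  # ['Athlete C']
```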

Plotting some cool results

Here we look at some basic plots of competitor information and results from the 2019 Crossfit Games.

I used the libraries Matplotlib and Seaborn for all the plots in the analysis.

import seaborn as sns
import matplotlib.pyplot as plt

What country had the best average ranking?

Overall there were 113 countries represented in the male and female categories at the 2019 Crossfit Games.

Males

The country with the best average rank for the males was Iceland with an average rank of 3rd place, where Björgvin Karl Guðmundsson was the only male competitor from the country. In comparison the USA had 26 male competitors and an average ranking of 36th place.


The top 10 countries

Country         Average ranking
Iceland         3.0
Germany         18.0
United Kingdom  19.0
France          21.0
Switzerland     21.0
Ireland         26.0
China           27.0
South Africa    29.0
Latvia          31.0
Sweden          31.0
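
A ranking like this can be produced with a groupby on the country column. The sketch below uses placeholder countries and ranks rather than the real data. It also counts the athletes per country, since an average over a single athlete (as with Iceland above) reads very differently from an average over 26.

```python
import pandas as pd

# Invented ranks for illustration only; real values would come from df_cfgMales.
df = pd.DataFrame({
    "countryOfOriginName": ["A", "B", "B", "C", "C", "C"],
    "overallRank": [3, 10, 30, 5, 40, 60],
})

by_country = (
    df.groupby("countryOfOriginName")["overallRank"]
      .agg(averageRank="mean", athletes="count")
      .sort_values("averageRank")
)
print(by_country.head(10))
```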

Females

The country with the best average ranking for the females was New Zealand with an average rank of 3rd place, where Jamie Greene was the only female competitor from the country. In comparison, the USA had 21 female competitors and an average ranking of 33rd place.


The top 10 countries

Country         Average ranking
New Zealand     3.000000
Greece          9.000000
Italy           14.000000
Hungary         15.000000
Canada          16.333333
Iceland         17.000000
United Kingdom  18.000000
Ireland         21.000000
Slovakia        22.000000
Norway          24.500000

Overall

Combining both the male and female results shows Iceland as the best country at the 2019 Crossfit Games with an average ranking of 15th place from 6 athletes.


The top 10 countries

Country         Average ranking
Iceland         14.666667
United Kingdom  18.500000
Ireland         23.500000
Switzerland     26.000000
New Zealand     28.000000
Canada          29.555556
Norway          29.666667
Hungary         31.500000
United States   33.063830
Australia       33.375000

By this measure, Iceland comes out as the best country overall at Crossfit.

Comparing the weight of competitors

Disclaimer: the weight and height information is self-reported by competitors on their profiles on the Crossfit Games website. As far as I know, competitors are not weighed and measured when they show up at the Games. It is therefore fair to assume that these numbers do not exactly reflect the actual weight and height of competitors at the 2019 Crossfit Games and are more of an indicator.

Males

The weight of all the male competitors at the 2019 Crossfit Games ranged from 63.5 kg to 98.9 kg with an average weight of 84.7 kg. The weights of the top 10 ranked competitors, ordered by overall rank, were:


The heaviest competitor at the 2019 Crossfit Games was Bronislaw Olenkowicz, weighing in at 98.9 kg, 0.5 kg more than the next heaviest competitor, Brent Fikowski.


The lightest male competitor at the 2019 Crossfit Games was Keith Nhan, weighing in at 63.5 kg, 2.7 kg less than the next lightest competitor, Katlego Kgwadi.


Females

The weight of all the female competitors at the 2019 Crossfit Games ranged from 52.2 kg to 79.8 kg with an average weight of 64.3 kg. The weights of the top 10 ranked competitors, ordered by overall rank, were:


The two heaviest females competing at the 2019 Crossfit Games were Hanna Karlsson and Dina Swift, both weighing in at 79.8 kg.


The two lightest females competing at the 2019 Crossfit Games were Patricia Trujillo and Akiko Kamitani, both weighing in at 52.2 kg.


Comparing the height of competitors

Males

The height of all the male competitors at the 2019 Crossfit Games ranged from 162.6 to 188 cm with an average height of 172.8 cm. The heights of the top 10 ranked competitors, ordered by overall rank, were:


There are six competitors tied as the tallest at the Crossfit Games 2019, all listed at 188 cm in height.


There are three male competitors listed under 100 cm, which I assume is some issue with the data. Setting those aside, the shortest male competitor at the 2019 Crossfit Games was Keith Nhan, listed as 162.6 cm in height.


Females

The height of all the female competitors at the 2019 Crossfit Games ranged from 149.9 to 177.8 cm with an average height of 158.9 cm. The heights of the top 10 ranked competitors, ordered by overall rank, were:


The two tallest female competitors at the 2019 Crossfit Games were Dina Swift and Ksenija Kecman, listed as 177.8 cm in height.


Going by the same logic as with the males, the shortest female competitor at the 2019 Crossfit Games was Patricia Trujillo, listed as 149.9 cm in height. She was also one of the two lightest females.


Comparing the height and weight of competitors

Looking at a scatter chart of height and weight shows us that in general there is a linear relationship between height and weight of the male and female competitors.
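
One way to put a number on what the scatter chart shows is the Pearson correlation coefficient together with a fitted straight line. The sketch below uses invented heights and weights, not the Games data. With the real dataframe, the heightInCm and weightInKg columns would be passed in instead.

```python
import numpy as np

# Invented heights (cm) and weights (kg), not taken from the Games data.
height_cm = np.array([165.0, 170.0, 175.0, 180.0, 185.0])
weight_kg = np.array([70.0, 78.0, 82.0, 88.0, 95.0])

# Pearson correlation coefficient: values near +1 or -1 indicate a strong linear relationship.
r = np.corrcoef(height_cm, weight_kg)[0, 1]

# Least-squares line: weight = slope * height + intercept.
slope, intercept = np.polyfit(height_cm, weight_kg, deg=1)
print(round(r, 3), round(slope, 2))
```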


Can we spot a linear relationship between height and overall rank at the 2019 Games?

I would say that just by looking at all the competitors there is not a clear linear relationship between height and overall rank.


What if we cherry-pick a little and look at the relationship between height and the performance of competitors in the event Mary, which is a purely gymnastics-based workout?

The winner of Mary was
competitorName heightInCm
Noah Ohlsen 170.2

Last place in Mary was
competitorName heightInCm
Ant Haynes 177.8
The winner of Mary was
competitorName heightInCm
Karissa Pearce 160.0


Last place in Mary was
competitorName heightInCm
Colleen Fotsch 172.7

It looks like there is a negative linear relationship between height and performance in Mary, with taller competitors tending to place worse, which is what we would have expected.

Comparing Mat’s and Tia’s performance

The male competition was a lot closer this year than in the previous 3 years, while the female competition was the opposite. We can see this by graphing Mat’s and Tia’s workout rankings over the 2019 Games.


Tia’s workout ranks varied less than Mat’s. Tia’s worst results were two 12th place finishes, while Fraser’s worst finish was a 21st place in event 4. Tia won the female competition by 195 points, while Fraser won by only 35 points.
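
The spread of an athlete’s placings can be compared with the variance of their workout ranks. The placings below are placeholders, not the real 2019 results. With the events dataframe, the workoutRank values for each competitorId would be used instead.

```python
from statistics import mean, pvariance

# Placeholder placings, not the athletes' real 2019 results.
ranks = {
    "athlete_a": [1, 2, 12, 1, 3, 12],
    "athlete_b": [1, 21, 2, 15, 1, 4],
}

for name, placings in ranks.items():
    # pvariance treats the placings as the full population of events.
    print(name, "mean:", round(mean(placings), 2), "variance:", round(pvariance(placings), 2))
```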

Analysing the cuts

There are two ways to look at the six cuts that were made during the first six events at the 2019 Crossfit Games: which of the competitors that were cut would have made it through if the overall score had been reset after the previous cut, and which of the competitors that actually made it through would have been cut instead.

Males

The second cut

The males that would have made it through the second cut if the scores had been reset after the first cut

competitorName   overallRank  workoutRank
Connor Nellans   63           49
Erik Toth        59           43
Bryan Hernández  51           39
Marlon Azurdia   55           37

Marlon Azurdia placed 67th in the first event and 49th in the second event, which placed him 55th overall, so he was cut after event 2. If the scoring had been reset after event 1, however, he would have made it through the second cut since he placed 37th in event 2. The same logic applies to the other competitors in the table above.
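
The comparison described above can be sketched as a small function: an athlete actually advanced if they have a result in the next event, and would have advanced under a reset if their placing in the current event was within the cut size. The data and the function name reset_scenario are invented for illustration, and the notebook linked at the end may do this differently.

```python
import pandas as pd

# Invented results: six athletes in event 2, three of them advance to event 3.
events = pd.DataFrame({
    "competitorId": ["a", "b", "c", "d", "e", "f", "a", "b", "c"],
    "event":        [2,   2,   2,   2,   2,   2,   3,   3,   3],
    "workoutRank":  [1,   5,   2,   3,   4,   6,   1,   2,   3],
})


def reset_scenario(events, event, cut_size):
    """Compare the real cut after `event` with a cut based on that event's placing alone."""
    this_event = events[events["event"] == event]
    advanced_ids = set(events.loc[events["event"] == event + 1, "competitorId"])
    advanced = this_event["competitorId"].isin(advanced_ids)
    top_in_event = this_event["workoutRank"] <= cut_size
    would_have_advanced = this_event.loc[~advanced & top_in_event, "competitorId"].tolist()
    would_have_been_cut = this_event.loc[advanced & ~top_in_event, "competitorId"].tolist()
    return would_have_advanced, would_have_been_cut


print(reset_scenario(events, event=2, cut_size=3))
```

Note that ties in workoutRank could let more than cut_size athletes through in this sketch, so a tie-break rule would be needed for a strict cut.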

The males that would have been cut if the scores had been reset after the first cut

competitorName       overallRank  workoutRank
Jason Smith          29           73
Guilherme Malheiros  49           57
Jason Carroll        39           54
Willy Georges        21           53

Here we have some big-name competitors, like Willy Georges and Jason Carroll, who would have been cut after event 2 if the scoring had been reset after event 1, based on their event 2 placing (workoutRank).

The third cut

Here we go by the same logic as above for the second cut but assuming that no scoring has been reset before, i.e. the second cut scenario above does not apply.

The males that would have made it through the third cut if the scores had been reset after the second cut

competitorName      overallRank  workoutRank
Gábor Török         48           40
John-Paul Hethcock  46           38
Mohamed Elomda      44           37
Arminas Balevicius  43           35
Nick Bloch          41           30

The males that would have been cut after event three if the scores had been reset after the second cut

competitorName    overallRank  workoutRank
Saxon Panchik     9            46
Logan Collins     14           44
Jonne Koski       37           43
Samuel Cournoyer  35           42
Scott Panchik     4            41

Again, some big names like the Panchik brothers and Jonne Koski would have been cut.

The fourth cut

Same logic applies as with the third cut above.

The males that would have made it through the fourth cut if the scores had been reset after the third cut

competitorName        overallRank  workoutRank
Uldis Upenieks        31           30
Eric Carmody          38           28
Dean Linder-Leighton  36           25
Jason Carroll         39           23
Lukas Esslinger       34           11

The males that would have been cut after event four if the scores had been reset after the third cut

competitorName     overallRank  workoutRank
Ben Smith          28           40
Travis Mayer       12           38
Patrick Vellner    16           35
Chandler Smith     15           32
Casper Gammelmark  20           31

The fifth cut

The males that would have made it through the fifth cut if the scores had been reset after the fourth cut

competitorName  overallRank  workoutRank
Sean Sweeney    24           20
Willy Georges   21           14

The males that would have been cut after event five if the scores had been reset after the fourth cut

competitorName  overallRank  workoutRank
James Newbury   5            23
Samuel Kwant    13           21

The final cut

The males that would have made it through the sixth and final cut if the scores had been reset after the fifth cut

competitorName   overallRank  workoutRank
Joshua Wichtrup  18           10
Cole Sager       11           8
Travis Mayer     12           6

The males that would have been cut after event 6 if the scores had been reset after the fifth cut

competitorName    overallRank  workoutRank
Jacob Heppner     6            17
Mathew Fraser     1            15
Adrian Mundwiler  8            13

Females

The second cut

The females that would have made it through the second cut if the scores had been reset after the first cut

competitorName         overallRank  workoutRank
Maria Camila Quintero  62           47
Brooke Haas            55           46
Lisa Eble              51           43
Erin Vandendriessche   52           35
Brenda Castro          56           32

The females that would have been cut if the scores had been reset after the first cut

competitorName          overallRank  workoutRank
Ksenija Kecman          43           67
Oddrún Eik Gylfadottir  39           60
Carol Colling-Romero    49           59
Thelma Christoforou     48           54
Cheryl Nasso            40           53

The third cut

The females that would have made it through the third cut if the scores had been reset after the second cut

competitorName   overallRank  workoutRank
Michelle Merand  42           38
Rachel Garibay   41           36
Ksenija Kecman   43           33

The females that would have been cut if the scores had been reset after the second cut

competitorName      overallRank  workoutRank
Courtney Haley      30           44
Meg Reardon         24           42
Annie Thorisdottir  12           41

The fourth cut

The females that would have made it through the fourth cut if the scores had been reset after the third cut

competitorName       overallRank  workoutRank
Lindsay Vaughan      35           29
Paige Semenza        31           27
Alessia Joy Wälchli  36           25
Emma Tall            32           22

The females that would have been cut if the scores had been reset after the third cut

competitorName     overallRank  workoutRank
Mekenzie Riley     26           36
Haley Adams        6            34
Madeline Sturt     23           32
Feeroozeh Saghafi  25           31

The fifth cut

The females that would have made it through the fifth cut if the scores had been reset after the fourth cut

competitorName     overallRank  workoutRank
Mekenzie Riley     26           19
Emma McQuaid       21           18
Madeline Sturt     23           16
Feeroozeh Saghafi  25           12

The females that would have been cut if the scores had been reset after the fourth cut

competitorName       overallRank  workoutRank
Laura Horvath        15           27
Amanda Barnhart      7            25
Danielle Brandon     11           24
Alessandra Pichelli  14           21

The final cut

The females that would have made it through the sixth and final cut if the scores had been reset after the fifth cut

competitorName    overallRank  workoutRank
Danielle Brandon  11           8
Laura Horvath     15           7
Carolyne Prevost  13           2

The females that would have been cut after event 6 if the scores had been reset after the fifth cut

competitorName   overallRank  workoutRank
Karissa Pearce   5            16
Amanda Barnhart  7            15
Anna Fragkou     9            12

How the final ranking would have looked if the scoring had been reset after the final cut

Maybe the most interesting scenario with regard to the cuts is how the overall ranking would have looked if the scoring had been reset after the final cut.
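
This scenario amounts to summing only the points earned after the final cut and ranking athletes on that total. A minimal sketch with invented points follows. With the real data, the events dataframe from earlier would be filtered on event numbers greater than six.

```python
import pandas as pd

# Invented points for three finalists across events 6 to 8.
events = pd.DataFrame({
    "competitorId": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "event":        [6,   7,   8,   6,   7,   8,   6,   7,   8],
    "points":       [100, 50,  60,  80,  90,  70,  60,  100, 90],
})

final_cut_event = 6

# Keep only events held after the final cut and total each athlete's points.
after_cut = events[events["event"] > final_cut_event]
totals = after_cut.groupby("competitorId")["points"].sum()

# Highest total gets rank 1; method="min" gives tied athletes the same (best) rank.
after_cut_rank = totals.rank(ascending=False, method="min").astype(int).sort_values()
print(after_cut_rank)
```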

Males

There would have been no major changes in the final ranking of the males if the scoring had been reset after event six, except for Björgvin Karl Guðmundsson moving from third to second. We can, however, see how dominant Fraser was once the field was down to the final 10 competitors: Fraser accumulated 80 more points than Guðmundsson, who accumulated the second highest number of points in the final six events.

competitorName             overallRank  afterEventSixRank
Björgvin Karl Guðmundsson  3            2
Noah Ohlsen                2            3

Females

There would have been a few shifts in the overall rank for the females if the scoring had been reset after the final cut. Katrín Tanja would have taken second place while Holte would have ended in fourth, and Amanda Barnhart would have taken fifth and Karissa Pearce seventh.

competitorName             overallRank  afterEventSixRank
Katrin Tanja Davidsdottir  4            2
Kristin Holte              2            4
Amanda Barnhart            7            5
Karissa Pearce             5            7

I’ll end with the Jupyter notebook for all the Python code

It’s just a lot prettier than the code at the beginning of this article …

https://github.com/arnarhardar/rice-krispie-cakes/blob/master/Crossfit%20Games%202019_final.ipynb
