Exploring IFSC’s score data: part 1

Published in Baesian Climbers · Dec 16, 2019 · 6 min read

An oft-told meme in the climbing community states that speed climbing isn’t a real form of climbing, and consequently, skill in speed climbing doesn’t actually correlate with skill in other disciplines. Now, while making and sharing memes is fun, we wanted to dig a bit deeper and find out how climbing skills actually correlate across disciplines.

This immediately raises the question of what climbing skill actually is. Quantifying climbing skill is hard: grades aren’t always on a clean numerical scale, every gym grades a bit differently (a source of even more memes), and competition climbs often don’t have an associated grade. Our dataset, scraped from the IFSC website, contains scores from dozens of competitions and thousands of climbers (it’s a few megabytes, so not exactly big data). Thanks to these variations, we can’t just throw together a generic regression such as y = mx + b. We also can’t fit each gym and competition its own m and b, since then we wouldn’t have a single quantifiable skill level that describes how a climber will perform at other gyms and competitions. Treating each competition separately is wasteful and doesn’t let us compare two climbers who went to different competitions.

Thankfully, we have some tricks up our sleeve. Let’s start with just lead climbing data for now. We use something called hierarchical modeling, a fancy term for a model where the parameters (the m and the b from the example above) are themselves conditioned on something. Andrew Gelman from Columbia gives a more formal introduction. In our case, we’ll start by making the parameters vary for each competition. This makes sense, since each competition has a different route physically set on the wall, meaning that the mapping from skill to score varies. Add in an error term, and we get:

The score for climber n at competition m is a function of the a0 and a1 parameters of that competition, as well as the skill of that climber. Epsilon is an error term
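Written out, using σ_{L,m} for the per-competition noise scale introduced below (this mirrors the likelihood line in the Stan code further down):

score_{n,m} = a_{0,m} + a_{1,m} \cdot skill_n + \epsilon_m, \qquad \epsilon_m \sim \mathcal{N}(0, \sigma_{L,m})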

Note that we don’t know what each climber’s skill value actually is. We call this a latent variable — we’ll learn a plausible value for it later on.

While simple, this is surprisingly powerful on its own. Many climbers compete in a single competition, allowing us to infer a good a0 and a1 mapping for it. Meanwhile, since the same climbers compete in multiple competitions (and we assume that their skill is the same each time they compete), we get a consistent skill scale across all climbers. With the model structure we proposed, our dataset is richly intertwined in a way that allows us to enforce consistency throughout.

However, an astute reader would notice that this set of equations is under-constrained. We haven’t established a scale for skill, which is a latent (and hence unobserved) variable. If we halved all the skill values, we could double all of the a1 values and we’d have the same outcomes. What if all the skill values got multiplied by a billion? Divided by Graham’s number?

Of course, that astute reader is wrong. We are Bayesian climbers, not frequentist climbers. We don’t assume that blindly taking a hammer to our data is going to give us results. Instead, we encode some of our assumptions and suggestions about the model in the form of priors. We assume that lead skill ought to correlate positively with lead scores, based on the basic fact that higher scores are better. This can be encoded in the following prior:

For each competition, we assume that its a1 parameter is normally distributed around 1. This prior encodes our belief that the parameter is positive, without forcing a hard constraint (in case a particular competition was terrible or had noisy data)
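In symbols, mirroring the al1 ~ normal(1, 1) line in the Stan code below:

a_{1,m} \sim \mathcal{N}(1, 1)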

We also center the skills of climbers around 0. This prior additionally helps to set the scale of climber skill values, to some extent.

The lead-climbing skills for each climber follow a unit normal distribution
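In symbols, mirroring ls ~ normal(0, 1) in the Stan code:

skill_n \sim \mathcal{N}(0, 1)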

Remember that error term from the previous section? We don’t actually know the error variance for each competition. We’d rather not guess, and we need to account for the fact that it might vary for each competition. We solve this with a hyperprior on the error term:

γ_L is a hyperparameter controlling a hyperprior — so a hyperhyperparameter?
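In symbols, mirroring sigmaL ~ exponential(hypersigma_LScore) in the code, with γ_L the rate hyperparameter we pass in as data:

\sigma_{L,m} \sim \text{Exponential}(\gamma_L)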

Of course, a model is no good without a way to train it. We have thousands of parameters (each climber’s skill is a latent parameter, plus the parameters for each competition), as well as a bunch of priors and hyperpriors. In the face of insurmountable complexity, we give up on the elegance of a closed-form solution and use a bit of brute force (just like we end up using inelegant brute-force beta while cutting feet repeatedly on boulder problems we can’t figure out).

Remember Andrew Gelman from the hierarchical modeling paper? He’s back, with a software tool called Stan. Stan is a “probabilistic programming language for statistical inference,” which sounds like just the thing we need. We can express our model, priors and all, as code, and it’ll do the hard work of finding a suitable set of parameters by drawing samples directly from the posterior distribution over the space of all parameters.

The following code expresses our model in Stan:

data {
  int<lower=0> numLscores;  // Number of lead results
  int<lower=0> maxN;        // Max climber number observed
  int<lower=0> maxML;       // Max lead comp number observed

  vector[numLscores] lscore;
  int<lower=1,upper=maxN> lClimber[numLscores];
  int<lower=1,upper=maxML> lComp[numLscores];

  real hypersigma_LScore;
}
parameters {
  vector[maxN] ls;    // Latent lead skill
  vector[maxML] al0;  // Lead score intercept, per comp
  vector[maxML] al1;  // Lead skill->score slope, per comp

  vector<lower=0>[maxML] sigmaL;  // Noise in lead scores, per comp
}
model {
  // Model lead scores via lead skill latents
  for (i in 1:numLscores) {
    int comp = lComp[i];
    int climber = lClimber[i];
    lscore[i] ~ normal(al0[comp] + al1[comp] * ls[climber], sigmaL[comp]);
  }

  ls ~ normal(0, 1);
  sigmaL ~ exponential(hypersigma_LScore);
  al1 ~ normal(1, 1);
}

We can now feed in the data (which we pre-processed in a boring way that mostly reflects how badly the raw data was formatted to begin with) and run a fit:

import pystan

# lead_hier holds the Stan model code above; lead_pd is a DataFrame of lead results
leadSM = pystan.StanModel(model_code=lead_hier)
lead_data = {
    'numLscores': len(lead_pd),
    'maxN': nClimbers,
    'maxML': nLeadComps,
    'lscore': lead_pd['qual'],
    'lClimber': lead_pd['ClimberNum'],
    'lComp': lead_pd['CompNum'],
    'hypersigma_LScore': 1,
}
fit = leadSM.sampling(data=lead_data, iter=1000, chains=4)
print(fit)

And we get some lovely results:

There’s a nice spread of lead climbing skill levels, and they’re reasonably consistent from sample to sample, even though they’re barely constrained…
…lead skill positively correlates to lead scores…
…and the error term for the score regression implies a sensible amount of variation in lead scores, based on our prior knowledge of the sport.
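To poke at those results programmatically, a minimal sketch (assuming the fit object from the sampling call above and PyStan’s extract API) might look like:

# Pull posterior draws (dict of arrays, warmup dropped, chains merged)
samples = fit.extract()
skill_means = samples['ls'].mean(axis=0)      # posterior mean lead skill per climber
slope_means = samples['al1'].mean(axis=0)     # posterior mean skill->score slope per comp
noise_means = samples['sigmaL'].mean(axis=0)  # posterior mean score noise per comp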

We can do the same thing for bouldering and speed climbing, and we get similarly good results. These datasets have fun differences from the lead data: each bouldering result contains two scores, tops (boulder problems finished) and zones (boulder problems partially finished). There are also attempt counts for tops and zones, but the author of our dataset somehow made the unfortunate mistake of string-concatenating integers, which makes that column unusable. We simply add two more variables:
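As a rough sketch of the form this takes (the exact parameterization here is our own guess: c and d are illustrative per-competition parameters, and bskill_n is a single latent boulder skill shared by both scores):

tops_{n,m} = c_{0,m} + c_{1,m} \cdot bskill_n + \epsilon^{T}_m
zones_{n,m} = d_{0,m} + d_{1,m} \cdot bskill_n + \epsilon^{Z}_m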

Those results are similarly successful in finding appropriate skill levels and skill-score mappings.

Speed climbing differs from lead in that better climbers have lower times; we simply adjust our prior:
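Presumably the adjusted prior just centers the per-competition slope below zero instead of above it, something like:

a_{1,m} \sim \mathcal{N}(-1, 1)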

Our values of a1 are accordingly negative once we finish sampling.

Coming soon: part 2, where we actually try to correlate skills across disciplines. Ray will do some more analysis on top of my model, and we’ll figure out whether speed climbing skill is actually correlated with lead and/or boulder skills.
