How Do Rankings Work?
Rankings are one of the least understood and most influential parts of WFTDA. We talk about whether a team is over- or under-ranked, how a tournament is going to change a team’s standing, or who might be in contention for the division playoffs or championships. And yet, despite rankings governing scheduling, seeding, and the length of the season, most skaters I’ve talked to have very little understanding of how those rankings are established.
Strap in, this is going to get crunchy.
Every team has an average ranking points score. Every month (or so), WFTDA sorts all teams by this average score and publishes that list as the rankings. The top 36 are Division 1 and the next 16 are Division 2. (Technically, some number of additional teams are also in Division 2; since division is literally only used to determine playoffs, this is an irrelevant point.)
The average is composed of the game scores for each of that team’s games. When we played Steel City earlier this year, we scored 473.52 game points. When we played 2x4 at the Big O, we scored only 290.35. These numbers get added together and then averaged.
This year introduced the notion of decay. That is, games which occurred more than six months ago are handled differently, and for this I’m going to give a spreadsheet example.
The average is just a sum of all the scores divided by a sum of all the weights. Games older than six months have both game points and weight halved. This puts emphasis on more recently played games. It’s smart.
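The decay rule can be sketched in a few lines of Python. This is a minimal sketch, not WFTDA’s actual implementation: the `Game` type, the `decayed` flag, and the third game’s score are made up for illustration, and I’m assuming the two games mentioned earlier both fall inside the six-month window.

```python
from typing import NamedTuple


class Game(NamedTuple):
    game_points: float
    decayed: bool  # hypothetical flag: True if played more than six months ago


def average_ranking_points(games):
    """Weighted average: older games count half in both points and weight."""
    total_points = 0.0
    total_weight = 0.0
    for game in games:
        weight = 0.5 if game.decayed else 1.0
        total_points += game.game_points * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("need at least one game to compute an average")
    return total_points / total_weight


games = [
    Game(473.52, decayed=False),  # Steel City (assumed recent)
    Game(290.35, decayed=False),  # 2x4 at the Big O (assumed recent)
    Game(300.00, decayed=True),   # invented older game, counts as 150 points at weight 0.5
]
print(round(average_ranking_points(games), 2))  # about 365.55
```

Because the old game contributes half the points and half the weight, it still pulls the average toward its own score, just with half the influence of a recent game.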
The next question to ask is where the game score comes from. The answer is based on how well you did in the game and how strong the opposing team is, multiplied by a constant to make the scores easier to deal with.
The formula is 300 × (your share of the total points scored by both sides in the game) × (the strength factor of the opposing team). I’ll give another example.
You may notice that the game scores in this grid are not identical to the official game scores. That’s because WFTDA tracks more precision than it publishes. The difference tends to be less than a point per game, and when you look at the average, the difference is less than a hundredth of a point. (Why WFTDA tracks this extra precision is a mystery; it’s bad and should change.)
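For anyone who would rather read the game-point formula as code, here is a minimal Python sketch. The function name and the scores and strength factors in the example are hypothetical, chosen only to make the arithmetic easy to follow. It uses the strength factors as given, so it will show the same small rounding differences from official numbers described above.

```python
def game_points(own_score, opponent_score, opponent_strength_factor):
    """300 x (share of all points scored in the game) x (opponent's strength factor)."""
    total = own_score + opponent_score
    if total == 0:
        raise ValueError("no points were scored, so the share is undefined")
    share = own_score / total
    return 300 * share * opponent_strength_factor


# Invented game: Team A beats Team B 200-100.
# Assume Team B's strength factor is 1.5 and Team A's is 2.0.
team_a_points = game_points(200, 100, opponent_strength_factor=1.5)
team_b_points = game_points(100, 200, opponent_strength_factor=2.0)
print(round(team_a_points, 2), round(team_b_points, 2))
```

Note that each side is scored against the other side’s strength factor, so the loser of a game against a much stronger team can still come away with a respectable number.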
Okay. So now we understand how game points are calculated, how those lead to an average ranking score, and how those scores are used to rank teams. The one piece we haven’t covered is strength factor. This is the number that determines how many game points your opponent earns for playing you, and it’s really important. When we say a team is overranked or underranked, what we are really talking about is whether the team’s actual ability matches the strength factor it has been assigned.
To figure out strength factors, we take the rankings list and find the team in the middle. That team’s average score is the median score.
For the May rankings, there are 325 teams posted on the website, but in truth, 334 teams exist in the system. The other nine are teams without enough rankings games: they are used to determine the median but are not reported. (This is also bad and should change.) The median team in May 2016 was #167 Go-Go Gent Roller Derby, and the median score was 141.57.
To find your strength factor, you simply take your team’s average ranking score and divide it by the median score. So, Boulder County’s strength factor is 320.57 / 141.57, or 2.26. The minimum strength factor is 0.50; anything less gets elevated to 0.50.
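Here is the strength factor calculation as a minimal Python sketch. The function names are my own, and using `statistics.median` is an assumption: it averages the two middle values when the list has an even number of entries, and I don’t know how WFTDA handles that case.

```python
import statistics


def median_score(all_team_averages):
    """Median of every team's average ranking score (reported or not)."""
    return statistics.median(all_team_averages)


def strength_factor(team_average, median, floor=0.5):
    """Team average divided by the median score, never lower than the floor."""
    return max(team_average / median, floor)


MAY_2016_MEDIAN = 141.57  # median score quoted above

# Boulder County, using the average quoted above.
print(round(strength_factor(320.57, MAY_2016_MEDIAN), 2))  # 2.26

# An invented team averaging 50.00 would compute to about 0.35,
# so it gets lifted to the 0.50 minimum.
print(strength_factor(50.00, MAY_2016_MEDIAN))  # 0.5
```

By construction, the median team itself always has a strength factor of exactly 1.00.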
I’m a big fan of WFTDA’s ranking system. It predicts not only the outcome of a game but how big the difference in the scores will be. It allows us to set goals against opponents we know we aren’t going to beat. It allows teams which are geographically close but who aren’t matched in skill to play with the comfort that such a game is not a waste of time.
One issue I have with the system is that parts of the system are unverifiable. If a system cannot be verified then people doubt its results even when the results are correct. (Full disclosure, I’m convinced things are correct. Huge props to Acid who helped explain to me what I was seeing and why things were okay.)
This means, for strength factors and game scores, we should not be tracking precision we don’t report. Especially given how little it changes the results, keeping that precision only makes things confusing. Additionally, for the rankings list, we should report all the teams that are used to calculate the median, and simply note which ones are not eligible for playoff/championship status. Changing those things would make the system verifiable, and the nice people who’ve put up with e-mails every year questioning the rankings would get to do stuff that actually matters instead of handling people like me.
Secondly, game mismatches can produce weird results. Look at Gotham vs Windy City at the 2016 D1 Playoffs. The score was 491 Gotham / 13 Windy — Gotham made 97.4% of the game points in this game. However, Windy’s strength factor was only 3.26 at the time, which meant Gotham scored only 952.57 points for that game, underneath their average score. Not that Windy did any better — they only made 56ish game points for themselves. This is what happens when a team with a strength factor of 7.3 plays a team with a strength factor of 3.3. Nobody wins.
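The Gotham/Windy numbers can be reproduced with a short Python sketch. The function name is my own, and the strength factors are the rounded values quoted above (3.26 for Windy, 7.3 for Gotham), so the results are approximate. Working from the unrounded share of points (491 / 504) gives Gotham roughly 952.8; the 952.57 figure comes out if the share is rounded to 97.4% first.

```python
def game_points(own_score, opponent_score, opponent_strength_factor):
    """300 x (share of all points scored in the game) x (opponent's strength factor)."""
    share = own_score / (own_score + opponent_score)
    return 300 * share * opponent_strength_factor


GOTHAM_SCORE, WINDY_SCORE = 491, 13
GOTHAM_SF, WINDY_SF = 7.3, 3.26  # rounded values from the text

gotham_points = game_points(GOTHAM_SCORE, WINDY_SCORE, WINDY_SF)
windy_points = game_points(WINDY_SCORE, GOTHAM_SCORE, GOTHAM_SF)

# The most Gotham could possibly earn from this opponent: a 100% share.
gotham_ceiling = 300 * 1.0 * WINDY_SF

print(round(gotham_points, 2))   # roughly 952.8
print(round(windy_points, 2))    # roughly 56.5
print(round(gotham_ceiling, 2))  # 978.0
```

The ceiling is the part worth noticing: even a shutout would have been worth only 978 game points to Gotham, because the formula caps what you can earn at 300 times your opponent’s strength factor.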
At the top-end, the D1 and D2 environments are heavily influenced by tournaments. Every single D1 team and the vast majority of D2 play at least half of their games at tournaments, where they encounter teams with strength factors more than a full point off of their own. Low seeds and high seeds play each other — a bracketing practice designed to produce an exciting championship game, but which is decidedly hard on the WFTDA rankings.
On the bottom-end, it can be difficult for low-ranked teams to gain rankings points quickly even when they’re very good. International teams have this problem consistently. The strength factor challenge program is supposed to help with this, but I have only heard about one such bout this year, and that one definitely bent the rules to correct systemic issues.
On the whole, the system works pretty well and understanding it thoroughly can help in all aspects of planning a season. The discussion about rankings can be much deeper than the mechanics themselves. There’s more to say about rosters, play-styles, reffing environments, floor materials, season, weather, and more.
Several people have developed tools to reason about rankings in a more efficient way. For example, the Head-2-head-o-matic can be an especially helpful tool, and Flat Track Stats hardly needs an introduction. If you find others, I’d be thrilled if you sent me a link! There’s never enough information in derby.