Introducing Club Ultimate Elo Ratings (v 1.0 — Men’s)
Applying the Elo rating system to club Men’s ultimate teams from 2004–Present

Introducing Elo Ratings for Club Ultimate
Taking inspiration from FiveThirtyEight and their Elo ratings for various sports, I modified the rating system for ultimate and applied it to club Men’s teams going back to 2004. The Elo system was initially developed for rating chess players, but in recent years it has been used for rating teams in many other sports.
I’ll say up front that I’ve only done these ratings for the Men’s division. That’s where my experience is, and I relied on it to check the rating system against my intuition. I’m planning to expand to the Women’s division. I’m hoping to get feedback on these initial ratings and tweak them accordingly. It would also help if the USAU website worked consistently, since I rely on it to source historical scores.
Here are the basics of how the Elo rating system works and how I applied it to club ultimate. Much more detail is included at the end of this article for those interested.
- The ratings depend only on the final score of a game. Extra weight is given to Nationals games and less weight is given to placement games at all tournaments.
- Unlike USAU ratings, teams always gain Elo points if they win a game and always lose Elo points if they lose a game. Teams gain more points for upset wins and for larger margins of victory.
- Every game is zero sum. Elo points are traded between the two opponents based on the outcome of the game. Individual games do not affect the Elo rating of teams not playing in that game.
- Ratings carry over from year to year, although a team’s end-of-season rating is pulled closer to the average rating of 1500 before the start of the next season.
- Only teams that have made Nationals at any point going back to 2004 are included in the ratings system.
Current Ratings
So where do the current teams stack up after the US Open? Revolver still holds onto the #1 spot over Sockeye. Some clear tiers show up after that. Data updated 8/7/17.

Link to Teams’ Historical Charts
So how does each team’s rating change dating back to 2004? I’ve plotted every team’s rating over time. CHECK IT OUT.
Understanding a Team’s Elo Rating
I set up the rating system so that an “average” team has a rating of 1500. Keep in mind that only teams that have made Nationals at some point going back to 2004 are included in the ratings. Each season includes games involving ~25 teams, so you can think of a 1500 rating as an average Top 25 team in the club Men’s division.

Pros/Cons of Elo System for Ultimate
The Elo system is not a perfect measure by any means. Here are some things to consider when taking in the ratings.
Pros
- Simple and intuitive changes to a team’s Elo rating. Every win increases your rating. Each update depends only on the current game and the two teams’ ratings going into it; it does not re-evaluate past games or past opponents. Therefore, past ratings are not impacted by future games.
- Gives higher weights to Nationals games and lower weights to placement games. Losing the 9th-place final at a tournament doesn’t impact a team’s rating as much as a loss in the semis.
- Provides a probability for each team to win a given game based on their rating and their opponent’s rating.
- Allows for historical comparisons of teams and specific games.
Cons
- Leaves teams out of the ratings. The Elo system for sports works best when the same teams play in a given division from year to year. As new teams sprout up, it takes time for their ratings to adjust to their skill level. Because of the high level of turnover in ultimate, I chose to include only teams that have made Nationals at some point dating back to 2004. I think the rating system would be less predictive if many more teams were added to or removed from the system. This also raises a question: if a team makes Nationals, should all of its past results be added to the system? And when should teams drop out of the ratings? For instance, should I keep SD-Streetgang in the ratings? This is part of the reason I’m calling this version 1.0 of the rating system. These decisions may become more intuitive moving forward with the USAU TCT.
- May put too much weight on wins versus points scored. If you’re a big underdog and lose 15–14 to a team like Revolver, your Elo rating decreases slightly. If you win 15–14, you would see a huge bump in your team’s rating. That’s a big swing for one point and may not be indicative of a team’s true skill level. Despite this sensitivity, I really like the fact that Elo does not reward a team for losing a close game. A loss is still a loss.
- The Elo system is not a good rating system for allocating Nationals bids within a given season. Since the ratings carry over performance from past seasons, they are not a fair measure of a team’s performance within a single season.
- Difficult choices in determining which club teams should be combined into single franchises for the ratings. See further discussion on the topic below. Also, legacy teams that made Nationals long ago can stay in the ratings even if their competitiveness has dropped significantly.
- Variable number of games per team per season. Some teams in the rating system play other teams in the system only a handful of times per year, which makes it difficult to adjust their ratings to their true skill level. This is a downside of limiting the number of teams in the rating system.
What’s under the hood?
From this point, I’ll dig into the nuts and bolts of the rating system and how I adapted it for ultimate. It may get a bit technical, but I’ve tried to describe it as clearly as possible.
Data
All data come from USAU/UPA Score Reporter. I had to use other sources and intuition to fill in certain missing data points. The historical data, especially from 10+ years ago, are not perfect but are good enough for this analysis. Also, big thanks to Nate for the ultimaterankings.net website, which made 2014–2016 games really easy to access.
The system contains fewer games than these teams actually played each season, because only games between two teams in the system are included. In other words, only games between teams that made Nationals at some point from 2004–2017 are included.
Formula: Team’s Probability of Winning a Game
- Pr(A) = 1 / (10^(-ELODIFF/400) + 1), where ELODIFF is Team A’s Elo rating minus its opponent’s Elo rating.
- Example: SF-Revolver (Elo=1821) would have an 87.1% probability of beating CHI-Machine (Elo=1489) if they played today, according to the ratings.
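As a quick check of the arithmetic, here is a minimal Python sketch of the win-probability formula. The function name `win_probability` is my own; it simply evaluates Pr(A), with ELODIFF taken as Team A’s rating minus its opponent’s.

```python
def win_probability(elo_a: float, elo_b: float) -> float:
    """Probability that team A beats team B, given both pre-game Elo ratings."""
    elo_diff = elo_a - elo_b  # ELODIFF: team A's rating minus team B's rating
    return 1 / (10 ** (-elo_diff / 400) + 1)


if __name__ == "__main__":
    # SF-Revolver (1821) vs CHI-Machine (1489), the example from the text
    print(f"{win_probability(1821, 1489):.1%}")  # 87.1%
```

Two evenly rated teams each get 50%, and the two teams’ probabilities always add up to 1.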
Formula: Adjusting Team’s Rating
- Rating change = [ W - Pr(A) ] * K-Factor * MOV-Factor, where W is 1 for a win and 0 for a loss
- Example: If SF-Revolver beat CHI-Machine, their Elo rating would increase from 1821 to [1821 + (0.129)*K-Factor*MOV-Factor].
- If SF-Revolver lost to CHI-Machine, their Elo rating would decrease from 1821 to [1821 + (-0.871)*K-Factor*MOV-Factor].
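The update rule can be sketched the same way. The article’s tuned K-Factors aren’t reproduced here, so K = 40 and MOV-Factor = 1.0 in the example are hypothetical placeholders chosen only to make it run; `rating_change` is an illustrative name, not part of any library.

```python
def win_probability(elo_a: float, elo_b: float) -> float:
    """Pr(A) = 1 / (10^(-ELODIFF/400) + 1)."""
    return 1 / (10 ** (-(elo_a - elo_b) / 400) + 1)


def rating_change(elo_team: float, elo_opp: float, won: bool,
                  k_factor: float, mov_factor: float) -> float:
    """Elo points gained (positive) or lost (negative) by one team in one game."""
    w = 1.0 if won else 0.0  # W: 1 for a win, 0 for a loss
    return (w - win_probability(elo_team, elo_opp)) * k_factor * mov_factor


if __name__ == "__main__":
    K, MOV = 40.0, 1.0  # HYPOTHETICAL values, for illustration only
    print(round(1821 + rating_change(1821, 1489, True, K, MOV), 1))   # Revolver wins
    print(round(1821 + rating_change(1821, 1489, False, K, MOV), 1))  # Revolver loses
```

Because the two teams’ win probabilities sum to 1, the loser’s change is exactly the negative of the winner’s whenever both use the same K-Factor and MOV-Factor, which is what makes every game zero sum.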
K-Factor & Types of Games
The K-Factor determines how fast a team’s Elo rating reacts to the outcome of a game. With a higher K-Factor, a team’s rating jumps around a lot and is more indicative of recent performance. A lower K-Factor produces steadier ratings that rely more on historical performance.
Another important caveat for ultimate, compared to sports like the NBA or MLB, is that not every game has the same significance. Playing in the semis at the US Open is not the same as playing for 9th place at the US Open. This is similar to how the World Football Elo Ratings weight matches by their importance.

For ultimate, I used my intuition to consider how important each tournament and game type is relative to standard regular season games. I decided that Nationals games should be weighted 1.5x, placement games 0.5x, and Nationals placement games 0.2x.
Once I assumed these weights, I determined that the K-Factors below provide the best results for ultimate ratings. I sought to minimize the RMSE for non-placement games to determine these factors.
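Here is a minimal sketch of how game-type weights and an error metric could fit together. Two assumptions are labeled loudly: first, that each game type’s K-Factor is a base K multiplied by the weights above (the article’s tuned K-Factors are in a table not reproduced here, so `BASE_K = 40` is a hypothetical placeholder); second, that the RMSE compares predicted win probabilities against 0/1 results, since the article doesn’t say exactly what the RMSE is computed over. All names are illustrative.

```python
import math
from enum import Enum


class GameType(Enum):
    REGULAR = "regular"                          # standard regular-season game
    PLACEMENT = "placement"                      # placement game, non-Nationals event
    NATIONALS = "nationals"                      # non-placement game at Nationals
    NATIONALS_PLACEMENT = "nationals_placement"  # placement game at Nationals


# Relative importance of each game type, from the text
GAME_WEIGHTS = {
    GameType.REGULAR: 1.0,
    GameType.PLACEMENT: 0.5,
    GameType.NATIONALS: 1.5,
    GameType.NATIONALS_PLACEMENT: 0.2,
}

BASE_K = 40.0  # HYPOTHETICAL placeholder; not the article's tuned value


def k_factor(game_type: GameType, base_k: float = BASE_K) -> float:
    """ASSUMED: K-Factor for a game = base K scaled by its game-type weight."""
    return base_k * GAME_WEIGHTS[game_type]


def rmse(predicted: list[float], outcomes: list[int]) -> float:
    """ASSUMED metric: RMSE of predicted win probabilities vs. 0/1 results."""
    pairs = list(zip(predicted, outcomes))
    return math.sqrt(sum((p - o) ** 2 for p, o in pairs) / len(pairs))
```

Under these assumptions, tuning would mean replaying the games for several candidate base K values and keeping the one with the lowest RMSE over non-placement games.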

MOV-Factor (Margin of Victory)
In addition to the K-Factor, a team’s rating increases by a larger amount if it wins by a larger margin. The ratings also expect a favored team to win by more points, so a large margin of victory has a bigger impact for an underdog than for a favorite. I started with the MOV-Factor developed for the NFL by FiveThirtyEight and then tweaked it for ultimate.
- MOV-Factor = LN(ABS(PointDiff)+2) * (2.5 / ((ELOW-ELOL)*0.001 + 1.7))
- ELOW is the Elo rating of the winning team before the game. ELOL is the Elo rating of the losing team before the game.
- PointDiff is the point differential of the game scaled to a winning score of 15. So an 11–9 game has a point differential of 2.7, and a 17–15 game has a point differential of 1.8.
- The natural log in the formula means there are diminishing returns to higher margins of victory. Increasing a win from 15–13 to 15–12 has a higher impact than increasing a win from 15–8 to 15–7. Also, for the same two teams, a 15–8 victory has twice the MOV-Factor of a 15–14 victory.
- The MOV formula was also adjusted so that a team’s Elo rating before a game is predictive of its rating after the game. In simpler terms, using the example above, if SF-Revolver (Elo=1821) and CHI-Machine (Elo=1489) played a game, their expected Elo ratings after the game should still be 1821 and 1489, respectively. This is because SF-Revolver will win more often than not, but its rating increases only slightly after a win and decreases much more any time it loses to CHI-Machine.
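Here is a minimal sketch of the MOV-Factor and the point-differential scaling. The scaling rule, (winner score - loser score) * 15 / winner score, is my assumption; it reproduces both scaling examples in the text (11–9 gives 2.7, 17–15 gives 1.8). Function names are hypothetical.

```python
import math


def scaled_point_diff(winner_score: int, loser_score: int) -> float:
    """ASSUMED rule: rescale the margin to a winning score of 15 (11-9 -> ~2.7)."""
    return (winner_score - loser_score) * 15 / winner_score


def mov_factor(point_diff: float, elo_winner: float, elo_loser: float) -> float:
    """MOV-Factor = LN(ABS(PointDiff)+2) * (2.5 / ((ELOW-ELOL)*0.001 + 1.7))."""
    log_term = math.log(abs(point_diff) + 2)  # diminishing returns on margin
    elo_term = 2.5 / ((elo_winner - elo_loser) * 0.001 + 1.7)  # bigger for upsets
    return log_term * elo_term


if __name__ == "__main__":
    print(round(scaled_point_diff(11, 9), 1))   # 2.7
    print(round(scaled_point_diff(17, 15), 1))  # 1.8
    # Same two teams, so the Elo term cancels: 15-8 vs 15-14 is ln(9) / ln(3)
    print(mov_factor(7, 1821, 1489) / mov_factor(1, 1821, 1489))
```

The ratio printed last is ln(9) / ln(3) = 2 (up to floating-point rounding), matching the 15–8 versus 15–14 note. The Elo term also grows when ELOW is below ELOL, which is how an underdog’s win counts for more.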
Year to Year Carryover
From season to season, a team’s composition and ability change. Instead of resetting every team each year, a team retains 85% of its rating above/below the average rating of 1500. Like the K-Factor and MOV-Factor, this 85% factor was chosen to improve the predictive accuracy of the rating system.
- Example: BOS-Ironside finished the 2016 season with a rating of 1895 (and a title!). Here is how we calculate their rating at the start of the 2017 season.
- (0.15 * 1500) + (0.85 * 1895) = 1836
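The same carryover can be written as a one-line function. This is a minimal sketch; `preseason_rating` is a hypothetical name, and it uses the equivalent form 1500 + 0.85 * (rating - 1500), which matches “retains 85% of its rating above/below 1500.”

```python
AVERAGE_RATING = 1500
CARRYOVER = 0.85  # share of a team's distance from 1500 kept into the next season


def preseason_rating(end_of_season_rating: float) -> float:
    """Keep 85% of the rating above/below 1500 (same as 0.15*1500 + 0.85*rating)."""
    return AVERAGE_RATING + CARRYOVER * (end_of_season_rating - AVERAGE_RATING)


if __name__ == "__main__":
    # BOS-Ironside finished 2016 at 1895
    print(round(preseason_rating(1895)))  # 1836
```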
Note that this 85% factor is much higher than the factors FiveThirtyEight uses for the NBA and NFL. This makes sense, as top ultimate teams tend to stay consistently strong, whereas pro leagues have more turnover in rosters and team ability.
New Team Entrance Rating
To keep the average Elo rating of teams in the system at 1500 each season, we need to tweak the rating each team starts with when it enters the system. The challenge is that weaker teams are typically the ones that drop out of the system, leaving teams with stronger ratings and raising the average. In addition, it makes sense for a new club to start with a lower rating when it joins the system.
I determined that a franchise starting rating of 1375 keeps the average Elo rating of teams in the system at ~1500 from 2013–2017.
Note that all franchises start their existence in the system at 1375. It doesn’t matter if you are SF-Revolver or TX-HIP.
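Putting the pieces together, a season could be processed one game at a time, with any franchise not yet in the ratings entering at 1375. This is a sketch under stated assumptions: the team names and K = 40 are hypothetical, the K-Factor is passed in directly rather than looked up by game type, and the point differential is assumed to scale linearly to a winning score of 15.

```python
import math

ENTRY_RATING = 1375  # starting rating for every franchise, per the text


def win_probability(elo_a: float, elo_b: float) -> float:
    return 1 / (10 ** (-(elo_a - elo_b) / 400) + 1)


def mov_factor(point_diff: float, elo_winner: float, elo_loser: float) -> float:
    return math.log(abs(point_diff) + 2) * (2.5 / ((elo_winner - elo_loser) * 0.001 + 1.7))


def process_game(ratings: dict, winner: str, loser: str,
                 winner_score: int, loser_score: int, k: float) -> None:
    """Update `ratings` in place for one game between two teams in the system."""
    # A franchise seen for the first time enters at 1375, whoever it is.
    elo_w = ratings.setdefault(winner, ENTRY_RATING)
    elo_l = ratings.setdefault(loser, ENTRY_RATING)
    point_diff = (winner_score - loser_score) * 15 / winner_score  # ASSUMED scaling
    delta = (1 - win_probability(elo_w, elo_l)) * k * mov_factor(point_diff, elo_w, elo_l)
    ratings[winner] = elo_w + delta  # zero sum: the winner gains exactly
    ratings[loser] = elo_l - delta   # what the loser gives up


if __name__ == "__main__":
    ratings = {}
    # Hypothetical teams and K-Factor, for illustration only
    process_game(ratings, "Team A", "Team B", 15, 12, k=40)
    print(ratings)
```

Since every game only moves points between its two participants, the total rating in the system grows only when new franchises enter at 1375, which is why that entry value controls the long-run average.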
Teams & Franchises
These were the trickiest decisions for the rating system. Since 2004, club teams in a given city/area have changed their names. To account for this, my goal was to develop “franchises” that keep team ratings consistent from year to year for a given city/area. Here are the rules of thumb I used. Some decisions required judgment, and really there are no right answers here.
- One franchise for each city/area’s top club team through the years. This is why DoG is included as the same franchise as Ironside. The exception is the Bay Area, where Jam and Revolver were both top club teams at the same time for a few years.
- If a city/area goes without a team for 2+ years, it resets the franchise. This is why MI-BAT and MI-High Five are different franchises but PHI-Southpaw and PHI-Patrol are the same.
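The reset rule can be written as a small function over the seasons in which a city/area had a team in the system. Reading “goes without a team for 2+ years” as two or more consecutive missing seasons is my interpretation, and the years in the example are hypothetical rather than any real franchise’s history.

```python
def split_into_franchises(seasons: list[int]) -> list[list[int]]:
    """Group a city/area's seasons into franchises, resetting after a 2+ year gap.

    `seasons` lists the years in which the city/area had a team in the system.
    """
    franchises: list[list[int]] = []
    for year in sorted(seasons):
        missing = year - franchises[-1][-1] - 1 if franchises else None
        if missing is not None and missing < 2:
            franchises[-1].append(year)  # 0 or 1 missing seasons: same franchise
        else:
            franchises.append([year])    # first season, or a 2+ season gap: reset


    return franchises


if __name__ == "__main__":
    print(split_into_franchises([2004, 2005, 2007]))  # one missing year: one franchise
    print(split_into_franchises([2004, 2005, 2008]))  # two missing years: two franchises
```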
Here are the franchises that include multiple club teams over their history back to 2004.

Other Notes
- Early Seasons (2004–2005): Since every team starts with a 1375 rating, it takes a few seasons before teams separate to their “true” ratings. Therefore, it is hard to make conclusions about the rating system when looking back at 2004–2005 seasons.
- International Teams/Tournaments: Not included. Only games between teams in the ratings are included, which means only US and Canadian teams. International tournaments such as Worlds are not included.
- Comparison to Ultiworld Elo article: Cody Mills at Ultiworld wrote an article last year applying the Elo rating system to the college division. I think it got some negative feedback because of how the Elo system was applied. I don’t think it makes sense to apply this system to a single season, especially at the college level, where there is less connectivity between the teams that play each other. I thought it was cool to see Cody’s work and the output of the rating system. Unfortunately, I think it will be hard to build an Elo system for the college division; it seems more appropriate for club.
- W/L Games: A lot of games from 10+ years ago were recorded only as “W-L” on Score Reporter. I entered these games as 15–13 results.
- Other rating systems: There have been discussions about applying the TrueSkill Through Time and Glicko rating systems to ultimate. I think these would be great too. I’m putting Elo out there because it’s pretty straightforward, not because it’s the best measure of a team’s true ability at any point in time.
- “Version 1.0?”: I view this as a beta version of the rating system. I’m sure I missed some caveats on teams/franchises or other parts of the rating system, and I’m hoping to update it appropriately going forward.
- Want more?: Let me know if you’d like to look more at the raw data or any other questions you have. craig (dot) poeppelman (at) gmail (dot) com

