Elo sucks — better multiplayer rating systems for smaller games

raysplaceinspace
5 min read · Mar 19, 2019

My game, acolytefight.io, is a multiplayer skillshot arena. It might look deceptively simple, but it is designed to be a high-skill game. Skilled players learn the exact timings, speeds, distances and behaviours of every spell, and learn to predict and dodge their enemies to be the last one standing. Naturally, a game like this needs a rating system. Everyone wants to know: who is #1?

Most rating systems are based on the Elo rating system, which was originally designed for Chess. There are many variations, of which TrueSkill is the best known. DOTA 2 and League of Legends both use systems based on Elo. When I applied Elo to Acolyte Fight, everyone hated it. Every few days, a new person would join the Discord and say the rating system was trash. It took many months, tens of millions of games, and 4 remakes of the rating system to understand why.

How Elo works

The win/loss curves of any Elo system (including variations such as TrueSkill) look something like this:

Elo rating system win/loss curves

This graph shows that, if you play against someone who is 500 points below you (perhaps you are a well-seasoned player and they are a newbie):

  • You will gain 0.5 points if you win, or
  • you will lose 9.5 points if you lose

Elo recognises that you are the better player, and expects that out of 20 games, you will win 19 and they will win 1. So to maintain equilibrium, Elo makes each loss cost you 19x as many points as each win earns you: 19 wins at +0.5 and 1 loss at -9.5 cancel out. This equilibrium means the rating system measures only skill, and not the number of games played.

In Elo terms, another way of expressing this 19:1 win/loss ratio is to say you have a 95% win probability.
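To make the arithmetic concrete, here is a minimal Python sketch of the standard Elo update. It assumes the conventional 400-point scale and a K-factor of 10 (inferred from the 0.5 and 9.5 figures above, not something the game states); the function names and the 1800 and 1300 ratings are purely illustrative.

```python
def expected_score(rating: float, opponent: float, scale: float = 400.0) -> float:
    """Probability that `rating` beats `opponent` under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / scale))


def elo_delta(rating: float, opponent: float, won: bool, k: float = 10.0) -> float:
    """Rating change for one game: k * (actual result - expected result)."""
    actual = 1.0 if won else 0.0
    return k * (actual - expected_score(rating, opponent, 400.0))


# A seasoned player 500 points above a newbie (ratings are illustrative).
you, newbie = 1800.0, 1300.0
print(f"win probability:    {expected_score(you, newbie):.3f}")
print(f"change if you win:  {elo_delta(you, newbie, won=True):+.2f}")
print(f"change if you lose: {elo_delta(you, newbie, won=False):+.2f}")
```

Under these assumptions the win probability comes out at roughly 95%, with a gain of about half a point for a win and a loss of about nine and a half points for a defeat, which matches the figures quoted above after rounding.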

The problem: Relative win probabilities are unrealistic

In Acolyte Fight, there were many iterations on the rating system. People made a lot of (mathematically-unsound) suggestions. It took a long time to identify the root cause.

What is the root cause? Elo expects your win probability to follow an exponential curve (strictly speaking, a logistic curve built from an exponential), like this:

This says:

  • If you play against someone who is slightly below you (200 point difference), you’ll win 76% of the time.
  • If you play against someone who is quite a lot below you (400 point difference), you’ll win 91% of the time.

The problem is that the win rate curve for Acolyte Fight doesn’t look like that. These are the actual win rates over 100,000 games:

Notice that the actual curve looks nothing like the graph above. There isn’t really an exponential curve; it is closer to linear. This was the insight that made me realise I had to change the rating system. If we take a top-level player and make them fight a high-level, a mid-level and a low-level player repeatedly until we are statistically confident of their win rate against each, there is no reason why those win rates would fit an exponential curve. Why do we even use an exponential curve? Who decided that? What data did they base it on?
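One way to run this kind of comparison on your own game is to bucket past games by rating difference and put the observed win rate next to what Elo predicts. The sketch below assumes the standard Elo formula with a 400-point scale, 100-point buckets, and a hypothetical record format of (rating difference, whether the higher-rated player won). The sample records are made up only to show the input shape; they are not Acolyte Fight data.

```python
from collections import defaultdict


def elo_win_probability(diff: float, scale: float = 400.0) -> float:
    """Win probability Elo predicts for the player rated `diff` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def observed_win_rates(games, bucket_size: int = 100):
    """Group games by rating difference and return {bucket_start: win_rate}.

    `games` is an iterable of (rating_diff, higher_rated_won) pairs, where
    rating_diff >= 0 is the higher rating minus the lower rating.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for diff, higher_won in games:
        bucket = int(diff // bucket_size) * bucket_size
        totals[bucket] += 1
        if higher_won:
            wins[bucket] += 1
    return {b: wins[b] / totals[b] for b in sorted(totals)}


# Made-up sample records, purely to show the shape of the input.
sample_games = [(30, True), (60, False), (220, True),
                (250, True), (280, False), (410, True)]

for bucket, rate in observed_win_rates(sample_games).items():
    predicted = elo_win_probability(bucket + 50)  # compare at the bucket midpoint
    print(f"diff {bucket:>3}-{bucket + 99}: observed {rate:.2f}, Elo predicts {predicted:.2f}")
```

With enough real games per bucket, a large and consistent gap between the two columns is the signal described above: the assumed curve does not match how the game actually plays out.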

Why does this not affect bigger games like DOTA 2?

I presume games like DOTA 2 still have an exponential curve in their equations, despite the fact that it is probably wrong. The reason it does not affect them is that their matchmaker only puts players of similar skill together. Unlike my smaller game, they rarely need to accurately assess how a high-level player would perform against players of other skill levels. If they ever did, though, I am sure it would function incorrectly.

The answer: The Aco rating system

I have designed a new rating system called the Aco rating system, which is similar to Elo except it fixes a few key problems.

  1. Actual win rates: The win probability is calculated from the actual data of the past 100,000 games. This means it does not need to fit an exponential curve. For example, the system could look up in its database that a matchup of an 1800-rated player vs a 1300-rated player results in the higher-rated player winning 76.3% of the time. If the higher-rated player outperforms this, they gain points over time. That is a fair system, based on actual data.
  2. Newbie suppression: The number of points that can be gained or lost is reduced when you are playing someone rated substantially lower than you. This makes people happier because they can’t lose as many points to newbies. Instead, they lose the most points to people of similar skill, which feels much fairer.
  3. Small increments: In general, you will gain or lose about 1 point per game. This means each game is insignificant in the scheme of things, and it doesn’t hurt to play every game in ranked mode. Competing systems like TrueSkill or Glicko claim as an advantage that you converge on your rating a lot faster, sometimes gaining 50 or 100 points from a single game. I actually found this to be a disadvantage. The slow rating increase of Aco means that if you reach the top of the leaderboard, you know for sure that you have really earned it and that it is not just an artefact of uncertainty in the rating system.
  4. Daily decay: Every day, a person’s rating decays by 5 points. This encourages everyone to keep playing ranked and defend their title. To keep the true rating unchanged, the decay is stored separately from the rating, caps at 100 points, and each game played cancels out 1 point of decay. Previously, people would camp at the top of the leaderboard, simply not playing in order to maintain their position, and that was no fun.

Response

The “rating system is trash” complaints have reduced dramatically, although they have not reached zero. Is there a perfect rating system out there? Probably not.

Conclusion

The Aco rating system allows people of different skill levels to compete fairly.

Elo only really works when you’re playing others of similar skill levels. It was invented at a time before we could data mine hundreds of thousands of games. I would expect that the win probability curve even for Chess is not actually exponential.

With the Aco rating system, even if you are not being matched with others of similar skill level, you are still competing with them statistically. This means it suits smaller games that cannot rely on a large player base and a matchmaking system.

Join thousands of players per day — play Acolyte Fight here: acolytefight.io