Elo sucks — better multiplayer rating systems for smaller games

raysplaceinspace
Mar 19, 2019 · 5 min read

My game, acolytefight.io, is a multiplayer skillshot arena. It might look deceptively simple, but it is designed to be a high-skill game. Skilled players learn the exact timings, speeds, distances and behaviours of every spell, and learn to predict and dodge their enemies to be the last one standing. Naturally, a game like this needs a rating system. Everyone wants to know: who is #1?

Most rating systems are based on the Elo rating system, which was originally designed for Chess. There are many variations, of which TrueSkill is the best known. DOTA 2 and League of Legends both use systems based on Elo. When I applied Elo to Acolyte Fight, everyone hated it. Every few days, a new person would join the Discord and say the rating system was trash. It took many months, tens of millions of games, and 4 remakes of the rating system to understand why.

How Elo works

The win/loss curves of any Elo system (including variations such as TrueSkill) look something like this:

Elo rating system win/loss curves

This graph shows that, if you play against someone who is 500 points below you (perhaps you are a well-seasoned player and they are a newbie):

  • You will gain +0.5 if you win, or

Elo recognises that you are the better player, and that out of 20 games, you will win 19 and they will win 1. So to maintain equilibrium, Elo makes you lose 19 times as many points for a loss as you gain for a win. Equilibrium allows the rating system to measure only skill, and not the number of games played.

In Elo terms, another way of expressing this 19:1 win/loss ratio is to say you have a 95% win probability.
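
To make the arithmetic concrete, here is a minimal sketch of the standard Elo expected-score formula and the resulting gain/loss for a 500-point gap. The 400-point scale is the conventional Elo constant, and the K-factor of 10 is an assumption chosen only because it reproduces a gain of roughly +0.5; neither value is confirmed as what Acolyte Fight used.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo win probability for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_deltas(rating_a: float, rating_b: float, k: float = 10.0) -> tuple[float, float]:
    """Return (points A gains on a win, points A loses on a loss).

    k is the K-factor; 10 is an assumed value for illustration.
    """
    expected = elo_expected_score(rating_a, rating_b)
    return k * (1.0 - expected), k * expected


# A seasoned player (1800) against a newbie (1300): a 500-point gap.
probability = elo_expected_score(1800, 1300)
gain, loss = elo_deltas(1800, 1300)
print(f"win probability: {probability:.3f}")
print(f"gain on win: {gain:.2f}, loss on defeat: {loss:.2f}")
```

With these constants the win probability comes out just under 95%, so the loss-to-gain ratio is close to, though slightly below, the rounded 19:1 figure.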

The problem: Relative win probabilities are unrealistic

Acolyte Fight went through many iterations of its rating system. People made a lot of (mathematically unsound) suggestions. It took a long time to identify the root cause.

What is the root cause? Elo expects your win probability to follow an exponential curve (strictly speaking, a logistic curve built from an exponential of the rating difference), like this:

This says:

  • If you play against someone who is slightly below you (200 point difference), you’ll win 76% of the time.

The problem is that the actual win rate curve for Acolyte Fight doesn’t look like that. These are the actual win rates over 100,000 games:

Notice that the actual curve here looks nothing like the graph above. There isn’t really an exponential curve. It is more linear. This was the insight that made me realise I had to change the rating system. If we take a top-level player, and make them fight a high-level, mid-level and low-level player repeatedly until we are statistically confident of their win rates against each, there is no reason why their win rates would fit an exponential curve. Why do we even use an exponential curve? Who decided that? What data did they base this on?
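
A comparison like this can be sketched by bucketing games by rating difference and measuring how often the higher-rated player actually won. This is only an illustration: the record format, the 100-point bucket size, the function names and the sample games are all assumptions, and real matches with more than two players would first need to be broken into pairwise results.

```python
from collections import defaultdict


def empirical_win_rates(games, bucket_size=100):
    """Observed win rate of the higher-rated player per rating-difference bucket.

    games: iterable of (rating_difference, higher_rated_won) pairs, where
    rating_difference is higher rating minus lower rating (hypothetical format).
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for rating_difference, higher_rated_won in games:
        bucket = int(rating_difference // bucket_size) * bucket_size
        totals[bucket] += 1
        if higher_rated_won:
            wins[bucket] += 1
    return {bucket: wins[bucket] / totals[bucket] for bucket in sorted(totals)}


def elo_expected(rating_difference):
    """Win probability that standard Elo predicts for the higher-rated player."""
    return 1.0 / (1.0 + 10.0 ** (-rating_difference / 400.0))


# Made-up sample records, not real Acolyte Fight data.
sample_games = [(30, True), (60, False), (220, True),
                (250, True), (270, False), (510, True)]

for bucket, observed in empirical_win_rates(sample_games).items():
    predicted = elo_expected(bucket + 50)  # compare at the bucket midpoint
    print(f"{bucket:>4}+ points: observed {observed:.2f}, Elo predicts {predicted:.2f}")
```

With enough real games in each bucket, any gap between the observed and predicted columns shows where the Elo curve misjudges the matchup.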

Why does this not affect bigger games like DOTA 2?

I presume games like DOTA 2 still have an exponential curve in their equations, despite the fact that it is probably wrong. The reason it does not affect them is that their matchmaker only puts players of similar skill together. Unlike my smaller game, they rarely need to accurately assess how a high-level player would perform against players of other skill levels. If they ever did though, I am sure it would function incorrectly.

The answer: The Aco rating system

I have designed a new rating system called the Aco rating system, which is similar to Elo except it fixes a few key problems.

  1. Actual win rates: The win probability is calculated from the actual data of the past 100,000 games. This means it does not need to fit an exponential curve. For example, the system could look up its database and see that a matchup of an 1800-rated player vs a 1300-rated player results in the higher-rated player winning 76.3% of the time. If the higher-rated player is outperforming this, they gain points over time, and that is a fair system, based on actual data.
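
The idea of replacing the formula with observed win rates can be sketched as a lookup table feeding an otherwise Elo-style update. This is not the actual Aco implementation: the table layout, the function names and the K-factor of 10 are hypothetical, and only the 76.3% figure for a 500-point gap comes from the example above. The other table values are invented placeholders.

```python
import bisect

# (minimum rating difference, observed win rate of the higher-rated player).
# Only the 500-point entry (0.763) comes from the article; the rest are
# placeholder values invented for illustration.
WIN_RATE_TABLE = [
    (0, 0.50), (100, 0.55), (200, 0.60),
    (300, 0.66), (400, 0.71), (500, 0.763),
]
THRESHOLDS = [threshold for threshold, _ in WIN_RATE_TABLE]


def observed_win_probability(rating_a: float, rating_b: float) -> float:
    """Look up player A's win probability from the empirical table."""
    difference = abs(rating_a - rating_b)
    # Find the last bucket whose threshold does not exceed the difference.
    index = bisect.bisect_right(THRESHOLDS, difference) - 1
    higher_rated_wins = WIN_RATE_TABLE[index][1]
    return higher_rated_wins if rating_a >= rating_b else 1.0 - higher_rated_wins


def updated_rating(rating_a: float, rating_b: float, a_won: bool, k: float = 10.0) -> float:
    """Elo-style update, with the expected score taken from observed data."""
    expected = observed_win_probability(rating_a, rating_b)
    actual = 1.0 if a_won else 0.0
    return rating_a + k * (actual - expected)


print(updated_rating(1800, 1300, a_won=True))
print(updated_rating(1800, 1300, a_won=False))
```

Because the expected score comes from the table, a player who wins exactly as often as the data predicts stays at the same rating on average, and only over- or under-performance moves it.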

Response

The “rating system is trash” complaints have reduced dramatically, although they have not reached zero. Is there a perfect rating system out there? Probably not.

Conclusion

The Aco rating system allows people of different skill levels to compete fairly.

Elo only really works when you’re playing others of similar skill levels. It was invented at a time before we could data mine hundreds of thousands of games. I would expect that the win probability curve even for Chess is not actually exponential.

With the Aco rating system, even if you are not being matched with others of similar skill level, you are still competing with them statistically. This means it suits smaller games that cannot rely on a large player base and a matchmaking system.

Join thousands of players per day — play Acolyte Fight here: acolytefight.io
