Matchmaking Ruins Everything

SBMM flattens your population distribution

Charlie Olson
Invokation Games

--

Imagine a 1v1 game (like chess), four players of equal skill, and initial MMR values of 3.

Everyone in this example has the same underlying skill, so everyone should have similar MMR values. The ideal MMR histogram would essentially be a single column at 3 (with some oscillations):

ideal MMR histogram of 4 players of equal skill

This is indeed what happens in an Elo-like MMR system as long as matchmaking is random. However, once skill-based matchmaking (SBMM) is added to the mix, it all goes to shit.

MMR histogram of 4 players of equal skill with perfect SBMM

ASCII Example

With perfect SBMM, the MMR distribution will become uniform over time — exactly the opposite of what it should be. The “correct” MMR distribution in this case should be perfectly tall and narrow (constant), but SBMM leads to perfectly short and wide (uniform).

This scenario is simple enough that we can step through it by hand:

  • All four players start with MMR = 3
  • Two matches are made at random
  • Winners gain 1 MMR; Losers lose 1 MMR
  • Repeat, matching only players whose MMR is exactly equal
  • After the 3rd iteration, every player sits at a different MMR, so the distribution is uniform and no further matches can be made

The process as an ASCII illustration:

state 0         state 1         state 2         state 3

   A
   B
   C              A B              C
   D              C D            A B D           AB CD
---+---         ---+---         ---+---         ---+---
0123456         0123456         0123456         0123456

A v B           A v C           C v B
A loses         A loses         B loses
B wins          C wins          C wins

C v D           B v D
C loses         B loses
D wins          D wins
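
The same walk-through can be replayed in a few lines of Python. This is a minimal sketch of the toy rules above (winners +1, losers -1, matches only between exactly equal MMRs). The function names `play_round` and `run_until_stuck` are made up for illustration, and the winner of each match is a coin flip since all four players are equally skilled.

```python
import random
from collections import defaultdict


def play_round(mmr, rng):
    """Pair players whose MMR is exactly equal; return the number of matches played."""
    buckets = defaultdict(list)
    for player, rating in mmr.items():
        buckets[rating].append(player)

    matches = 0
    for players in buckets.values():
        rng.shuffle(players)  # equal skill, so the winner is a coin flip
        # Pair players off two at a time; an odd player out sits idle.
        for i in range(0, len(players) - 1, 2):
            winner, loser = players[i], players[i + 1]
            mmr[winner] += 1
            mmr[loser] -= 1
            matches += 1
    return matches


def run_until_stuck(names="ABCD", start=3, seed=0):
    """Repeat rounds until no two players share an MMR."""
    rng = random.Random(seed)
    mmr = {name: start for name in names}
    history = [sorted(mmr.values())]
    while play_round(mmr, rng) > 0:
        history.append(sorted(mmr.values()))
    return history


for state, ratings in enumerate(run_until_stuck()):
    print(f"state {state}: {ratings}")
```

Who ends up where depends on the coin flips, but the sorted MMR values do not: every seed goes from [3, 3, 3, 3] to [2, 2, 4, 4] to [1, 3, 3, 5] and gets stuck at [1, 2, 4, 5].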

More Realistic Example

Perfect SBMM breaks Elo in theory — Elo’s tuning parameters never even come into play if there is no difference in MMR. But SBMM in practice is rarely perfect. So what happens to an Elo system under more realistic circumstances? Will SBMM still have a partial flattening effect?

I wrote a simulation to find out:

Elo MMR Distribution — with and without SBMM

As expected, SBMM dramatically flattens the MMR distribution, just not all the way to uniform. Which is still bad. In fact, Elo under SBMM will continue expanding indefinitely in the absence of hacks to constrain it, but the expansion rate depends on the “tightness” of the SBMM.
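
For readers who want to poke at this themselves, here is a rough sketch of that kind of experiment. It is not the simulator behind the chart above; it is a toy built on assumptions: hidden skill drawn from a normal distribution, match outcomes sampled from the standard Elo logistic curve applied to that hidden skill, and SBMM modelled by sorting players on MMR plus Gaussian noise. The `looseness` parameter (a made-up name) is the standard deviation of that noise, so 0 is the tightest SBMM this model allows and `None` means fully random pairing.

```python
import random
import statistics


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def simulate(n_players=200, rounds=300, k=32.0, looseness=None, seed=1):
    """Return the final MMR list after `rounds` rounds of 1v1 matches.

    looseness=None -> random matchmaking.
    looseness=x    -> neighbours in (MMR + noise with std dev x) are paired.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(1500, 200) for _ in range(n_players)]  # hidden true skill (assumed)
    mmr = [1500.0] * n_players
    for _ in range(rounds):
        order = list(range(n_players))
        if looseness is None:
            rng.shuffle(order)
        else:
            order.sort(key=lambda i: mmr[i] + rng.gauss(0, looseness))
        # Adjacent players in `order` play each other.
        for a, b in zip(order[0::2], order[1::2]):
            a_wins = rng.random() < expected_score(skill[a], skill[b])
            delta = k * ((1.0 if a_wins else 0.0) - expected_score(mmr[a], mmr[b]))
            mmr[a] += delta  # zero-sum: whatever A gains, B loses
            mmr[b] -= delta
    return mmr


for label, looseness in [("random", None), ("loose SBMM", 200.0), ("tight SBMM", 0.0)]:
    spread = statistics.pstdev(simulate(looseness=looseness))
    print(f"{label:>10}: MMR std dev = {spread:.1f}")
```

Comparing the printed standard deviations across the three settings is a quick way to see how much the spread depends on matchmaking tightness in this toy setup. The exact numbers will vary with the seed, population size, and K.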

This isn’t just a problem in simulation; it’s probably one of the main issues with the Elo system used by the International Chess Federation (FIDE):

In the past decade, certain innovations have caused rating deflation, a concern that has been raised by professional players and mathematicians and did not go unnoticed by FIDE. Players’ ratings are spread out too widely, and the situation is deteriorating with each passing year. — https://fide.com/news/2538

*There are other inherent problems with Elo, though. For example, the fundamental conservation of MMR is itself a flaw, since the distribution of skill is unlikely to be symmetrical. Clamping the low end while reducing the k-factor at high skill levels also makes the distribution shift left as new players’ skills improve over time (TrueSkill has a bigger problem with this, though).
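
The conservation property in the footnote falls straight out of the textbook Elo update. A short derivation, assuming the common form with a 400-point scale and a single K-factor shared by both players (varying K per player, as mentioned above, breaks exact conservation):

```latex
\begin{align*}
E_A &= \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad E_B = 1 - E_A \\
\Delta_A &= K\,(S_A - E_A), \qquad \Delta_B = K\,(S_B - E_B) \\
S_A + S_B = 1 \;&\Rightarrow\; \Delta_A + \Delta_B
  = K\,\bigl[(S_A + S_B) - (E_A + E_B)\bigr] = 0
\end{align*}
```

Here S is the actual result (1 for a win, 0 for a loss). When R_A = R_B the expected score is exactly 1/2, so each match moves the two players by K/2 in opposite directions no matter what the scale is; that is the perfect-SBMM case from the ASCII example.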

The Value of Simulation

A simulator allows us to compare different systems using the exact same virtual players under different conditions. The simulator is a simplified model of reality, so if an MMR algorithm doesn’t work here, it would be unreasonable to expect it to magically work in the more complicated real world.

True story: most game developers only run simulations on historical match data. This fails to capture the feedback effect of SBMM.

A/B tests on live players are also necessary for validation, but they are too slow and difficult to be useful for iteration during algorithm development.

In a nutshell, a robust simulator is necessary.

TrueSkill

In video games, Microsoft TrueSkill is the most widely recognized MMR algorithm. Is TrueSkill more invariant under SBMM than Elo? Let’s see:

TrueSkill MMR Distribution — with and without SBMM

TrueSkill also suffers from MMR expansion, but not quite as badly as Elo.

Side note: TrueSkill’s improvement in stability comes at a cost. The distribution expands less because TrueSkill decreases the step size of MMR updates over time. This, however, makes it ripe for smurfing, and problematic for player-facing MMR.
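
To make the shrinking step size concrete, here is a hedged sketch of the two-player, no-draw TrueSkill update as it is usually described. The constants are the commonly cited defaults (mu = 25, sigma = 25/3, beta = sigma/2, tau = sigma/100), assumed here rather than taken from any particular game, and `update_1v1` and `step_size` are illustrative names, not part of Microsoft's library.

```python
import math
from statistics import NormalDist

# Commonly cited TrueSkill defaults (assumptions for this sketch).
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = SIGMA0 / 2     # performance noise
TAU = SIGMA0 / 100    # uncertainty added back before every match
_STD = NormalDist()   # standard normal, used for pdf/cdf


def update_1v1(winner, loser):
    """Return updated (mu, sigma) pairs for a decisive 1v1 result (no draws)."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    var_w = sig_w ** 2 + TAU ** 2
    var_l = sig_l ** 2 + TAU ** 2
    c = math.sqrt(2 * BETA ** 2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = _STD.pdf(t) / _STD.cdf(t)  # how surprising the result was
    w = v * (v + t)                # how much uncertainty to remove
    new_w = (mu_w + var_w / c * v, math.sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_l = (mu_l - var_l / c * v, math.sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_w, new_l


def step_size(sigma):
    """Winner's mu gain when two equal-mu players with this sigma meet."""
    (mu_after, _), _ = update_1v1((MU0, sigma), (MU0, sigma))
    return mu_after - MU0


for sigma in (SIGMA0, 4.0, 2.0, 1.0):
    print(f"sigma = {sigma:5.2f} -> winner gains {step_size(sigma):.3f}")
```

The mu update is scaled by the player's own variance, and each match shrinks that variance, so the same result moves a veteran far less than a fresh account. That is the mechanism behind both the stability and the smurfing problem.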

Smurfing side note: TrueSkill is heavily biased by your initial matches. If you deliberately play badly for a few dozen matches to start, you can guarantee yourself easy matches for a long time (the flipside of this is the reason why skilled players often have sub-50% win rates in TrueSkill). Similarly in a Glicko system, you can exploit the dynamic variance to farm MMR.

IVK-Casual

This has been a lot of doom and gloom so far. Is it even possible to have stable MMR with SBMM? The short answer is yes.

(This is sort of a sales pitch)

At Invokation Games we have a class of algorithms we call IVK (it’s an esoteric acronym: Ideal, Variance, K-factor), with invariant distributions under the entire range of matchmaking conditions. Here’s one example:

IVK-Casual MMR distribution — with and without SBMM

Ok, this isn’t perfect perfect, but it’s pretty darn close.

IVK-Ranked

IVK MMR distributions don’t have to be symmetrical. Here’s an example of an asymmetrical, positively-unbounded Ranked distribution, simulated with SBMM only, since that’s how Ranked modes work:

IVK-Ranked MMR distribution — with SBMM

As far as I know, IVK is the only generalized MMR algorithm with a configurable, deterministic, long-term MMR distribution.

Implications

The main issue with Elo, TrueSkill, and conventional MMR systems is that they require constant maintenance and tuning. Players aren’t fond of the workaround solutions (e.g. “hidden MMR” or massive Elo remappings), and data scientists aren’t cheap.

The root problem is that these MMR systems have unpredictable distributions. They’re not invariant under different SBMM conditions.

MMR distributions might be consistent from one season to the next — but they’re not analytically predictable or controllable. With the launch of a new game, a new matchmaker, or a major change in the meta, everything becomes uncertain again.

IVK solves the root problem. IVK delivers the spirit of Elo: intuitive player-facing MMR updates and simple configuration options — without the fatal flaw of uncontrolled, unpredictable expansion and/or sliding.

IVK also works for any number of players or teams, with or without placement matches, and for any combination of personal or team performance.

Note: Team-balancing

While perfect SBMM is unlikely, perfectly balanced teams can be relatively common in multiplayer games. If MMR is updated based on the team outcome, perfect team-balancing has the same effect as perfect SBMM: every match looks like a coin flip to the rating system. In other words, perfect team-balancing will rapidly expand an Elo system, and no tuning of the variance or k-factor can fix that.
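
A small sketch of why that happens, assuming one common way of extending Elo to teams: the team's rating is the mean of its members' MMR, and every member receives the same delta. Both choices and the name `team_update` are assumptions for illustration; real games differ in the details.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of side A against side B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def team_update(team_a, team_b, a_won, k=32.0):
    """Return per-player MMR deltas when only the team result drives the update.

    Assumption: a team's rating is the mean of its members' MMR.
    """
    mean_a = sum(team_a) / len(team_a)
    mean_b = sum(team_b) / len(team_b)
    delta = k * ((1.0 if a_won else 0.0) - expected_score(mean_a, mean_b))
    return [delta] * len(team_a), [-delta] * len(team_b)


# Individual MMRs differ a lot, but both team means are exactly 1500.
deltas_a, deltas_b = team_update([1000.0, 2000.0], [1400.0, 1600.0], a_won=True)
print(deltas_a, deltas_b)  # [16.0, 16.0] [-16.0, -16.0]
```

With balanced teams the expected score is always 0.5, so every player takes a fixed step of K/2 up or down each match, including the 2000-rated player who just beat a weaker lineup. The part of Elo that would normally pull ratings back toward each other never gets a chance to act.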

For systems that use the team outcome (win/loss) to drive MMR updates, it makes sense then to either deliberately unbalance the teams, or switch to IVK (obviously).

--