Monte Carlo Sims in Esports

Ben Steenhuisen
Published in datdota
5 min read · May 19, 2022

I’ve been running Monte Carlo simulations (let’s just call them MCS from now on) in esports since 2014, around The International 2014. They’ve always been a powerful tool for quickly analyzing complex problems that have no simple closed-form expression, and they are easy to adapt or extend for further exploration.

Classic starting point for MCS

A classic starting problem for MCS in computer science education is estimating π. Take a unit square with corners at (0,0), (0,1), (1,1), and (1,0), and a circle of radius 1 with its center at the origin. Then generate pairs of numbers (x,y), where 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, using a random number generator. If x² + y² ≤ 1, the point lies inside the quarter-circle that overlaps the square; otherwise it lies outside. The area of this quarter-circle is π/4 and the area of the unit square is 1, so the fraction of random points that land inside the quarter-circle should be close to π/4. Multiplying that fraction by 4 gives an estimate of π.

Simple Python code, and sample output.
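The original code was shown as an image, so here is a minimal sketch of the same idea rather than the exact code from the post. The function name `estimate_pi` and the `seed` parameter are my own choices for illustration.

```python
import random


def estimate_pi(trials, seed=None):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        # The point lies inside the quarter-circle when x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / trials approximates pi / 4, so multiply by 4.
    return 4.0 * inside / trials


if __name__ == "__main__":
    print(estimate_pi(1_000_000))
```

Each run without a fixed seed prints a slightly different value near π; passing a seed makes a run repeatable.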

In the above example, you can see the sample output is close to π, but not exactly the number we’re used to. This follows from the law of large numbers (LLN), which states that the average result from a large number of trials should be close to the expected value of a single trial. Some underlying error remains, which can be reduced by increasing the number of trials: the example uses just 10⁶ (a million), and we could make it a billion.
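To put a rough number on that error: each trial is a hit-or-miss draw with hit probability π/4, so the standard error of the estimate follows from the binomial variance and shrinks in proportion to 1/√N. The helper below is an illustrative sketch (the name `pi_estimate_std_error` is mine), not part of the original code.

```python
import math


def pi_estimate_std_error(trials):
    """Standard error of the 4 * (hits / trials) estimator of pi."""
    p = math.pi / 4  # chance that a random point lands inside the quarter-circle
    # The hit fraction has variance p * (1 - p) / trials; scaling by 4 scales the error by 4.
    return 4.0 * math.sqrt(p * (1.0 - p) / trials)


for n in (10**6, 10**9):
    print(f"{n:>13,} trials -> standard error of about {pi_estimate_std_error(n):.6f}")
```

This works out to about 0.0016 for a million trials and about 0.00005 for a billion, so a thousand times more trials only buys roughly one and a half extra decimal places.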

Bringing it back to esports

Now, how does this all tie into esports? Well — we are able to use this technique to simulate something even more complex: an esports tournament, or circuit. All we need is:

  • some initial conditions (what teams are playing, current points, etc)
  • a pretty clear picture of the future structure and mechanics of the system (remaining events, formats, qualification process, etc)
  • some method to simulate individual games (in this case, ratings from https://www.datdota.com/ratings)

It’s okay if there is some uncertainty, as long as the assumptions made are reasonable. Right now, for example, we have no idea which teams will win the Open Qualifiers for Division 2 (and hence be eligible for Regional Qualifiers), so we can just pick reasonable stand-ins for those slots. Our knowledge of the system is a moving target, and as we get more information we can update the model accordingly, which should increase its overall accuracy.

Below is an example of the current state of the ESL Stockholm Major.

  • There are just 8 games remaining in the playoffs.
  • bo_x is a helper function to simulate a best-of-X series (default 3, but 5 for the grand finals)
  • load just looks up teams by name, so it’s easy to input incomplete brackets/stages. Initially it was only used in the group declarations, but as results come in we can partially fill in the event.
  • points_distribution is pre-defined for the event based on DPC points.
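The original bracket code was an image, so the following is only a sketch of how helpers with these names might fit together. The team names, ratings, points values, the four-team format, and the Elo-style win probability are all assumptions made up for illustration; they are not the real Stockholm Major bracket, the real DPC points, or the datdota ratings model.

```python
import random

# Hypothetical ratings, invented purely for illustration.
RATINGS = {"Team A": 1650.0, "Team B": 1600.0, "Team C": 1580.0, "Team D": 1520.0}

# Hypothetical points per final placement (not the real DPC distribution).
points_distribution = {1: 400, 2: 300, 3: 200, 4: 100}


def load(name):
    """Look a team up by name so brackets can be written with plain strings."""
    if name not in RATINGS:
        raise KeyError(f"unknown team: {name}")
    return name


def win_probability(team_a, team_b):
    """Assumed Elo-style logistic curve for team_a winning a single game."""
    diff = RATINGS[team_b] - RATINGS[team_a]
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))


def bo_x(team_a, team_b, x=3, rng=random):
    """Simulate a best-of-x series game by game; return (winner, loser)."""
    needed = x // 2 + 1
    wins_a = wins_b = 0
    p = win_probability(team_a, team_b)
    while wins_a < needed and wins_b < needed:
        if rng.random() < p:
            wins_a += 1
        else:
            wins_b += 1
    return (team_a, team_b) if wins_a == needed else (team_b, team_a)


def simulate_playoff(rng=random):
    """Hypothetical four-team bracket: two semifinals, a third-place series, a bo5 final."""
    w1, l1 = bo_x(load("Team A"), load("Team D"), rng=rng)
    w2, l2 = bo_x(load("Team B"), load("Team C"), rng=rng)
    third, fourth = bo_x(l1, l2, rng=rng)
    first, second = bo_x(w1, w2, x=5, rng=rng)
    placements = [first, second, third, fourth]
    return {team: points_distribution[place] for place, team in enumerate(placements, start=1)}


print(simulate_playoff(random.Random()))
```

Once a series has actually been played, its bo_x call can be replaced with the known (winner, loser) pair, which is one way to partially fill in an event as results come in.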

There are similar methods for the other remaining parts of the Dota season: DPC leagues, the Arlington Major, regional qualifiers, and the wildcard qualifier. For each iteration of the simulation, we simulate the entire remaining season. At key points we can record metrics we find interesting, for example the min/max/avg points of the 12th qualifying team, or the final placement of a team at TI.
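As a rough sketch of that outer loop, the example below runs many iterations and tracks the points of the 12th-ranked team. Everything concrete here is an assumption for illustration: the 16 placeholder teams, their current points, and a "season" reduced to a single event whose finishing order is uniformly random (a real model would play out each event using ratings).

```python
import random
import statistics

# Hypothetical current standings: 16 placeholder teams with made-up points.
CURRENT_POINTS = {f"Team {i:02d}": 1000 - 50 * i for i in range(16)}


def simulate_remaining_season(rng):
    """Stand-in for the full season: one remaining event with hypothetical points,
    handed out in a uniformly random finishing order."""
    points = dict(CURRENT_POINTS)
    event_points = [400, 300, 200, 100]  # hypothetical
    finishers = rng.sample(sorted(points), k=len(event_points))
    for team, gained in zip(finishers, event_points):
        points[team] += gained
    return points


def run_simulations(iterations, seed=None):
    """Simulate the remaining season many times and summarise the 12th-place points."""
    rng = random.Random(seed)
    cutoff_points = []
    for _ in range(iterations):
        final_points = simulate_remaining_season(rng)
        ranked = sorted(final_points.values(), reverse=True)
        cutoff_points.append(ranked[11])  # points of the 12th-best team
    return {
        "min": min(cutoff_points),
        "max": max(cutoff_points),
        "avg": statistics.mean(cutoff_points),
    }


print(run_simulations(10_000))
```

The same pattern extends to any other metric: append whatever you care about inside the loop and summarise it afterwards.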

There are no hard and fast rules on how to write MCS, but here are some guidelines I’ve picked up:

  • make your components very flexible and reusable: round robin groups, elimination brackets, tiebreaker rules, etc. With just a few of these you can build almost every tournament
  • optimize your simulation for speed: use the correct data structures, remove redundancy, make sure you can easily strip logging
  • if you can easily do it, parallelize it. For esports MCS I generally don’t bother, but in other topics I often need to run much bigger simulations, where the ability to run the calculation on multiple cores on multiple servers is essential (for example, FPL long-term optimizations would take a few months on a single thread on my home PC, which is far too long)
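As one example of a reusable component, here is a minimal round robin sketch. The function signature is my own invention: it takes any `play_series` callable, so the same group code can sit on top of a coin flip, a ratings model, or fixed results. The random tiebreak is a placeholder for real tiebreaker rules.

```python
import itertools
import random
from collections import Counter


def round_robin(teams, play_series, rng):
    """Every team plays every other team once.

    play_series(team_a, team_b, rng) must return the winning team.
    Returns (standings, wins), with standings ordered by series wins.
    Ties are broken randomly as a placeholder for real tiebreaker rules.
    """
    wins = Counter({team: 0 for team in teams})
    for team_a, team_b in itertools.combinations(teams, 2):
        wins[play_series(team_a, team_b, rng)] += 1
    standings = sorted(teams, key=lambda team: (-wins[team], rng.random()))
    return standings, wins


def coin_flip(team_a, team_b, rng):
    """Hypothetical series model: both teams are equally likely to win."""
    return team_a if rng.random() < 0.5 else team_b


standings, wins = round_robin(["A", "B", "C", "D"], coin_flip, random.Random())
print(standings, dict(wins))
```

Swapping `coin_flip` for a ratings-based function changes the model without touching the group logic, which is the kind of flexibility the first guideline is after.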

Handling some problems

A few of the assumptions made could turn out to be plainly wrong: events could change formats, penalties could kick in, teams could disband, or perhaps Chinese teams won’t be able to make it to Arlington! These are handled with the best available defaults and are an acceptable type of error.

There is also implicit variance that comes with any ratings approach: a team like BetBoom has only played 38 professional matches, so their rating uncertainty is significant. A team could also simply start playing much worse (or better) than their current rating suggests.

The only remaining topic worth discussing is general sampling error. On https://noxville.github.io/ti11-probabilities/ both PSG.LGD and TSM FTX have a 100% chance to qualify via points after 100k simulations of the remaining DPC season. This can be verified a bit further by looking at the raw tracking data to ensure it’s not a rounding error and is in fact every iteration (which it is). That said, it’s still possible that these teams do not, in reality, qualify on their current points, simply because we’ve not explored every possible permutation (there are simply too many), nor have we even limited ourselves to unique outcomes. We have instead randomly created scenarios built upon individual events, for some of which we’ve used weighted outcomes.
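One way to put a number on "100% after 100k simulations" is to ask how likely a miss could still be, per iteration, given that it showed up zero times. The sketch below is my own addition (the function name is hypothetical) and only covers sampling error inside the model; it says nothing about wrong ratings or wrong assumptions.

```python
def zero_event_upper_bound(trials, confidence=0.95):
    """Largest per-iteration probability p still consistent, at the given confidence,
    with seeing an event zero times in `trials` independent iterations.

    Solves (1 - p) ** trials = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


print(zero_event_upper_bound(100_000))
```

For 100k iterations this comes out at roughly 3 in 100,000, which matches the familiar "rule of three" approximation of 3/N.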

Anyway, I hope this was a useful small primer on MCS and how they can be a powerful tool for modelling complex (or simple) series of events in esports!

- Noxville

Dota 2 statsman and occasional caster | runs @datdota