On why the Boston Major format sucks…

Ben Steenhuisen
Published in datdota
Dec 2, 2016

But how it might be fun anyway.

As a company, Valve operates very differently from pretty much any company you’ve ever worked with or interacted with. Their fundamental approach is to treat every external (and many internal) event as an experiment, with various outcomes mapping to various degrees of success or failure. Their view of feedback is inherently data-driven: how many users engaged with the latest product feature, which type of abuse is the most prevalent, how many compendiums sold. They run plenty of more esoteric statistics and analytics too, but I digress.

This approach forces them to keep pushing and growing in (mostly) the right direction, but it also breeds a particular kind of curiosity: a willingness to try out new things just to see how they work. The International itself is a product of this curiosity. Erik Johnson explained in an interview that someone at Valve asked “what happens if we host a tournament?”, and later came the follow-up question: “what happens if it had a like, massive prize pool?”

The Boston Major is another set of these experiments for Valve in many respects: a smaller talent lineup, a shorter time frame, a different location (East Coast USA) and, most notably for players, analysts and pundits alike, a completely different format (GSL groups into Single Elimination).

Many people have likened the format to that of the CS:GO Majors, as if that were some underlying justification for the change. CS:GO and Dota 2 are two fundamentally different games. Even if we ignore the glaring flaws of the CS:GO Major format (two bo1 wins and you’re a “Legend” and guaranteed top 8), and ignore the fact that in this Dota 2 Major format all 16 teams advance to the playoffs (not just 8), the comparison is not in itself a justification for the change: it points out a similarity, and that’s as far as it goes.

GSL into Double Elimination has actually been an excellent format at previous non-TI Dota majors. That said, 13 of the 16 teams didn’t want TI6 played with a GSL format; they preferred a longer group stage (two groups, bo2 round robin) and petitioned Valve accordingly. Valve acquiesced then, but not now. In a vacuum, GSL is better than Round Robin because:

  • all teams are in direct control of their own future
  • none of the games are ‘meaningless’
  • each game means exactly the same for both teams

The problem is that GSL forces you to divvy up the 16 teams into 4 groups, and if you don’t do this properly then some groups are much easier than others (the 2nd best team in one group could be 1st in another, or 4th in another). Analysts’ predictions of a team’s true skill coming into a tournament have been shown to be very far from reality (see The Manila Major).
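To make the group structure concrete, here is a minimal Python sketch of a single four-team group. It assumes the standard GSL layout (1 vs 4 and 2 vs 3 openers, a winners’ match, a losers’ match whose loser finishes 4th, and a decider for 2nd/3rd) with every series played as a bo3, and uses the standard 400-point Elo formula. The function names and ratings are hypothetical illustrations, not the code or data behind the simulations further down.

```python
import random

def elo_win_prob(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def bo3_win_prob(p: float) -> float:
    """Chance of taking a best-of-three if each game is won independently with probability p."""
    # Win 2-0 (p^2) or 2-1 (two orderings, 2 * p^2 * (1 - p)).
    return p * p * (3 - 2 * p)

def gsl_group(seeds, play):
    """Run one four-team GSL group and return teams in finishing order (1st..4th).

    seeds: four distinct teams, best seed first.
    play(a, b): returns the winner of the series between a and b.
    """
    s1, s2, s3, s4 = seeds
    # Opening matches: 1 vs 4 and 2 vs 3.
    w1 = play(s1, s4)
    l1 = s4 if w1 == s1 else s1
    w2 = play(s2, s3)
    l2 = s3 if w2 == s2 else s2
    # Winners' match: the winner takes 1st place.
    first = play(w1, w2)
    wm_loser = w2 if first == w1 else w1
    # Losers' match: the loser finishes 4th.
    lm_winner = play(l1, l2)
    fourth = l2 if lm_winner == l1 else l1
    # Decider: winners'-match loser vs losers'-match winner for 2nd and 3rd.
    second = play(wm_loser, lm_winner)
    third = lm_winner if second == wm_loser else wm_loser
    return [first, second, third, fourth]

def make_random_play(ratings, rng):
    """Build a play() that resolves bo3 series randomly from Elo ratings."""
    def play(a, b):
        p = bo3_win_prob(elo_win_prob(ratings[a], ratings[b]))
        return a if rng.random() < p else b
    return play

# Hypothetical ratings for illustration only.
ratings = {"A": 1600, "B": 1550, "C": 1500, "D": 1450}
print(gsl_group(["A", "B", "C", "D"], make_random_play(ratings, random.Random(42))))
```

Swapping in a deterministic play() where the higher rating always wins, except that D always beats A, shows how one opening upset reshuffles the whole group: the order becomes B, D, A, C instead of A, B, C, D.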

Another issue: when you go from GSL into Single Elimination, even if you got every group perfectly right and there were no upsets, you still run a 1/3 risk that your two best teams land on the same half of the elimination bracket, and hence meet in the semi-finals rather than the final. As seen in multiple Dota 2 tournaments, most recently ESL Frankfurt 2016, a mismatched final leaves people with a flat, ‘unhype’ impression of the tournament as a whole in retrospect.
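The 1/3 figure follows from a simple count. Assume the four best teams each win a different group, the group draw is uniformly random, and the bracket places two group winners in each half. The best team’s group then fixes one half, and the second-best team’s group is one of the three remaining groups, only one of which feeds that same half. A short Python check by enumeration (the bracket layout passed in is a hypothetical one):

```python
from fractions import Fraction
from itertools import permutations

def same_half_probability(halves):
    """Exact chance that the two best teams' groups feed the same bracket half.

    halves[g] is the half (0 = top, 1 = bottom) that group g's winner is placed in.
    Assumes each of the best len(halves) teams tops a different group (no upsets)
    and that every assignment of those teams to groups is equally likely.
    """
    draws = list(permutations(range(len(halves))))  # draw[t] = group of the t-th best team
    same = sum(1 for draw in draws if halves[draw[0]] == halves[draw[1]])
    return Fraction(same, len(draws))

# Hypothetical placement: winners of groups 0 and 1 go top, groups 2 and 3 go bottom.
print(same_half_probability([0, 0, 1, 1]))  # 1/3
```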

Upsets also have a much more profound impact in a GSL format, to the point where a single group’s upset can hurt other teams who must now face a tougher-than-expected opponent earlier on. This can happen in a Double Elimination bracket too, but it’s far rarer for it to affect many teams.

“But,” says the devil’s advocate, “surely if a team loses to a ‘worse’ team, that means they don’t deserve to win anyway?” This would be a rational and mostly reasonable question if not for the complex dynamics we get in Dota. Just as in Chess, Football, Boxing and countless other sports, there are very interesting head-to-head rivalries in Dota 2. Sometimes a complete underdog has a specific style of gameplay or strategy that consistently works against one team universally regarded as better. This isn’t rare in Dota 2: there are lots of cases of transitivity breaking down between the top teams (A beats B, B beats C, yet C beats A). Hence it’s normal to judge a team not just by how they navigate a Single Elimination bracket (the way almost all Tennis tournaments work), but against a wide set of different opponents.
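A toy example shows why intransitive matchups make Single Elimination results depend on the draw. The results below are hypothetical and deliberately deterministic (A always beats B, B always beats C, C always beats A, and D loses to everyone). Depending only on the bracket draw, any of A, B or C can lift the trophy, while a round robin shows them as dead level.

```python
from itertools import permutations

# Hypothetical, deliberately intransitive results: (winner, loser) for every pairing.
BEATS = {("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("B", "D"), ("C", "D")}

def winner(a, b):
    """Return whichever team wins the (deterministic) head-to-head."""
    return a if (a, b) in BEATS else b

def single_elim(bracket):
    """Four-team bracket: slot 0 vs 1 and slot 2 vs 3 in the semis, then the final."""
    return winner(winner(bracket[0], bracket[1]), winner(bracket[2], bracket[3]))

# Every possible draw, and who ends up champion.
champions = {single_elim(list(draw)) for draw in permutations("ABCD")}
print(sorted(champions))  # ['A', 'B', 'C']

# The same results as a round robin: each team's win count.
wins = {t: sum((t, o) in BEATS for o in "ABCD") for t in "ABCD"}
print(wins)  # {'A': 2, 'B': 2, 'C': 2, 'D': 0}
```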

So I wanted to explore how the tournament looks with GSL, and how different it would look with a Round Robin format. To model this, I took the average Elo distribution of the top 16 teams over the last year (not the teams as such, just the top-ranked Elo score, the second-best Elo, etc.) and fed it into two Monte Carlo simulations: one for GSL into Single Elimination, and one for Round Robin (bo1) into Single Elimination with the existing TI tiebreakers. Each simulation ran 10⁷ iterations.
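The general Monte Carlo approach can be sketched in a few lines: turn Elo gaps into win probabilities, play the stage many times, and count outcomes. The Python below does this for a single bo1 round-robin group only. The ratings are hypothetical placeholders (not the averaged Elo distribution used here), ties are broken at random rather than with TI’s actual tiebreakers, and the iteration count is far smaller than 10⁷.

```python
import random
from collections import Counter
from itertools import combinations

def elo_win_prob(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def play_round_robin(ratings, rng):
    """Play every pairing once (bo1) and return team indices in standings order."""
    wins = [0] * len(ratings)
    for a, b in combinations(range(len(ratings)), 2):
        if rng.random() < elo_win_prob(ratings[a], ratings[b]):
            wins[a] += 1
        else:
            wins[b] += 1
    order = list(range(len(ratings)))
    rng.shuffle(order)                    # simplification: random tiebreak, NOT TI's rules
    order.sort(key=lambda t: -wins[t])    # stable sort keeps the shuffled order within ties
    return order

def group_win_odds(ratings, iterations, seed=0):
    """Monte Carlo estimate of each team's chance of topping the group."""
    rng = random.Random(seed)
    tops = Counter(play_round_robin(ratings, rng)[0] for _ in range(iterations))
    return [tops[t] / iterations for t in range(len(ratings))]

# Hypothetical eight-team group, best-rated team first.
ratings = [1700, 1650, 1600, 1550, 1500, 1450, 1400, 1350]
for rank, p in enumerate(group_win_odds(ratings, 20_000), start=1):
    print(f"{rank}: {p:.2%}")
```

The full model would chain a stage like this (or the GSL groups) into a seeded Single Elimination bracket and tally final placings per Elo rank instead of group winners.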

First up, the GSL -> SE format.

╔═══════════╦════════╦════════╦════════╗
║ Team Rank ║ Winner ║ Top 2  ║ Top 4  ║
╠═══════════╬════════╬════════╬════════╣
║     1     ║ 20.68% ║ 32.68% ║ 50.03% ║
║     2     ║ 16.69% ║ 28.57% ║ 45.99% ║
║     3     ║ 15.80% ║ 27.85% ║ 45.16% ║
║     4     ║ 13.78% ║ 24.59% ║ 41.81% ║
║     5     ║ 11.49% ║ 22.18% ║ 38.45% ║
║     6     ║  7.73% ║ 16.83% ║ 32.43% ║
║     7     ║  3.41% ║  9.42% ║ 22.15% ║
║     8     ║  2.20% ║  6.78% ║ 18.33% ║
║     9     ║  1.28% ║  4.57% ║ 14.26% ║
║    10     ║  1.22% ║  4.39% ║ 14.25% ║
║    11     ║  1.15% ║  4.28% ║ 13.82% ║
║    12     ║  1.14% ║  4.21% ║ 13.67% ║
║    13     ║  1.05% ║  3.88% ║ 12.91% ║
║    14     ║  0.86% ║  3.45% ║ 12.59% ║
║    15     ║  0.79% ║  3.26% ║ 12.49% ║
║    16     ║  0.73% ║  3.07% ║ 11.66% ║
╚═══════════╩════════╩════════╩════════╝

The best team ends up winning ~20.7% of the time, which is a bit low (though not alarming) given the team Elo distribution. What’s far more disconcerting is the 137:1 odds on the worst team: 50% higher than what pundits are even offering on them. This is even more exaggerated for the top 2 and top 4 odds on offer: 152% higher EV on the worst team making it to the final than modeled.
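For readers who think in betting terms, converting a modelled probability into fair (margin-free) odds is just a reciprocal: for the 16th-ranked team’s 0.73%, 1/0.0073 ≈ 137. A small Python helper set, with the standard expected-value formula for a bet at given decimal odds; the function names are illustrative.

```python
def fair_decimal_odds(p: float) -> float:
    """Decimal odds with no bookmaker margin: total payout per unit staked."""
    return 1.0 / p

def fair_fractional_odds(p: float) -> float:
    """Fair 'X to 1 against' odds: profit per unit staked on a win."""
    return (1.0 - p) / p

def bet_ev(p: float, decimal_odds: float) -> float:
    """Expected profit per unit staked when the true win chance is p."""
    return p * decimal_odds - 1.0

p_worst = 0.0073  # 16th-ranked team's modelled chance of winning under GSL -> SE
print(round(fair_decimal_odds(p_worst), 1))     # 137.0
print(round(fair_fractional_odds(p_worst), 1))  # 136.0
```

Any price longer than the fair odds gives a positive bet_ev, which is the sense in which the worst team is underpriced relative to the model.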

In 68.85% of these simulations at least one group had an upset, and in only 0.86% of the simulations were all the upsets “balanced” (i.e. every upset was cancelled out by an exactly identical upset in the paired group).

Next, the Round Robin -> SE format.

╔═══════════╦════════╦════════╦════════╗
║ Team Rank ║ Winner ║ Top 2  ║ Top 4  ║
╠═══════════╬════════╬════════╬════════╣
║     1     ║ 21.79% ║ 35.15% ║ 53.98% ║
║     2     ║ 16.95% ║ 29.39% ║ 48.75% ║
║     3     ║ 16.23% ║ 29.19% ║ 48.29% ║
║     4     ║ 14.24% ║ 25.92% ║ 44.21% ║
║     5     ║ 11.88% ║ 23.46% ║ 42.45% ║
║     6     ║  7.68% ║ 17.04% ║ 34.65% ║
║     7     ║  3.00% ║  8.50% ║ 21.51% ║
║     8     ║  1.87% ║  6.03% ║ 17.18% ║
║     9     ║  1.04% ║  3.88% ║ 13.07% ║
║    10     ║  0.98% ║  3.81% ║ 12.90% ║
║    11     ║  0.95% ║  3.57% ║ 11.95% ║
║    12     ║  0.87% ║  3.43% ║ 11.82% ║
║    13     ║  0.84% ║  3.29% ║ 11.13% ║
║    14     ║  0.61% ║  2.68% ║ 10.09% ║
║    15     ║  0.53% ║  2.37% ║  9.12% ║
║    16     ║  0.53% ║  2.29% ║  8.89% ║
╚═══════════╩════════╩════════╩════════╝

Substantially lower chances of the truly worst teams going deep: 16th place wins the event 0.53% of the time (roughly 1 in 190) and reaches the final 2.29% of the time, versus 0.73% and 3.07% under GSL. The top teams get slightly better chances of making the final, with a salient point (see below) around 7th place: if you are ranked above it, the GSL format is actually bad for you.

Figure: Shifts per rank in modelled % final outcome.
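The per-rank shifts can be recomputed straight from the two tables. In the Python sketch below (values copied from the tables; helper names are illustrative), every rank from 7th down is worse off under Round Robin for reaching the final, while for winning outright the crossover already sits at 6th, by a hair (0.05 percentage points).

```python
# Winner and Top 2 (reached the final) percentages per Elo rank, copied from the two tables.
GSL_WIN = [20.68, 16.69, 15.80, 13.78, 11.49, 7.73, 3.41, 2.20,
           1.28, 1.22, 1.15, 1.14, 1.05, 0.86, 0.79, 0.73]
RR_WIN = [21.79, 16.95, 16.23, 14.24, 11.88, 7.68, 3.00, 1.87,
          1.04, 0.98, 0.95, 0.87, 0.84, 0.61, 0.53, 0.53]
GSL_TOP2 = [32.68, 28.57, 27.85, 24.59, 22.18, 16.83, 9.42, 6.78,
            4.57, 4.39, 4.28, 4.21, 3.88, 3.45, 3.26, 3.07]
RR_TOP2 = [35.15, 29.39, 29.19, 25.92, 23.46, 17.04, 8.50, 6.03,
           3.88, 3.81, 3.57, 3.43, 3.29, 2.68, 2.37, 2.29]

def shifts(gsl, rr):
    """Percentage-point change per rank when moving from GSL -> SE to Round Robin -> SE."""
    return [round(r - g, 2) for g, r in zip(gsl, rr)]

def first_rank_worse_off(gsl, rr):
    """First (1-based) rank whose chances are lower under Round Robin than under GSL."""
    return next(rank for rank, d in enumerate(shifts(gsl, rr), start=1) if d < 0)

print(shifts(GSL_TOP2, RR_TOP2))
print(first_rank_worse_off(GSL_TOP2, RR_TOP2))  # 7
print(first_rank_worse_off(GSL_WIN, RR_WIN))    # 6
```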

Single Elimination as a format is something I think can be really good: it’s hype, and every match means a lot. But to make it truly reasonable, you need accurate seedings, and GSL groups just aren’t good enough. Minor upsets can have far-reaching ripple effects throughout the tournament, disrupting the final placings, the global impression and the livelihood of the teams involved. A Round Robin stage (two groups of eight in bo1 is 56 games, versus an expected 50 games in GSL) or even a Swiss stage (6 rounds of Swiss would be 48 games) would be far more accurate at seeding your Single Elimination bracket, and would lead to much closer matches that crescendo towards the final.
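The game counts are easy to check. The Python sketch below assumes the Round Robin is two groups of eight in bo1, the four GSL groups play every series as a bo3, and the Swiss stage is bo1. The “expected 50” figure falls out if each bo3 has an even chance of going to a third game (p = 0.5), so more lopsided matchups would pull the GSL expectation below 50. Function names are illustrative.

```python
from math import comb

def round_robin_games(group_sizes):
    """bo1 round robin: every pair within a group plays exactly once."""
    return sum(comb(n, 2) for n in group_sizes)

def expected_bo3_games(p=0.5):
    """A bo3 needs a third game exactly when the first two games are split."""
    return 2 + 2 * p * (1 - p)

def gsl_expected_games(groups=4, p=0.5):
    """Each GSL group has 5 series: two openers, winners', losers' and decider."""
    return groups * 5 * expected_bo3_games(p)

def swiss_games(teams, rounds):
    """bo1 Swiss: every team plays once per round."""
    return rounds * teams // 2

print(round_robin_games([8, 8]))   # 56
print(gsl_expected_games())        # 50.0
print(swiss_games(16, 6))          # 48
```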

It’s possible that the Boston Major goes off without a hitch — all the games are really good and we end up seeing the best teams in the finals duking it out. Even if that happens, it doesn’t automatically make this format ‘good’ or even defensible — even a broken clock is right twice a day.

Follow me on Twitter: https://twitter.com/followNoxville

Dota 2 statsman and occasional caster | runs @datdota