Using Machine Learning to Predict the Winners of The International DOTA 2 2016 Tournament
To get started with machine learning, I took on a small project: writing a script that would predict the brackets of The International. The International is the annual DOTA 2 tournament hosted by Valve, and it has the biggest prize pool in all of eSports. Thanks to crowdfunding, the current prize pool stands at $19,842,840 and is projected to exceed $20,000,000 by the end of the tournament.
While doing research I came across Matt Harvey’s machine learning model for the 2016 NCAA March Madness Tournament. I decided to use his open-source script but apply it to DOTA 2 data.
Getting The Data
The first thing I needed was match data for the teams participating in The International. I reached out to Howard of yasp.co, and he directed me to the yasp database explorer, which let me get the match_id of every pro DOTA 2 match that they tracked.
Using these match_ids, I then used Valve’s own Steam Web API to retrieve data for each of the matches.
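As a rough illustration of this step, here is a minimal sketch of requesting one match from the Steam Web API. It is not the script I ran; the endpoint path (IDOTA2Match_570/GetMatchDetails) is written as I understand the public API, and the function names and the "result" wrapper key are assumptions for the example. You need your own Steam API key.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for DOTA 2 match details on the Steam Web API.
BASE_URL = "https://api.steampowered.com/IDOTA2Match_570/GetMatchDetails/v1/"


def build_match_url(match_id, api_key):
    """Build the request URL for a single match (hypothetical helper)."""
    query = urllib.parse.urlencode({"match_id": match_id, "key": api_key})
    return f"{BASE_URL}?{query}"


def fetch_match(match_id, api_key, timeout=10):
    """Download one match and return the parsed JSON payload.

    Assumes the response wraps the match in a top-level "result" object;
    an empty dict is returned if that key is missing.
    """
    url = build_match_url(match_id, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response).get("result", {})
```

Looping fetch_match over the list of match_ids, with a short pause between calls to stay within the API's rate limits, would produce the raw JSON used in the next step.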
Transforming The Data
I decided to use Matt's script and change it as little as possible, so I needed to transform the JSON data into the CSV format that his script uses. To do this I used Python's ijson library.
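The JSON-to-CSV idea can be sketched as follows. This version uses the standard json-style dicts and the csv module instead of ijson, to keep it self-contained. The JSON field names (players, player_slot, radiant_score and so on) and the one-row-per-team layout are assumptions for illustration, not the exact columns Matt's script expects.

```python
import csv
import io

# Per-player stats to sum into team totals (assumed JSON field names).
STAT_FIELDS = [
    "kills", "deaths", "assists", "last_hits", "denies",
    "hero_damage", "tower_damage", "hero_healing",
    "xp_per_min", "gold_per_min",
]


def team_rows(match):
    """Return two dicts (radiant, dire) with summed player stats."""
    totals = {
        "radiant": dict.fromkeys(STAT_FIELDS, 0),
        "dire": dict.fromkeys(STAT_FIELDS, 0),
    }
    for player in match.get("players", []):
        # Assumption: slots below 128 are Radiant, the rest are Dire.
        side = "radiant" if player.get("player_slot", 0) < 128 else "dire"
        for field in STAT_FIELDS:
            totals[side][field] += player.get(field, 0) or 0
    rows = []
    for side in ("radiant", "dire"):
        row = {
            "match_id": match.get("match_id"),
            "side": side,
            "score": match.get(f"{side}_score", 0),
        }
        row.update(totals[side])
        rows.append(row)
    return rows


def write_csv(matches, out):
    """Write one CSV row per team per match to a file-like object."""
    fieldnames = ["match_id", "side", "score"] + STAT_FIELDS
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for match in matches:
        writer.writerows(team_rows(match))
```

Passing an open file (or an io.StringIO for testing) as out gives a flat table that a CSV-based training script can read.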
Changing The Parameters
Matt’s parameter labels were for basketball data, so I had to change them to the appropriate DOTA 2 parameters. From the data I had, I identified several variables that I thought would be good criteria for the predictions:
Score | Kills | Deaths | Assists | Last Hits | Denies | Hero Damage | Tower Damage | Hero Healing | XP per minute | Gold per minute
I was unsure about the applicability of some of the stats or even if they were accurate (a lot of matches for example had scores of 0 for both teams), but I looked through the predictions and they *generally* felt right, with OG for example being predicted to win a lot of the match ups.
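One quick way to gauge how widespread the zero-score problem is would be to count the affected matches. The sketch below is something I did not run for these predictions; the field names radiant_score and dire_score are assumptions about the match JSON.

```python
def zero_score_share(matches):
    """Return the fraction of matches where both team scores are 0.

    A missing score field is treated as 0, so incomplete records
    are counted as suspicious too.
    """
    if not matches:
        return 0.0
    zero = sum(
        1
        for match in matches
        if match.get("radiant_score", 0) == 0
        and match.get("dire_score", 0) == 0
    )
    return zero / len(matches)
```

A high share would suggest dropping Score as a feature, or filtering those matches out before training.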
And Now… The Predictions!
This is how I’ve filled out my Compendium brackets based on the results:
These bracket predictions are based on the match-up percentages below, assuming the teams with the higher percentages win their match ups:
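The rule of always advancing the team with the higher percentage can be sketched like this. It is a simplified single-elimination version, whereas the real main event also has a lower bracket; the function names are hypothetical, the number of teams is assumed to be a power of two, and the example figures in the usage note are placeholders, not my actual predictions.

```python
def pick_winner(team_a, team_b, win_prob):
    """Return the team with the higher predicted win probability.

    win_prob maps (team_x, team_y) to the probability that team_x wins.
    Only one ordering of each pair needs to be present.
    """
    if (team_a, team_b) in win_prob:
        p_a = win_prob[(team_a, team_b)]
    else:
        p_a = 1.0 - win_prob[(team_b, team_a)]
    return team_a if p_a >= 0.5 else team_b


def play_bracket(teams, win_prob):
    """Advance winners round by round; returns the list of rounds.

    Teams are paired in seeding order: (0, 1), (2, 3), and so on.
    """
    rounds = [list(teams)]
    while len(rounds[-1]) > 1:
        current = rounds[-1]
        winners = [
            pick_winner(current[i], current[i + 1], win_prob)
            for i in range(0, len(current), 2)
        ]
        rounds.append(winners)
    return rounds
```

For example, with placeholder teams A to D and made-up probabilities {("A", "B"): 0.6, ("C", "D"): 0.3, ("A", "D"): 0.45}, play_bracket(["A", "B", "C", "D"], ...) advances A and D in the first round and then D in the final.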
I’ve uploaded the source code and some of the data I used over at:
I hope the code is useful for someone else and that there are other machine learning prediction attempts in the next few days. If anyone has an alternative algorithm and different predictions I would love to hear about them.
Note: Although my bracket results predict the TI6 trophy to go to OG, I’m still hoping for either TNC or Na’Vi to win. Go Na’Vi! #LabanTNC!