With the Dota 2 International 2018 and its $24m prize pool just around the corner, I thought it would be fun to analyze some Dota 2 games using Python and write a quick post on how others can use Python to analyze professional Dota 2 matches too.
Specifically, I’ll take a look at which heroes do the best and the worst as allies and as enemies of Team Liquid’s Io (aka Wisp), played by GH.
If you just want to jump to the conclusions, here they are. In the 387 games since Miracle joined in September 2016 to form the current roster:
- Liquid won 258 out of the 387 games (66%) analyzed.
- Io was banned in 239 (61%) of the games.
- In the 47 games in which Liquid picked Io, Liquid won 34 (72%).
- The hero with the most success alongside Io has been Rubick, who has won 8 out of 8 games. This is somewhat surprising to me and I’d be curious to hear why the synergy between these two is so strong. (match IDs: [3716722821, 3711037338, 3593368959, 3022691596, 3021494571, 2985537145, 2940685387, 2805082605])
- Necrophos, Clockwerk, and Lone Druid are all undefeated with at least 3 games each. Tiny, the classic Io ally, has a respectable 75% win rate with Io (12 out of 16).
- Tidehunter has the worst win rate, losing 3 out of 4 matches played with Io.
- As opponent picks, Tiny has the best record, going 3 for 3 against Io. Presumably one angle here is that opponents take Tiny to deny Liquid a classic Io ally. (match IDs: [3796359654, 3796268587, 3766915400])
- Crystal Maiden and Omniknight share the worst record facing Io, each going 1 for 7 (14%).
- You can check out the full analysis file I generated here.
How to scrape and analyze pro games with Python
If you want to download in-depth data about DotA 2 games, the only place you need is the Open Dota API. This might surprise you if you Google for “dota 2 python”, as you’ll likely end up at the DotA2 Python client, which is not what you want. The issue is that this library uses Valve’s API, which provides only minimal data about matches and their replays. Open Dota actually parses the replay files to pull out all sorts of interesting information, such as the results of team fights and gold advantages at various points in the game. It also organizes professional tournament matches neatly into their own category.
So the main reference you’ll want to use is the Open Dota API docs, which is where we start.
The first endpoint I started with is the /heroes endpoint, which maps hero IDs to the names used in the games.
curl https://api.opendota.com/api/heroes > heroes.json
That downloads the mapping into a file called heroes.json, which I save for further use. If you want to use Python, you can use the requests library, like we’ll do to download the team matches.
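For readers who prefer Python to curl, here is a minimal sketch of the same download. The function names are my own for illustration, and it assumes each entry returned by /heroes carries an `id` and a `localized_name` field, so check the API docs before relying on those names.

```python
import json

HEROES_URL = "https://api.opendota.com/api/heroes"


def build_hero_names(heroes):
    """Map each hero's numeric ID to its display name.

    Assumes every entry has "id" and "localized_name" keys.
    """
    return {hero["id"]: hero["localized_name"] for hero in heroes}


def download_heroes(path="heroes.json"):
    """Fetch the hero list and save it to disk, like the curl command."""
    import requests  # third-party (pip install requests)

    response = requests.get(HEROES_URL, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    heroes = response.json()
    with open(path, "w") as f:
        json.dump(heroes, f)
    return heroes
```

Calling `build_hero_names(download_heroes())` would then give a dictionary for turning the hero IDs in match data into readable names.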
The Team Liquid team ID can be found by clicking the Teams tab on the main Open Dota website and then clicking Team Liquid. Once I knew it was 2163, I looked at the API docs and saw how to download all of Team Liquid’s matches.
curl https://api.opendota.com/api/teams/2163/matches > liquid_matches.json
At this point, we’re still not done loading the matches, since the team matches endpoint only provides a high-level summary of each match, not the in-depth data from the parsed replay file. We have to call the /matches/{match_id} endpoint individually with each match ID received above. Since I needed to substitute in the match ID for every match, I automated this with Python and the requests library. I also sleep between requests to avoid rate limiting (note that you can avoid rate limiting by signing up for an API key from Open Dota; here I just accepted the rate limiting).
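A rough sketch of that download loop follows. It is a reconstruction rather than my exact script: the helper names and the one-second delay are illustrative choices, and it assumes each summary entry in liquid_matches.json has a `match_id` field.

```python
import json
import time

MATCH_URL = "https://api.opendota.com/api/matches/{match_id}"


def load_match_ids(summary_path="liquid_matches.json"):
    """Read match IDs from the team summary (assumes a "match_id" key)."""
    with open(summary_path) as f:
        return [summary["match_id"] for summary in json.load(f)]


def fetch_match(match_id):
    """Download the parsed replay data for a single match."""
    import requests  # third-party (pip install requests)

    response = requests.get(MATCH_URL.format(match_id=match_id), timeout=30)
    response.raise_for_status()
    return response.json()


def iter_match_lines(match_ids, fetch=fetch_match, delay=1.0):
    """Yield one JSON-encoded match per line, pausing between requests."""
    for match_id in match_ids:
        yield json.dumps(fetch(match_id)) + "\n"
        time.sleep(delay)  # crude guard against rate limiting


def download_matches(match_ids, out_path="liquid_match_data.txt", **kwargs):
    """Write every match to out_path, one JSON object per line."""
    with open(out_path, "w") as out:
        out.writelines(iter_match_lines(match_ids, **kwargs))
```

Running `download_matches(load_match_ids())` would walk through every Liquid match. The `fetch` parameter is only there so the loop can be exercised without touching the network.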
I write each match’s full data as a JSON object followed by a newline into the liquid_match_data.txt file. I didn’t use the .json extension since the file holds one JSON object per line rather than a single JSON list.
With that file containing all the match data, I can write some code to load the data and do interesting things with it. Since I wanted to analyze picks and bans, I created a Match class that holds the lists of picks and bans for each of the teams. I also use the heroes.json file I created earlier to map the hero IDs to their in-game names.
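Here is one way such a Match class could look. Treat it as a sketch under assumptions: the attribute names are illustrative, and it assumes the match JSON exposes `radiant_team_id`, `radiant_win`, and a `picks_bans` list whose entries have `is_pick`, `hero_id`, and `team` (0 for Radiant, 1 for Dire).

```python
import json
from dataclasses import dataclass, field

LIQUID_TEAM_ID = 2163


@dataclass
class Match:
    match_id: int
    liquid_won: bool
    liquid_picks: list = field(default_factory=list)
    liquid_bans: list = field(default_factory=list)
    enemy_picks: list = field(default_factory=list)
    enemy_bans: list = field(default_factory=list)


def parse_match(data, hero_names, team_id=LIQUID_TEAM_ID):
    """Build a Match from one parsed JSON object."""
    liquid_is_radiant = data.get("radiant_team_id") == team_id
    liquid_side = 0 if liquid_is_radiant else 1  # assumed: 0 = Radiant, 1 = Dire
    match = Match(data["match_id"], data["radiant_win"] == liquid_is_radiant)
    # picks_bans may be missing or null for some matches
    for entry in data.get("picks_bans") or []:
        hero = hero_names.get(entry["hero_id"], str(entry["hero_id"]))
        ours = entry["team"] == liquid_side
        if entry["is_pick"]:
            (match.liquid_picks if ours else match.enemy_picks).append(hero)
        else:
            (match.liquid_bans if ours else match.enemy_bans).append(hero)
    return match


def load_matches(path, hero_names):
    """Parse a file containing one JSON match object per line."""
    with open(path) as f:
        return [parse_match(json.loads(line), hero_names) for line in f if line.strip()]
```

With a hero name mapping built from heroes.json, `load_matches("liquid_match_data.txt", hero_names)` would return the list of matches used in the rest of the analysis.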
Now with the matches processed, I can immediately calculate some high level statistics.
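As an illustration, the headline numbers could be computed along these lines. This is a sketch that assumes each match object has `liquid_won`, `liquid_picks`, `liquid_bans`, and `enemy_bans` attributes, and it counts a ban by either team, which is my assumption about how "banned" is defined.

```python
def summarize(matches, hero="Io"):
    """Count overall wins plus how often `hero` was banned or picked."""
    # Assumption: a ban by either side counts as "banned"
    banned = sum(hero in m.liquid_bans or hero in m.enemy_bans for m in matches)
    picked = [m for m in matches if hero in m.liquid_picks]
    return {
        "games": len(matches),
        "wins": sum(m.liquid_won for m in matches),
        "banned": banned,
        "picked": len(picked),
        "picked_wins": sum(m.liquid_won for m in picked),
    }
```

Dividing `wins` by `games`, or `picked_wins` by `picked`, gives the percentages quoted in the summary.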
By iterating through my matches and building a mapping from each hero to wins and total games played, I can calculate win percentages and then sort by them. I did this first for opponent picks.
For Io allies, I did roughly the same thing.
If you got this far, thanks for reading, and message me @waprin_io or waprin on Reddit with any questions or ideas for follow-up posts.
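The tally for opponent picks can be sketched like this. It assumes match objects with `liquid_won`, `liquid_picks`, and `enemy_picks` attributes, and the `min_games` filter is an illustrative extra for hiding heroes with tiny sample sizes.

```python
from collections import defaultdict


def opponent_records(matches, hero="Io", min_games=1):
    """Rank enemy heroes by win rate in games where Liquid picked `hero`.

    Returns (enemy_hero, wins, games, win_rate) tuples, best record first.
    """
    games = defaultdict(int)
    wins = defaultdict(int)
    for m in matches:
        if hero not in m.liquid_picks:
            continue  # only games where Liquid actually played the hero
        for enemy in m.enemy_picks:
            games[enemy] += 1
            if not m.liquid_won:
                wins[enemy] += 1  # an enemy hero wins when Liquid loses
    rows = [(h, wins[h], n, wins[h] / n) for h, n in games.items() if n >= min_games]
    # Highest win rate first, then most games, then name for stable output
    return sorted(rows, key=lambda r: (-r[3], -r[2], r[0]))
```

Swapping `enemy_picks` for `liquid_picks` (skipping Io itself) and counting Liquid's wins instead of its losses gives the equivalent table for allies.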