Framing The Problem Right

Frames are a concept we use for time-based snapshots of objects in a game (normally players, but they can also be teams or buildings). The old datdota stored about 20 frame values per player at 5-minute intervals; the current model persists 100 frame values every minute. Users can see contextual information for 100 frame rows (either individual performances, or aggregated in multiple ways: by hero, by team average, by teams in a single match, etc.), sorted in ascending or descending order.

Some of these values don’t make sense to display raw or as aggregates (for example, current global building states) but are more useful as filters. Other values aren’t shown simply because they haven’t been QA’d enough yet. For more complex queries the parser can output per-second JSON blobs, so I plan to keep a per-second sample on disk for Spark processing of very ad-hoc queries (these blobs are around 80–120MB per match, so the current patch alone is ~110GB).
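To make the frame model concrete, here’s a minimal sketch of a per-minute frame record plus the stored-value arithmetic for the old and new models. The `Frame` fields and the `stored_values` helper are hypothetical, not datdota’s actual schema, and the 40-minute match length is an assumption chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    """One time-based snapshot of a single object in a match (hypothetical schema)."""
    match_id: int
    object_id: int  # normally a player, but could be a team or building
    minute: int     # the current model persists one frame per in-game minute
    values: dict[str, float] = field(default_factory=dict)  # e.g. {"gpm": 612.0}


def stored_values(match_minutes: int, values_per_frame: int, interval_minutes: int) -> int:
    """Values stored for one object over a match, one frame every `interval_minutes`."""
    frames = match_minutes // interval_minutes  # counts whole intervals only (assumption)
    return frames * values_per_frame


# Hypothetical 40-minute match.
old = stored_values(40, values_per_frame=20, interval_minutes=5)   # old datdota
new = stored_values(40, values_per_frame=100, interval_minutes=1)  # current model
print(old, new)  # 160 4000
```

Under those assumptions that’s a 25x increase in stored values per player, before any per-second data comes into play.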

One of the things I’ve least wanted to do in the last few months is rewrite the Frames page to use the core API system. The code was written hastily but quite efficiently, and when I first wrote it I hoped it would last at least a few years. With the growing need for specific time-sensitive data in event productions (the likes of ESL, DreamHack & PGL), I had to restructure the storage to be more denormalized, as well as move it onto SSDs. All of this work was done just before ESL Hamburg 2019. It makes the more complex queries complete significantly faster (30–40x faster on frame queries, depending on filter complexity) and has dropped the server load averages from ~10 across the board to just 1.4/1.3/0.9, despite the relatively busy event period. Some caching has also been drastically improved, but the impact of this is more query-dependent.

Barring the vacuum task at the start of the period, 99th-percentile query run-times seem to hover around 50ms.

This means that all the Frame data is now API-accessible: just insert “/api” between the host and the path of a regular query URL. What follows are two examples of data visualization that are now a lot easier to produce, since there’s no need to manually download CSV data.
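The “/api” rule is simple enough to script. A minimal sketch using only the standard library; the function name `to_api_url` is hypothetical and the example URL and its query string are placeholders, not a real datdota query.

```python
from urllib.parse import urlsplit, urlunsplit


def to_api_url(page_url: str) -> str:
    """Insert "/api" between the host and the path of a regular query URL."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path == "/api" or path.startswith("/api/"):
        return page_url  # already an API URL; don't insert "/api" twice
    return urlunsplit((parts.scheme, parts.netloc, "/api" + path, parts.query, parts.fragment))


print(to_api_url("https://datdota.com/frames?param=value"))
# https://datdota.com/api/frames?param=value
```

The query string is carried over untouched, so whatever filters were set on the page apply to the API request too.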

Miracle’s Anti-Mage

When looking at the top 100 GPMs at the 35-minute mark, it’s no surprise that Alchemist dominates so many spots (93 at the time of writing). Miracle’s brilliant Anti-Mage game in the AMD Sapphire Dota Pit WB Semi-Finals against Newbee is the highest non-Alchemist performance, and it’s interesting to see how, despite a (relatively) slow start compared to the Alchemist games, he managed to claw his way back. Twenty-one kills and four assists don’t hurt his case either.
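Once the rows for a query like this come back as data, “how many of the top 100 are Alchemist?” and “what’s the best non-Alchemist entry?” take a few lines each. This sketch assumes each row is a dict with a `hero` key and a numeric field such as `gpm35`; those key names and the sample rows are made up, not the real API fields or results.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def top_rows(rows: list[dict], key: str, n: int = 100) -> list[dict]:
    """Rows sorted descending by `key`, truncated to the first n."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


def hero_counts(rows: list[dict]) -> Counter:
    """How many of the given rows each hero occupies."""
    return Counter(r["hero"] for r in rows)


def best_excluding(rows: list[dict], key: str, excluded_hero: str) -> Optional[dict]:
    """Highest row by `key` whose hero isn't `excluded_hero`, or None if there isn't one."""
    candidates = [r for r in rows if r["hero"] != excluded_hero]
    return max(candidates, key=lambda r: r[key], default=None)


# Made-up rows for illustration only.
rows = [
    {"player": "A", "hero": "Alchemist", "gpm35": 900},
    {"player": "B", "hero": "Anti-Mage", "gpm35": 850},
    {"player": "C", "hero": "Alchemist", "gpm35": 870},
    {"player": "D", "hero": "Naga Siren", "gpm35": 700},
]
top = top_rows(rows, "gpm35", n=3)
print(hero_counts(top)["Alchemist"])                         # 2
print(best_excluding(rows, "gpm35", "Alchemist")["player"])  # B
```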

Loading the data takes a single API request, plus some matplotlib magic for the plot (and some 538 styling!).
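A minimal version of that request-plus-plot workflow, using the standard library for the HTTP call and matplotlib’s built-in `fivethirtyeight` style. The response shape is an assumption: `to_series` expects rows under a `data` key, each with a `label` and a per-minute `values` list, which may not match what the frames endpoint actually returns, so adjust the keys to the real payload. The function names are hypothetical, and matplotlib is imported inside `plot_series` so the fetching and parsing work without it.

```python
from __future__ import annotations

import json
from urllib.request import urlopen


def fetch_json(api_url: str) -> dict:
    """GET an API URL and decode the JSON body."""
    with urlopen(api_url) as resp:
        return json.load(resp)


def to_series(payload: dict) -> dict[str, list[float]]:
    """Map each row's label to its per-minute values.

    ASSUMPTION: rows live under payload["data"], each with "label" and "values";
    the real response shape may differ.
    """
    return {row["label"]: [float(v) for v in row["values"]] for row in payload["data"]}


def plot_series(series: dict[str, list[float]], title: str, out_path: str) -> None:
    """Plot one line per row against minute and save the figure to `out_path`."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display needed
    import matplotlib.pyplot as plt

    plt.style.use("fivethirtyeight")  # matplotlib's bundled 538-inspired style
    fig, ax = plt.subplots(figsize=(10, 6))
    for label, values in series.items():
        ax.plot(range(len(values)), values, label=label)
    ax.set_title(title)
    ax.set_xlabel("Minute")
    ax.legend()
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)
```

For example, `plot_series(to_series(fetch_json(url)), "GPM by minute", "gpm.png")`, where `url` is a frames query URL with “/api” inserted.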

Natural Born Rosh Killer

What if you wanted to find the best Roshan-killing heroes, as measured by average Rosh kills per game? Meepo has the most lagged start but spikes very quickly by the 30-minute mark.
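Ranking heroes by average Rosh kills per game at a given minute is a group-by followed by a mean. This sketch assumes a hypothetical input of one `(hero, cumulative_kills_by_minute)` tuple per game; games that ended before the chosen minute are dropped, which is one choice among several (carrying the final value forward is another).

```python
from __future__ import annotations

from collections import defaultdict


def average_at_minute(games: list[tuple[str, list[int]]], minute: int) -> dict[str, float]:
    """Mean cumulative Roshan kills per game for each hero at `minute`.

    Games shorter than `minute` are skipped, so later minutes average
    over fewer games.
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for hero, kills in games:
        if minute < len(kills):
            totals[hero] += kills[minute]
            counts[hero] += 1
    return {hero: totals[hero] / counts[hero] for hero in counts}


def rank(averages: dict[str, float]) -> list[tuple[str, float]]:
    """Heroes sorted from most to fewest average Roshan kills."""
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


# Made-up games for illustration only.
games = [("Meepo", [0, 0, 1, 2]), ("Meepo", [0, 1, 1, 1]), ("Ursa", [0, 1, 1])]
print(rank(average_at_minute(games, 2)))  # [('Meepo', 1.0), ('Ursa', 1.0)]
print(rank(average_at_minute(games, 3)))  # [('Meepo', 1.5)]
```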

We also visualize a fraction of the standard deviation (provided by default by the frames endpoint, along with the frequency count) to show some of the variance at each point. Huskar’s variance around the 40-minute mark is massive compared to Meepo’s and Ursa’s.
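The shaded band is the per-minute mean plus or minus some fraction of the per-minute standard deviation. The post doesn’t say which fraction is used, so `fraction=0.5` below is an arbitrary placeholder, and the function names are hypothetical.

```python
from __future__ import annotations


def std_band(means: list[float], stds: list[float], fraction: float = 0.5) -> tuple[list[float], list[float]]:
    """Lower/upper bounds of mean +/- fraction * std at each minute."""
    if len(means) != len(stds):
        raise ValueError("means and stds must have the same length")
    lower = [m - fraction * s for m, s in zip(means, stds)]
    upper = [m + fraction * s for m, s in zip(means, stds)]
    return lower, upper


def plot_with_band(ax, means: list[float], stds: list[float], label: str, fraction: float = 0.5) -> None:
    """Draw the mean line and a translucent band in the same colour on a matplotlib Axes."""
    minutes = range(len(means))
    lower, upper = std_band(means, stds, fraction)
    (line,) = ax.plot(minutes, means, label=label)
    ax.fill_between(minutes, lower, upper, color=line.get_color(), alpha=0.2)
```

Calling `plot_with_band` once per hero on the same Axes keeps each band matched to its line colour, and a smaller fraction keeps overlapping bands readable.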

In addition to this:

  • There are a few general fixes to some frames (some were skewed by ~2–3 seconds, leading to annoying, slightly-off stats).
  • I’ve retroactively merged a few smurf accounts (s4 and Arteezy) into their main accounts’ stats.
  • A few event splits were updated (online/LAN/post-event).
  • Valve’s GetLeagueListing WebAPI endpoint is dead, so I’m manually updating leagues until there’s a long-term solution. If you notice a match or league missing that you think should be considered ‘pro’, drop by the Discord.
  • With all the optimizations and performance upgrades, the need to migrate to a new server has probably been pushed back until early next year. For now, 64GB of RAM and 8 physical cores will suffice.
  • Been working a bunch on statistics for Artifaction.