Balancing Neutral Items
Neutral items are the most recent major addition to Dota 2: they joined the game in 7.23 (along with outposts and hero couriers), were refined in 7.24 (shrines removed, outposts relocated, a dedicated neutral item slot added), and survived their first big balance patch in 7.25.
When discussing the power level of neutral items, there are a few overarching concerns:
- “base power”: some items are on a fundamental level “good” or “bad” from a power level perspective
- “niche”: some are extremely good in very specific situations, and bad in most other situations
- “replaceability”: an item that is functionally similar to another but slightly better renders the weaker one obsolete
These concerns compound the classic problems of evaluating anything in Dota 2: it’s a highly complex game, there is inherent variance, there are not many pro matches on any given patch, and teams vary in skill. We’ve learned either to work around most of these problems (normalizing for team skill, updating our confidence based on sample size) or to accept them (it’s a highly complex game, so weird stuff can happen).
The approach for neutral items is still mostly uncharted. Unlike heroes, the items can be duplicated and appear on both sides. As a result, using pure winrate can be misleading, even if you’re normalizing for the skill of the teams. This is essentially a further complication of a small sample: in a significant percentage of games both teams may hold the same item (pushing its winrate towards 50%). Right now there are 11 tier 1 items, and each team receives 4 tier 1 drops, leaving just a 10.6% chance that the two teams’ tier 1 items don’t overlap at all (and similarly, independently, for tier 2).
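The 10.6% figure follows from a quick combinatorial check, assuming each team’s 4 tier 1 drops are distinct items from the shared pool of 11:

```python
from math import comb

# Fix Radiant's 4 tier 1 items; Dire avoids all of them only if its
# own 4 distinct drops come from the remaining 7 items in the pool.
p_no_overlap = comb(7, 4) / comb(11, 4)
print(f"{p_no_overlap:.3f}")  # → 0.106
```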
The approach I’ve looked at most recently (7.25 pro matches) is trying to map a vector (one column per item) with the following encoding:
- -1 if Dire possesses the item but Radiant doesn’t
- 0 if both teams possess the item, or neither does
- 1 if Radiant possesses the item but Dire doesn’t
By “possess” I mean that the item has been dropped for a team and they’ve picked it up.
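As a sketch, this encoding could be computed per game like so (the helper function and the truncated item list are illustrative, not the actual pipeline):

```python
def encode_items(radiant_items: set, dire_items: set, all_items: list) -> list:
    """One column per item: +1 Radiant-only, -1 Dire-only, 0 shared or absent."""
    return [
        (1 if item in radiant_items else 0) - (1 if item in dire_items else 0)
        for item in all_items
    ]

# A few tier 1 items for illustration (the real vector covers all of them).
items = ["Royal Jelly", "Faded Broach", "Keen Optic"]
row = encode_items({"Royal Jelly"}, {"Royal Jelly", "Keen Optic"}, items)
print(row)  # → [0, 0, -1]
```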
This is mapped to a y-value representing the Elo (k=32) shift based on the result of the game (negative for a Dire win, positive for a Radiant win). I’ve only considered tier 1 and tier 2 items for now, but will expand to more in the future. Games where one team has many more items than the other (for example, if the game ended before the defending team could pick up its tier 2 items) are filtered out. Using an ElasticNet (a regularized regression model that combines the lasso and ridge penalties) we can estimate a linear coefficient for each item, which measures its impact on the outcome.
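A minimal sketch of this fit, using scikit-learn’s ElasticNetCV on synthetic data (the real feature matrix and Elo targets aren’t public, so the dimensions, probabilities, and planted coefficients below are all made up for illustration):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Synthetic stand-in: X rows are per-game item encodings (+1/0/-1),
# y is the signed Elo shift (k=32 gives roughly +16 for a Radiant win
# and -16 for a Dire win between equally rated teams).
rng = np.random.default_rng(0)
n_games, n_items = 500, 22          # e.g. 11 tier 1 + 11 tier 2 columns
X = rng.choice([-1, 0, 1], size=(n_games, n_items), p=[0.3, 0.4, 0.3])
true_coef = np.zeros(n_items)
true_coef[:3] = [6.0, -4.0, 3.0]    # pretend only a few items matter
y = X @ true_coef + rng.normal(0, 8, size=n_games)

# ElasticNetCV picks the penalty strength by cross-validation; the
# fitted coefficients then give a rough ranking of item impact.
model = ElasticNetCV(l1_ratio=0.5, cv=5).fit(X, y)
ranking = np.argsort(-np.abs(model.coef_))
print(ranking[:3])
```

With real match data the same ranking step applies: sort items by the absolute value of their fitted coefficient, with sign indicating which side benefits.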
This provides a rough ranking of the items. With significantly more data we might see some movement in the non-zero coefficients, but it’s unlikely to shift drastically unless there are significant metagame changes and/or further balance patches. Many of the coefficients are zero because the data isn’t dense enough for those items to have a detectable effect on the target, suggesting there is insufficient evidence that they have any impact on the result.
What is quite interesting is that Royal Jelly is ranked the 2nd best tier 1/tier 2 item here, despite being nerfed in 7.25a. This makes me wonder how Valve does any neutral item balancing: it’s statistically difficult to measure the impact of these items, so do they rely on personal opinions, or were they just lucky in nerfing Royal Jelly along with the sweeping nerfs to other regeneration items? One aspect of Royal Jelly that makes it unique is that it applies a permanent buff, so it doesn’t occupy an item slot: even if it’s inferior to other tier 1 items, once those items are eventually replaced the Royal Jelly still provides a lasting benefit.
A potential solution is to assume that players are rational and optimal, and then use massive amounts of high-level public data. If every neutral item is encoded as {unavailable, available but unused, used} at regular time intervals (e.g. every minute), various machine learning approaches could learn preferences from this. Each tidbit of information learned would be small: a player holding an item for a period of time signifies that they prefer it to everything in the neutral stash. A secondary model could then look at how items are allocated across a team, though this model would probably be skewed by human irrationality (ever had a Crystal Maiden who finds Royal Jelly and then Jellies herself and her lane partner, instead of the 1 and 2 positions?).
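One way to represent the per-minute encoding is a small state enum plus a preference signal per player-item pair; this is a hypothetical sketch of the feature extraction, not an implemented system:

```python
from enum import IntEnum

class ItemState(IntEnum):
    """A player's relationship to one neutral item, sampled each minute."""
    UNAVAILABLE = 0   # not yet dropped for the team
    AVAILABLE = 1     # sitting in the neutral stash, unequipped
    USED = 2          # equipped by this player

def preference_signal(states: list) -> int:
    """Minutes the player kept the item equipped while it was obtainable."""
    return sum(1 for s in states if s is ItemState.USED)

# Item unavailable for 5 minutes, in the stash for 2, then equipped for 10.
timeline = [ItemState.UNAVAILABLE] * 5 + [ItemState.AVAILABLE] * 2 + [ItemState.USED] * 10
print(preference_signal(timeline))  # → 10
```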
Further work could be done: regressors from CatBoost and XGBoost were recommended by Aleksandr for experimentation, although fundamentally a larger sample would still be preferred.
In any case, my hope is that the balancing of neutral items is done effectively and quantitatively. Neutral items are a source of randomness within the game; however, there is skill in preparing for and adjusting to the various outcomes which may occur. As a result, neutral items need to remain interesting, diverse and powerful, yet also balanced. Just like every component of Dota 2 strives to be.
Thanks to Aleksandr Semenov, Federico Vaggi and Anthony Hodgson for their help with this article.