OpenAI Five from a player’s perspective

Jun 26, 2018

Yesterday OpenAI published a post detailing their progress in developing bots capable of playing Dota 5v5. They first made a splash at last year’s The International (TI) with a bot that beat many of the world’s top players in the “1v1 Shadow Fiend mid” game mode. Now they have shown that their bots are capable of long-term planning and teamwork. These are essential skills for playing the game 5v5.

I’ve seen many excited reactions from machine learning researchers to OpenAI’s achievement. In this post my goal is to discuss it from a player’s perspective. What has been achieved, and what still remains to be done in Dota? I haven’t seen the bots play live, so I’m basing my comments on what has been shown publicly in the various blog posts and videos.

From 1v1 to 5v5

When the 1v1 bot was revealed last year, I was quite skeptical that it would be possible to move quickly to 5v5. There are many obvious differences between the game modes: the number of players and the teamwork aspect, the win condition (killing the tower or getting 2 kills in 1v1 vs. killing the enemy ancient in 5v5), and the number of different actions you can take across the entire map that can have an impact on the game. But there are also differences that may not be as obvious to people less familiar with the game.

1v1 mid is very close to a perfect information game. Most of the time you either see the enemy hero or have just seen it. Little of relevance happens in the fog of war. There are times when you are in the river and the enemy is up the hill and out of vision, but they will quickly reveal themselves by using their high ground advantage to attack you or by hitting creeps.

In contrast, in 5v5 both teams constantly have to make decisions with incomplete information. You have to understand the implications of the signals teams are giving each other. I will list a few examples:

  1. You want to go for a kill on an enemy hero. To make it less obvious, you often have to show one or more of your heroes in other areas of the map. If all of your heroes are in the fog and choosing not to farm creep waves, the enemy knows you are likely up to something.

The amount of long-term planning is also quite limited in 1v1. Most of the time each last hit and deny you gain brings you closer to winning, and these intermediate rewards are used to help learning (it is worth noting that OpenAI have also tried training their 1v1 bot by simply giving it rewards for winning or losing, and got decent results). Last year the 1v1 bot learned some strategies, like baiting, that required some planning, but the scale is very different in 5v5, where certain actions can heavily influence what your team and the enemy can do over the next several minutes. Many of the strategic aspects of the 5v5 game also vary according to each team’s hero lineup.

The restrictions

The current OpenAI Five plays under several restrictions, such as no wards, no Roshan, and some items being banned. But to me the most interesting restriction is clearly the restricted set of heroes.

While the full game has 115 heroes, the bots play the same 5 every game: Necrophos, Sniper, Viper, Crystal Maiden and Lich. This choice matters not only because of the challenge of learning how to play all the different heroes mechanically, but also because the choice of heroes determines which types of play end up being viable. These heroes are relatively limited in the type of game they can play. They like to stick together in a group and have a hard time being mobile and pressuring multiple parts of the map on their own.

The other interesting thing is that the match is a mirror match. This never comes up in normal Dota, because each hero can only be picked by one player. The implications of this restriction for gameplay seem quite significant. One key aspect of Dota is understanding the strengths of your lineup relative to the enemy’s, and being able to exploit them. Sometimes your heroes have worse teamfight but more mobility than the enemy. In that case you want to split the enemy apart and only take engagements where you have the numbers advantage or the superior initiation. Sometimes your lineup has a clear timing window to hit. For example, a team that picks Templar Assassin is often looking to take a commanding lead or even end the game with the first or second Roshan kill, but if the game goes late the hero can become a liability due to its glass cannon nature.

In a mirror match many of these strategic aspects don’t exist in the same way. Of course, both teams still experience sudden power spikes. For example, if one team gets their ultimates online first, they are favored in any engagement until the enemy team reaches that point. When you buy a new item, you are stronger for a while until the other team buys items as well. But there is no concept of one team being stronger in the early game, or of the other being better in a split-up, pickoff-heavy game, due to the nature of their lineup.

Given the selected set of heroes, I suspect it’s very hard to come back from behind. The heroes have limited ability to outmaneuver the enemy and pull them apart. They also have a very hard time winning a 5v5 fight from a disadvantageous position, because they badly lack fight initiation. A grouped-up 5-man strategy after a successful laning stage seems very hard to beat.

Of course, you have to start somewhere, and what the bots have learned so far already seems very impressive. The question for me is which part of the game turns out to be the most difficult. Was it learning teamwork and longer-term planning in the first place? How easily do the same methods learn the different kinds of teamwork required to play different lineups and to respond to different game states? It may be that even the best humans can be beaten by executing one limited style of play extremely well. If that is the case, it would still be interesting to see bot matches where one team has to execute a very different game plan from the other.

The reward function

One interesting aspect of the bots is the reward function designed for them. In 1v1, natural intermediate rewards like last hits, denies and damage dealt or taken are very strongly connected to winning. You have to consider some trade-offs, for example whether it is worth giving up a last hit in order to deal a certain amount of damage. But in general you win in 1v1 by taking actions that give you almost immediate rewards.

5v5 is much more complex. In the end, killing the enemy ancient is what matters. You need to kill lane and neutral creeps, collect runes, kill enemies and hit buildings to get there, but earning even a significant reward from one of these doesn’t necessarily mean you are doing the right thing, the way it usually does in 1v1. You constantly have to consider the long-term effects of your actions rather than just the short-term rewards. For example, some heroes very often have to sacrifice their own farm for a while to protect teammates, to set up a kill on an enemy 30 seconds later, or to join a tower push with the team. You also have to learn how the information you reveal to the enemy affects the way they have to play. This will become even more relevant when more pickoff-capable heroes are added to the pool, and when warding is allowed.
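To make the farm-now versus kill-later trade-off concrete, here is a minimal sketch of standard reinforcement-learning discounting. It is not OpenAI’s reward function: the function name `discounted_return`, the discount values and the per-step reward numbers are all hypothetical, chosen only to show how a short-sighted discount favors steady farming while a far-sighted one favors giving up farm for a later kill.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a sequence of per-step rewards."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical per-step rewards for one hero (illustrative numbers only).
farm_now = [1.0] * 6               # steady last hits every step
rotate_for_kill = [0.0] * 5 + [8.0]  # skip farm for 5 steps, then get a kill

for gamma in (0.5, 0.99):
    farm = discounted_return(farm_now, gamma)
    rotate = discounted_return(rotate_for_kill, gamma)
    choice = "rotate" if rotate > farm else "farm"
    print(f"gamma={gamma}: farm={farm:.3f}, rotate={rotate:.3f} -> prefers {choice}")
```

With a low discount the delayed kill is worth almost nothing and farming wins; with a discount close to 1 the same kill outweighs the lost farm. Real games are far messier, since the value of that later kill is itself uncertain.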

In principle, the current reward function takes these sorts of things into account. The agents don’t try only to maximize their immediate reward. Even if they aren’t farming creeps and increasing their own rewards that way, they may be able to grow the difference between the two teams’ rewards by forcing the opponent into a defensive stance. However, as different heroes and different styles of play are introduced, it seems to me that figuring out all the ways various actions affect future rewards over the longer term becomes much more difficult.
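The idea of growing the difference between the teams’ rewards can be sketched as follows. This is a rough illustration under my own assumptions, loosely modeled on the “team spirit” weighting OpenAI has described: each hero’s reward is blended with its team’s average, and the enemy team’s average is then subtracted so the game becomes zero-sum. The function names, the `team_spirit` default and the example numbers are hypothetical, not OpenAI’s actual code or values.

```python
from statistics import mean


def blended_reward(own, team, team_spirit):
    """Mix one hero's reward with its team's average (team_spirit in [0, 1])."""
    return (1 - team_spirit) * own + team_spirit * mean(team)


def relative_rewards(team_a, team_b, team_spirit=0.5):
    """Per-hero rewards for team A after subtracting team B's average (zero-sum)."""
    blended_a = [blended_reward(r, team_a, team_spirit) for r in team_a]
    blended_b = [blended_reward(r, team_b, team_spirit) for r in team_b]
    enemy_avg = mean(blended_b)
    return [r - enemy_avg for r in blended_a]


# Both teams farm equally: nobody gains relative to the enemy.
print(relative_rewards([1.0] * 5, [1.0] * 5))
# Team A groups up and farms less, but the enemy is forced to defend and farms nothing.
print(relative_rewards([0.5] * 5, [0.0] * 5))
```

In the second case team A earns less raw farm than in the first, yet every hero ends up with a higher relative reward. With `team_spirit` near 0 each hero mostly cares about its own farm; near 1 every hero is rewarded by the team average, which is one way reward design itself could shape how resources get shared.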

Based on the available footage, it is hard to say how far the bots have already come and where their weaknesses lie. It may not be completely clear even after seeing the bots live, if the hero restrictions remain in place and the range of games we see is limited.

One noteworthy point OpenAI raised in their post was how the bots’ playstyle differs from that of humans. The bots gave all heroes roughly equal amounts of resources early on, while humans often prioritize the experience and gold of a few heroes. I am curious what this behavior looks like, and whether it would emerge with different kinds of lineups. For example, if your team has a Phantom Lancer and the enemy team doesn’t have good counters, it makes sense to concentrate your resources on the PL. Another interesting question is whether this resource distribution arises because it’s truly optimal for winning, or because of how the reward functions are designed for each hero.

It is worth considering how sensible different strategies are for human players of varying skill levels. When you concentrate your resources heavily on one player, you had better play effectively around that player, and that player had better not make terrible mistakes. A more equal resource distribution may well be beneficial for lower-tier players, so that the game doesn’t rest on one person’s shoulders.

Competing against human teams

Anyone who has played Dota matchmaking games has probably experienced how bad humans can be as team players. Several factors are at play, even if we ignore the players who simply refuse to cooperate for whatever reason. Firstly, while the game is played 5v5, most of the time people play with complete strangers. Teammates change from game to game and have limited means of communicating with each other. The game experience isn’t really set up for humans to learn the team aspect of the game the way a kid who has played ten years with a coached local junior football team would.

Secondly, the constant demands of controlling your own hero are taxing for most players. Even if you intend to play as a team, observing the entire game state and keeping longer-term objectives in mind while also controlling your own hero is difficult. I would argue that only at very high levels of play can players handle routine tasks effortlessly and understand the game well enough to identify worthwhile longer-term objectives for the team in real time and act on them.

Thirdly, human players have their own objectives when they play. While bots play game after game to optimize their reward functions, humans jump on a server with all kinds of mindsets. It is easy to assume that everyone is just trying to win, but people also want to play the game the way they like, with the hero they like. We may also do things just for fun instead of taking the safest route to a win.

All of these things mean that amateur players play a game that is in many ways completely different from the one professional players play. This makes it hard for me to interpret what the bots’ various wins say about their capability.

Looking ahead

It will be interesting to see how easily OpenAI can change the heroes the bots play and the heroes they play against. Even with the same lineup, you may have to play very differently if the enemy heroes are different. Different situations require different kinds of team play and a more general strategic understanding. I’m also looking forward to seeing how well the strategies the bots have learned hold up against high-level teams.

Hopefully we can get a better understanding of where the bots stand in the showmatch on July 28th!