OpenAI’s Long Pursuit of Dota 2 Mastery

Published in SyncedReview, Sep 5, 2018

A hearty round of applause arose from the crowd packing Vancouver’s Rogers Arena on August 22 when a team of unassuming scientists wearing “OpenAI” T-shirts climbed on stage. They had come to Canada to pit their artificial intelligence bots against professional human players in a highly anticipated, world-first 5v5 showdown in one of the world’s most complex video games, Dota 2.

The journey to the historic match began in the winter of 2016, when an OpenAI research team led by CTO Greg Brockman was searching for a challenging game environment with competitive benchmarks where it could test its AI research and techniques against the skills of human professionals. Games are a hotbed for AI research: they are computationally complex; have rich human-computer interactions; and generate tons of data.

OpenAI was founded in 2015 in San Francisco as a non-profit AI research company backed by Elon Musk. Its ultimate goal is to build an Artificial General Intelligence (AGI) capable of performing a multitude of tasks within one general system. OpenAI regards the creation of an AI that can perform as quickly and effectively as human pros in a complex computer game environment as a major step toward achieving AGI.

Beating humans is also a convincing way for AI researchers to make their mark. The dramatic victory of DeepMind’s AlphaGo over Korean Go Grandmaster Lee Sedol in March 2016 pushed the envelope in AI gaming and secured DeepMind a place in AI history.

OpenAI researchers surveyed various games on the Twitch and Steam platforms before deciding to tackle Dota 2, which can run on Linux and has an API. Developed by Valve Corporation in 2013, Dota 2 is a highly complex and wildly popular multiplayer online battle arena (MOBA) video game played between two teams of five players. The team that destroys its opponent’s central base structure, the “Ancient,” wins the match. The game environment has 115 playable characters (the all-important “Heroes”), 22 defensive towers, dozens of non-player characters, hundreds of skills and items, and a long tail of game features such as runes, trees, and wards.

Early struggles

OpenAI’s first Dota 2 effort was a scripted bot with hard-coded rules. It could improve its tactics only by acquiring additional expert input: How to buy items? How to last-hit? How to deny? How best to take towers?

In early 2017, the team created what was at the time the best version of a scripted bot, which managed to beat amateur Dota 2 players. The researchers, however, could not handle the complexity involved in scripting the bot to pro-gamer level, so they ditched their rule-based code entirely and replaced it with reinforcement learning (RL).

Reinforcement learning is an incentive-based technique that enables computers to learn new skills. The computer follows a policy, a rule for choosing among the actions available to it in each situation, and the system adjusts that policy to maximize value, which is defined as the sum of rewards it receives over time.
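As a rough illustration of "value as the sum of rewards over time," the sketch below totals a list of per-step rewards. The `discount` parameter and the reward numbers are assumptions made up for the example; they are not values from OpenAI's system.

```python
def total_return(rewards, discount=1.0):
    """Sum per-step rewards, weighting the reward at step t by discount**t."""
    total = 0.0
    weight = 1.0
    for reward in rewards:
        total += weight * reward
        weight *= discount
    return total


# Hypothetical rewards from one short episode.
episode_rewards = [0.0, 1.0, 0.0, 2.0]
print(total_return(episode_rewards))       # plain sum of all rewards
print(total_return(episode_rewards, 0.5))  # later rewards count for less
```

With `discount=1.0` every reward counts equally; values below 1 make the agent care more about rewards that arrive sooner.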

Instead of being trained in the full 5v5 Dota 2 environment, the RL-based bot was placed in a Dota 2 challenge called “Kiting” with simplified rules and objectives. On a circular island, the bot was tasked with approaching and killing a human-controlled “Hero” without being killed itself. However, even achieving this one simple objective proved to be much more difficult than anticipated.

*An RL-based Drow Ranger learns to kite a hardcoded Earthshaker.*

“Humans were good at avoiding the machine, mainly because humans tended to act in ways that are different from what would happen in training. Also, the trajectory that humans would take was different from what the agent was trained to predict,” OpenAI Researcher Jonathan Raiman told Synced.

To address this, the team started adding randomization to the training. Instead of always behaving predictably, heroes were programmed to sometimes move slower or faster, or to encounter glitches that prevented them from walking when they wanted.
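The kind of randomization described here might look something like the sketch below, which perturbs an intended move before it is applied. The function name, the speed range, and the glitch probability are all hypothetical; the article does not give OpenAI's actual settings.

```python
import random


def randomized_step(intended_move, rng, speed_range=(0.8, 1.2), stuck_prob=0.05):
    """Perturb an intended (dx, dy) move with random speed and occasional glitches."""
    # Occasionally the hero "glitches" and cannot walk at all.
    if rng.random() < stuck_prob:
        return (0.0, 0.0)
    # Otherwise the hero moves somewhat slower or faster than intended.
    scale = rng.uniform(*speed_range)
    dx, dy = intended_move
    return (dx * scale, dy * scale)


rng = random.Random(0)  # seeded so the example is repeatable
moves = [randomized_step((1.0, 0.0), rng) for _ in range(5)]
print(moves)
```

Training against this kind of noise means the policy cannot rely on opponents or its own hero always moving exactly as expected.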

The scheme worked. Randomization made the RL policy more robust and enabled the bot to regularly beat humans in Kiting. When the team applied the same technique to Dota 2 in 1v1 mode (where the player who achieves two kills or destroys an enemy tower wins), the RL bot quickly eclipsed the scripted bot’s performance. By July 2017, the OpenAI bot was beating professional gamers in Dota 2’s 1v1 format.

After being defeated by the OpenAI bot, former professional Dota 2 player William “Blitz” Lee predicted “this is going to change how people play 1v1.”

Recalls Raiman, “That’s when the team started saying: ‘We might be able to do the full game one day if we’re able to put enough computers together and run the same algorithm.’” But before rising to the full 5v5 challenge, OpenAI was curious to find out just how good their 1v1 bot was.

At the Dota 2 International 2017 last August in Seattle, OpenAI’s 1v1 AI bot took on one of the best solo gamers, Ukrainian Dota 2 pro Dendi, on the main stage. The bot won the first game in less than 10 minutes, and Dendi surrendered scarcely minutes after the second game began. Throughout the contest, Dendi repeated “This guy is scary.”

*Dendi vs OpenAI bot at The International 2017.*

OpenAI’s victory in the 1v1 match proved that reinforcement learning can work in a complex game environment that requires a long horizon strategy.

After beating Dendi, Brockman proclaimed “the next step of the project is 5v5. So wait for next year’s The International.”

New LSTM bot trains on 180 years of gameplay each day

OpenAI built its current bots’ brains with Long Short-Term Memory (LSTM), a type of recurrent neural network (RNN) unit that is proficient at remembering information for long periods of time and well suited to classifying, processing, and making predictions based on time series data.

“The reason why these things are needed is very similar to how you would teach a child how to do something straightforward. You need to have to teach them what’s good and what’s bad. Also, then you have some memory of what they just did,” OpenAI Researcher Susan Zhang told Synced.

Each bot’s underlying neural network includes a single-layer, 1024-unit LSTM that observes the game’s state and comes up with appropriate actions. The interactive demonstrations below show how the AI bot decides on actions.

In the above game capture, the bot-controlled Hero Viper attacks the mid-lane, releasing Nethertoxin (a skill). To perform this action the bot needs to specify four things: the action type (moving, attacking, releasing skills, or using items), the target unit or position, the target’s specific position as (X, Y) coordinates, and the timing. OpenAI eventually discretized the entire game into 170,000 possible actions per Hero. (By comparison, the average number of available moves in chess is 35; in Go, 250.)
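To see how four small choices multiply into a large action space, here is a minimal sketch that enumerates every combination of action type, target, position offset, and delay. The field names and the deliberately tiny value sets are hypothetical; they are not OpenAI's actual encoding and do not reproduce the 170,000 figure.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class HeroAction:
    kind: str      # what to do: move, attack, cast a skill, use an item
    target: int    # index of the target unit
    offset: tuple  # (x, y) grid offset for the target position
    delay: int     # timing: how many ticks to wait before acting


# Hypothetical, deliberately small value sets for illustration only.
KINDS = ["move", "attack", "cast", "use_item"]
TARGETS = range(3)
OFFSETS = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
DELAYS = range(2)

ALL_ACTIONS = [
    HeroAction(kind, target, offset, delay)
    for kind, target, offset, delay in product(KINDS, TARGETS, OFFSETS, DELAYS)
]
print(len(ALL_ACTIONS))  # 4 * 3 * 9 * 2 = 216
```

Even these toy sets yield 216 distinct actions, which gives a sense of how quickly the count grows once real unit counts and finer position grids are involved.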

OpenAI’s new generation bots learned from self-play, starting with random parameters and not relying on human knowledge. Researchers used Proximal Policy Optimization, an advanced RL algorithm that requires less data than the general policy gradient method to achieve better results.
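For readers unfamiliar with Proximal Policy Optimization, the sketch below shows the clipped surrogate objective from the published PPO method in plain Python. It is a minimal illustration, not OpenAI Five's training code; the default `clip_eps=0.2` and the sample numbers are assumptions chosen for the example.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of sampled actions."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the updated policy and the one that acted.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio within [1 - eps, 1 + eps] so one update cannot move too far.
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the clipped and unclipped estimates.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical batch: the new policy doubles the probability of one good action.
print(ppo_clipped_objective([math.log(2.0)], [0.0], [1.0]))
```

Clipping limits how much credit a single batch can give to a large policy change, which is what lets the same data be reused for several gradient steps without the policy drifting too far.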

To avoid “strategy collapse,” an RL failure that can result in an endless training loop, the bots trained by playing 80 percent of their games against themselves and the other 20 percent against their previous versions. The bots self-played on 128,000 CPU cores and 256 GPUs, accumulating the equivalent of up to 180 years of game time in each day of training.
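The 80/20 opponent split can be sketched as a simple sampling rule. The function and version names below are hypothetical, and picking uniformly among past versions is an assumption; the article does not say how older versions were chosen.

```python
import random


def pick_opponent(current, past_versions, rng, self_play_prob=0.8):
    """Pick the current policy most of the time, otherwise a past version."""
    # With no history yet, the only possible opponent is the current policy.
    if not past_versions or rng.random() < self_play_prob:
        return current
    # Assumption: past versions are sampled uniformly at random.
    return rng.choice(past_versions)


rng = random.Random(0)  # seeded so the example is repeatable
picks = [pick_opponent("v10", ["v7", "v8", "v9"], rng) for _ in range(10_000)]
print(picks.count("v10") / len(picks))  # close to 0.8
```

Mixing in older opponents keeps the current policy from forgetting how to beat strategies it has already moved past.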

The exponential decay factor is a critical parameter that determines how heavily the bot weighs long-term rewards against short-term ones. OpenAI also introduced a hyper-parameter called “Team Spirit,” which ranges from 0 to 1 and determines how much each of OpenAI Five’s Heroes should care about its own reward function versus the average of the team’s reward functions. Over training, the team annealed the bots’ Team Spirit value from 0 to 1.
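One plausible reading of Team Spirit is a linear blend between a Hero's own reward and the team average, sketched below. The blending formula, the linear annealing schedule, and the sample rewards are assumptions for illustration; the article gives only the 0-to-1 range and the direction of annealing.

```python
def blend_rewards(individual_rewards, team_spirit):
    """Mix each hero's own reward with the team's average reward."""
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [
        (1.0 - team_spirit) * reward + team_spirit * team_mean
        for reward in individual_rewards
    ]


def annealed_team_spirit(step, total_steps):
    """Linear ramp from 0 to 1 (an assumed schedule, not OpenAI's)."""
    return min(1.0, max(0.0, step / total_steps))


# Hypothetical rewards for five heroes in one time step.
rewards = [4.0, 0.0, 0.0, 0.0, 1.0]
print(blend_rewards(rewards, 0.0))  # each hero keeps its own reward
print(blend_rewards(rewards, 1.0))  # every hero receives the team average
```

At a value of 0 each Hero learns selfishly, which is an easier signal early in training; at 1 all five share the same reward, which encourages coordinated play.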

“The AI only needs two days to crush us”

A tradition developed at the OpenAI office: Every Monday night, the team would get together and play Dota 2. Eventually, they started to play against their own bots.

Raiman still remembers the day this May when the bots first defeated a team of his colleagues in a relatively restricted 5v5 match that lasted 45 minutes: “I was so excited. I would say that’s when I thought we now had a fifty-fifty shot [against pros].”

Raiman says the team discovered that just two days of training would now make the bots stronger than anyone in the office. “There’s a window of about twenty-four hours to forty-eight hours between the moment you start [training] from scratch, where it’s completely random; to when you can no longer play with it effectively and it can beat you consistently.”

In June, OpenAI invited a team of amateur players ranked between 4000 and 6000 to their office to play the bots. The bots won handily. Elated, the team announced their squad of Dota 2 bots now had a name: The OpenAI Five.

Bill Gates tweeted after the match, “AI bots just beat humans at the video game Dota 2. That’s a big deal because their victory required teamwork and collaboration — a huge milestone in advancing artificial intelligence.”

OpenAI now set its sights on the Dota 2 International in Vancouver, where it hoped to take down a pro human team.

Is OpenAI cheating?

While the mood was upbeat at OpenAI, many Dota otaku remained unconvinced by the OpenAI Five victories in June. They argued that the game rules were entirely different from a proper 5v5 game: only five Heroes available, no warding, no bottles, no Roshan, and no invisibility. It was cheating, at least from their perspective.

OpenAI lifted some of their self-imposed rule restrictions: they put Roshan back in the game; added warding; and increased the number of heroes from 5 to 18.

Warding, which enables vision in an unknown area, is a must-know skill in Dota 2. Human beginners can access an advanced ward guide online to learn the skills, but the bots cannot, and they tended to waste their wards in areas that were already visible.

Roshan is the most powerful “neutral creep” threat in Dota 2. Fighting him is a tricky team decision that requires considering both timing and approach, as it can decide the future of the match. Killing Roshan provides a significant reward, but Roshan is powerful and fighting him in the early game can leave your own players in poor health or even kill them.

*The OpenAI Five attempt to kill Roshan in a game against paiN Gaming at The International 2018.*

The bots were reluctant to take down Roshan at the beginning because they determined their risk of being killed was too high. OpenAI addressed this by randomizing Roshan’s health, to encourage the bots to kill the creep when it was weak. While the trick worked, the bots now seemed to be overtrained, and wasted too much time monitoring Roshan’s health and attempting to kill it whenever they saw a chance.

“We are running out of time”

On August 5, just three weeks before the Vancouver showdown, OpenAI organized a benchmark test against a team of casters and ex-pros with MMR rankings in the 99.95th percentile of Dota 2 players worldwide.

The match was hosted in a San Francisco bar in front of a live audience of 300 people. Synced interviewed dozens of attendees, many of whom were betting on the bots: “I emotionally support humans, but I don’t think they have a chance to win the game.”

*OpenAI benchmark test in San Francisco.*

Before the match, human gamer David Tan aka MoonMeander tweeted “Never lost to a bot before and this ain’t gonna be the first CruW.” He was so wrong. The bots beat Tan’s team in the first two games, with the humans lasting only 20 to 25 minutes before calling GG (good game) in surrender.

It should have been a perfect victory for OpenAI, but to add some excitement to the third game OpenAI asked the audience to draft the OpenAI Five Heroes. As expected, they selected an adversarial lineup, which exploited a weakness in the bot team. Before the match began, OpenAI Five predicted it had just a 2.9 percent chance of winning with this setup. The bots ultimately lost the game after 35 minutes and 47 seconds.

“I think how badly game three went was also a moment for us to sort of step back and figure out what we could do to improve in the cases where we are doing poorly,” says Zhang.

Meanwhile the Vancouver showdown was quickly approaching — where the OpenAI Five would face off against pros ranked at 7000–8000, much higher than the benchmark series opponents.

The team tried to establish another benchmark milestone (changing five invulnerable couriers to one killable courier), but due to limited staff and time, this did not work out so well. The OpenAI bots began training with a single courier in mid-August, and the transition degraded performance.

“You need to give the experiments time to run, give the bots time to train. We just don’t have that much time right now,” said Zhang.

Vancouver: A loss for the OpenAI Five, a win for AI

As the OpenAI team took the stage at Rogers Arena, many in the audience believed the bots had a strong chance of victory. Almost all previous OpenAI-vs-human Dota 2 games had been one-sided: the 1v1 bot had beaten one of the world’s top gamers at the 2017 International, and the OpenAI Five had won two out of three against ex-pros at the San Francisco benchmark test.

But it was not to be. In the first game, Brazilian pro team paiN Gaming dispatched the bots in 52 minutes. In a win-or-go-home game on the International’s second day, five Chinese Dota 2 legends, three of whom had played on a championship team together, defeated the bots in 45 minutes.

Brockman was gracious in defeat, tweeting, “Lots of extremely exciting plays by both teams. Has been a great showcase of what both humans and AIs can do.”

The OpenAI Five’s performance gave the team plenty to build on: the bots lasted longer than in a usual game in both contests, had more kills than the human teams, and won most team fights, an edge attributed to their error-free micro-level control. But the bots also made plenty of bad moves: warding in the wrong positions, making unreasonable item choices, and ganking too rarely (ganking means leaving your lane to kill an enemy Hero in another lane).

OpenAI’s underlying research progress is impressive: using a relatively simple technique, researchers enabled complex coordination and long-horizon gameplay in an imperfect-information game environment, training a computer from scratch to the level of a Dota 2 master. These techniques can likely be used in other AI applications such as robotics and general AI systems.

The International 2018 may not have gone the way OpenAI hoped, but neither was it the team’s epitaph. The OpenAI Five are back in training and will compete in a full Dota 2 match with all Heroes either later this year or in 2019.

They may have lost the latest battle, but the OpenAI Five’s war against humans is far from over.