Did DeepMind’s AlphaStar AI Just Take Us A Step Towards Artificial General Intelligence?

What does playing the video game StarCraft II have to do with Artificial General Intelligence?

Dennis Saw
9 min read · Nov 15, 2019
Image: AlejandroLinaresGarcia, own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=16860633

When Google’s DeepMind published a Nature paper on 30 October 2019 detailing how its AI, AlphaStar, achieved Grandmaster status playing all three of StarCraft II’s “races”, it did so with relatively little fanfare for such a significant moment in AI history.

In contrast, two years earlier in May 2017, DeepMind staged a very public match between its AlphaGo AI and the world’s top Go player, Ke Jie, which AlphaGo won three games to nil. That event, following an earlier series against the Korean 9-dan Go player Lee Sedol in 2016, was even labelled by venture capitalist Kai-Fu Lee in his book “AI Superpowers” as “China’s Sputnik Moment” ¹.

What is the significance of AlphaStar’s victory in StarCraft in the field of Artificial Intelligence? And why is going from the board game Go to the video game StarCraft more significant than going from Chess to Go?

From Chess to Go…

When stacked up against chess, the ancient Chinese game of Go is a far bigger challenge for AI research. The complexity of Go, in computing terms, lies in the sheer number of possible positions on the board, which turns out to be 10 to the power of 170. Or, as often quoted: more than the number of atoms in the observable universe. IBM’s Deep Blue, which beat the legendary chess Grandmaster Garry Kasparov in 1997, was a machine that used a large number of parallel custom-made computer chips and optimised software to search 200–300 million positions a second to compute optimal play ². This method would not be feasible for Go. Indeed, since it is impossible to calculate a meaningful number of potential moves and their downstream consequences at each turn (the “search space”), many believe that the top human Go players have an almost mythical “feel” for the board as a game progresses, guiding them to select winning strategies.
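
To make the brute-force approach concrete, here is a minimal sketch (in Python, and emphatically not Deep Blue’s actual engine, which used sophisticated parallel search on custom hardware) of exhaustive game-tree search on a toy take-away game. The point is only that the cost of such a search grows roughly as the branching factor raised to the search depth, manageable at chess-like branching but hopeless at Go’s scale.

```python
# Toy illustration, not Deep Blue's algorithm: exhaustive negamax search over a
# trivial "take 1-3 stones, last stone wins" game. Swap in chess (~35 legal
# moves per position) or Go (~250) and the cost explodes roughly as
# branching_factor ** depth.
def legal_moves(stones):
    return [n for n in (1, 2, 3) if n <= stones]

def negamax(stones):
    """Return +1 if the player to move can force a win, else -1."""
    if stones == 0:
        return -1  # the previous player took the last stone and won
    return max(-negamax(stones - take) for take in legal_moves(stones))

if __name__ == "__main__":
    print(negamax(10))  # prints 1: the first player can force a win from 10 stones
```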

DeepMind tackled the problem of Go using neural networks trained on expert human play, followed by AI self-play (reinforcement learning), augmented with a fast, stochastic look-ahead search (Monte Carlo tree search) to choose moves ³. The technology behind these neural networks was not invented by DeepMind specifically to solve the problem of Go. The algorithms were in fact adapted from research in computer vision, which at that time was beginning to achieve superhuman results.⁴
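
As a rough illustration of that two-stage recipe (a minimal sketch, not DeepMind’s actual AlphaGo code; the network size, board encoding and stand-in data are all placeholder assumptions), a policy network can first be trained with supervised learning to imitate expert moves, then nudged further with a simple policy-gradient update driven by self-play outcomes:

```python
# Minimal sketch of "supervised learning on expert games, then reinforcement
# learning from self-play". Everything below (sizes, encodings, fake data) is
# an illustrative assumption, not AlphaGo's real architecture.
import torch
import torch.nn as nn

BOARD_POINTS = 19 * 19  # 361 possible moves on an empty Go board

policy = nn.Sequential(  # toy stand-in for AlphaGo's deep convolutional net
    nn.Linear(BOARD_POINTS, 256), nn.ReLU(),
    nn.Linear(256, BOARD_POINTS),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: supervised learning -- predict the expert's move from the position.
boards = torch.randn(64, BOARD_POINTS)                # fake encoded positions
expert_moves = torch.randint(0, BOARD_POINTS, (64,))  # fake expert move labels
sl_loss = nn.functional.cross_entropy(policy(boards), expert_moves)
opt.zero_grad(); sl_loss.backward(); opt.step()

# Stage 2: reinforcement learning -- sample moves, play games, then reinforce
# moves from won games (+1) and discourage moves from lost games (-1).
dist = torch.distributions.Categorical(logits=policy(boards))
moves = dist.sample()
outcomes = (torch.rand(64) > 0.5).float() * 2 - 1     # fake win/loss signals
rl_loss = -(dist.log_prob(moves) * outcomes).mean()
opt.zero_grad(); rl_loss.backward(); opt.step()
```

In the real system a separately trained value network and the Monte Carlo tree search sit on top of this policy; the sketch only shows the flow of the two training signals.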

Despite their complexity, games like chess and Go are known as “games of perfect information”. That is, each player can see the whole board, knows where each piece is and understands the capabilities and constraints of all the pieces. The methods used to develop AI to excel in these games will likely translate poorly to developing AIs to work in the real world where information is almost always patchy. Think of a company making strategic decisions to outflank its competitors, or commanders on a battlefield trying to decide the next course of action, or indeed your own decisions on what stocks to invest in.

In addition to imperfect information, in the real world we also have to contend with a massive number of potential moves at each step. What a battlefield commander can do is not restricted to the small set of “legal moves” that characterises board games such as Go and chess (this set of available moves is called the “action space”). Hence, even though the number of possible board positions in Go is vast, the maximum action space is actually only 361, since an empty Go board has 19 x 19 intersections where a stone can be placed.

While the achievement of AlphaGo is impressive, it represents only the beginning of developing decision-making AI to tackle problems in the real world. The profusion of AI software being developed and applied today can perform tasks within narrow boundaries very well, sometimes better than humans. For instance, models developed to detect cancers in medical images now achieve error rates comparable to, or lower than, those of professional clinicians.⁵ But AI models trained for these narrow tasks do not perform well on other tasks for which they were not specifically trained. Some commentators call these models “narrow AI” to distinguish them from the idea of “Artificial General Intelligence” (AGI).

AGI is probably the image you have in your mind when you think of the words “Artificial Intelligence”. It is the stuff of science fiction, where software makes decisions as well as, or better than, humans in the real world: the real world that is plagued by incomplete information and massive action spaces.

…From Go to StarCraft II

With AlphaStar, DeepMind has taken a tentative step in the direction of AGI. But to understand how, we first have to understand why StarCraft II is the logical next step in their quest.

StarCraft is a real-time strategy video game first launched in 1998 by Blizzard Entertainment. It has a cult following and, over the decades, has evolved into one of the toughest e-sports, with players competing for millions of dollars in prize money. This matters because, as with chess and Go, there are human players who have been rated through competition, against whom an AI model can be benchmarked. In addition, the age of the franchise means the league tables are mature and likely contain only players who are truly expert at a game whose bar has been rising for years.

The variant of StarCraft II used in DeepMind’s research is known as “1v1”. In it, as in chess or Go, two players face each other on a “board”, but that is as far as the similarity with board games goes. The “board” is actually a stretch of terrain that is longer than it is wide (like a yoga mat), with mountains, forests and other geographical obstacles scattered throughout. The two players start at opposite ends of the terrain, each with a single building and a small number of worker units. From this base a player collects resources (minerals and gas) to construct more buildings, spawn more workers, create other types of units such as military units, research advanced technologies and send scouts out to map the terrain and gather intelligence on the enemy. You win by destroying all of the enemy’s buildings before the enemy does the same to you.

A StarCraft player has to make decisions about economics (how much of the limited resources to devote to mining, research, building, scouting and so on in order to build up defensive and offensive capability), strategy (where to build obstacles and place assets) and tactics (controlling units of various types to fight or harass the enemy). A typical game lasts approximately 10 minutes and consists of thousands of moves. The top players can make over 500 actions per minute: their fingers blur over the keyboard.

Compared to Go, which has a measly maximum action space of 361, StarCraft II has approximately 10 to the power of 26 possible actions at each step. Decisions have to be made under time pressure and with incomplete information, since the enemy’s disposition and much of the terrain have to be discovered through intelligence gathering. Players therefore have to make assumptions (especially early in a game) about the opponent’s strategy in order to formulate a game plan: to be the first to develop a force able to destroy the enemy while building an adequate defence so that the same does not happen to him/her/it. Compared to Go, StarCraft II presents some of the complexities of the real world, albeit still limited by the design of the game and the rules of gameplay. Hence StarCraft is a logical stepping stone in AI research towards AGI.
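
The 10^26 figure comes from DeepMind; the decomposition below is a made-up illustration (the unit counts, order types and screen resolution are all assumptions, not the paper’s calculation) of how a composite action space multiplies out to astronomical sizes once an action means “select some units, give them an order, aim it at a point”:

```python
# Illustrative arithmetic only: these numbers are invented, but they show how
# composing independent choices blows up the action space.
from math import comb

unit_selections = sum(comb(200, k) for k in range(1, 11))  # choose up to 10 of ~200 units
order_types = 100                                          # move, attack, build, research, ...
target_points = 256 * 256                                  # screen coordinates an order can target

print(f"{unit_selections * order_types * target_points:.2e}")  # on the order of 10**23
```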

AlphaStar’s first public outing

In December 2018 DeepMind first pitted AlphaStar, its AI trained to play StarCraft II, against two professional players. The matches were played in private at DeepMind’s London offices and were restricted to only one of the three player types (known as “races”) in the game, the Protoss. In StarCraft the different races have different strengths and weaknesses; players choose their race and square up against an opponent playing any of the races, which adds another dimension of complexity to gameplay. That single-race competition was DeepMind’s proof of concept, and AlphaStar beat both top-ranking players, one of whom was a Protoss Grandmaster, by 5 games to 0 each.

The next month, in January 2019, DeepMind and Blizzard live-streamed an event that was part corporate presentation, part competition.⁶ It was staged like an e-sports event at which the December games were revealed to the world for the first time. Even the professional commentators guiding the audience through replays of the games were seeing the matches for the first time. In those replays we watched how AlphaStar beat the human players, with commentary from the players themselves. In some games the human players felt they could have won but for a tiny error of strategy on their part.

The replays were followed by a live exhibition match between top Protoss player Grzegorz “MaNa” Komincz and AlphaStar. MaNa had already lost 0–5 to AlphaStar at DeepMind’s offices in December, but at the climax of the live stream he actually beat the AI. MaNa’s own reinforcement learning and years of playing the game had produced the stronger player!

DeepMind, learning from that experience and collaborating with one of the two players, has since invested the resources to train AlphaStar to play as, and against, all three races; the 30 October 2019 Nature paper reported that the AI is rated Grandmaster with each race, at the rarefied level of the top 0.2% of all ranked human players.

Adapting research from other domains

The basic architecture of AlphaStar can be traced back to AlphaGo. First, a deep neural network was trained on nearly a million examples of human play by the top 22% of players (provided by Blizzard). So that the network could “remember” and link successful strategies adopted early in a game with outcomes much later, DeepMind used sequence-modelling machine learning techniques originally developed to process language, for example in text translation. This first step is known as supervised learning, and it was followed by reinforcement learning in which the AI agents (models that play a game are called “agents”) played against one another in a league, selecting for the strongest players.
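
To give a flavour of that first, supervised step (a rough sketch under my own assumptions, not DeepMind’s architecture: the observation encoding, the use of a small Transformer encoder as the “language-style” sequence model, and the random stand-in data are all placeholders), a sequence model can read the whole history of game observations and be trained to predict the human player’s action at every step of a replay:

```python
# Behavioural-cloning sketch: a sequence model attends over the full history of
# observations so that late-game decisions can depend on early-game events.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, SEQ_LEN = 128, 1000, 64   # placeholder sizes

class SequencePolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(OBS_DIM, 256)
        layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(256, N_ACTIONS)

    def forward(self, obs_seq):                  # (batch, time, OBS_DIM)
        h = self.encoder(self.embed(obs_seq))    # attends over the whole history
        return self.head(h)                      # one action distribution per step

policy = SequencePolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

# Supervised step on a batch of replays: predict the human's action at every step.
obs = torch.randn(8, SEQ_LEN, OBS_DIM)                    # stand-in replay data
human_actions = torch.randint(0, N_ACTIONS, (8, SEQ_LEN)) # stand-in action labels
logits = policy(obs)
loss = nn.functional.cross_entropy(logits.reshape(-1, N_ACTIONS),
                                   human_actions.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```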

Since human players in StarCraft are limited in how quickly they can process the game and how quickly they can issue orders with the keyboard and mouse, AlphaStar was deliberately handicapped to provide a more level playing field. A hint of this superhuman ability was seen in the December 2018 matches when AlphaStar, late in one game, displayed an almost god-like ability to observe and control units in skirmishes at several locations simultaneously. The poor human had to shift his attention and focus on one theatre at a time (although he did it very, very rapidly). After that, the algorithm was tweaked further, introducing a “limited camera”, to level the playing field again.
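
As a toy illustration of what such a handicap can look like (my own sketch, not DeepMind’s actual interface constraints; the 300-actions-per-minute budget and the class name APMLimiter are arbitrary examples), an agent’s actions can be throttled against a rolling actions-per-minute budget:

```python
# Hypothetical rate limiter, for illustration only: actions beyond the budget
# within any rolling 60-second window are simply refused.
from collections import deque

class APMLimiter:
    def __init__(self, max_apm=300):
        self.max_apm = max_apm
        self.timestamps = deque()   # times of recently allowed actions

    def allow(self, now_seconds):
        """Return True if the agent may act at time now_seconds."""
        while self.timestamps and self.timestamps[0] < now_seconds - 60.0:
            self.timestamps.popleft()           # forget actions older than a minute
        if len(self.timestamps) < self.max_apm:
            self.timestamps.append(now_seconds)
            return True
        return False                            # over budget: the action is dropped

limiter = APMLimiter(max_apm=2)                 # tiny budget to show the effect
print([limiter.allow(t) for t in (0.0, 1.0, 2.0, 61.5)])  # [True, True, False, True]
```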

AlphaStar’s success demonstrates that, with current “off the shelf” algorithms, an AI decision-making model can be designed to tackle situations of incomplete information and large action spaces. Indeed, the fact that the AI produced by DeepMind not only had to be handicapped but was also shown to achieve victory with fewer game actions than its human competitors should give us pause. Imagine a company CEO or a battlefield commander trying to outflank an unrestricted AI adversary.

Still a long way from AGI

While DeepMind has made a first tentative step towards Artificial General Intelligence, there is still a yawning chasm to cross. StarCraft, while complicated, is still a game with boundaries. This does not mean that what was achieved cannot be applied to narrow situations in the real world with similar characteristics to StarCraft — a large but still limited action space, and an environment of incomplete information.

An important concept to emerge from both AlphaGo and AlphaStar is that the initial training of the neural networks was done on examples of top human gameplay. This gave the machines a leg up: in effect, they learned the most effective strategies devised by human players over many years and used them as a starting point from which to improve.

With no such data, or only limited datasets, it would take much, much longer and cost significantly more to reach that starting point. For example, in a simple reinforcement learning experiment by OpenAI, in which AI agents played hide-and-seek with no human examples to learn from, millions of games had to be played before any meaningful emergent behaviour was observed, and complex behaviour took tens of millions of games.⁷

Nonetheless, DeepMind has opened the door and taken us one more step towards AGI. What will the next Alpha AI tackle?

References
¹ Lee, Kai-Fu. AI Superpowers. 2018. Houghton Mifflin Harcourt.

² Campbell, M., et al. Deep Blue. 2002. Artificial Intelligence 134: 57–83 (doi:10.1016/S0004-3702(01)00129-1).
Also see https://www.theguardian.com/theguardian/2011/may/12/deep-blue-beats-kasparov-1997.

³ Silver, D., et al. Mastering the game of Go with deep neural networks and tree search. 2016. Nature 529: 484–489 (doi:10.1038/nature16961).

⁴ See for example: https://www.eetimes.com/document.asp?doc_id=1325712.

⁵ See for instance: Esteva, A., et al. Dermatologist-level classification of skin cancer with deep neural networks. 2017. Nature 542: 115–118 (doi:10.1038/nature21056).

⁶ https://www.youtube.com/watch?v=cUTMhmVh1qs.

⁷ See https://openai.com/blog/emergent-tool-use/.


Dennis Saw

Scientist, ex-high tech investment banker, brokerage co-founder & biotech CEO. Currently at the intersection of biotech/pharma & data science/machine learning.