DeepMind’s AlphaStar is a Big Step Towards Human-Level AI (Part 2/2)

Alec Morgan
Jul 8, 2019 · 6 min read


This is part two of my blog on the profound research implications of DeepMind’s AlphaStar. If you missed part one, you can find it here. Where we left off, we were about to dive into the inner workings of the relational module: the secret sauce inside the simpler precursor agent, which I’ll keep calling pre-AlphaStar, that DeepMind built during R&D. So let’s dive in.

During R&D, pre-AlphaStar was tested in Box-World as well as various StarCraft II mini-games.

Pre-AlphaStar was trained and tested in two environments: a simple but causally challenging game called Box-World, and various StarCraft II mini-games. We’ll look at what the relational module visibly contributes in both.

Illustration of the relational module, sourced from the paper.

The figure above shows the relational module in all its glory as it processes a scene from Box-World. What’s going on here? Don’t worry about all the stuff on the right; just focus on the diagram on the left. The process starts when pre-AlphaStar reads in a picture of the game. It uses a type of deep neural network called a convolutional neural network (CNN) to figure out what’s actually in the scene; after all, all those random colors don’t mean anything until a brain decides they mean something. The CNN tells the relational module what is in the scene and where everything is, and then the relational module takes over.
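To make that pipeline concrete, here is a minimal sketch in PyTorch of the first half of the diagram: a small CNN turns the image into a grid of feature vectors, and each cell of that grid becomes one “entity” for the relational module, with its coordinates appended so the spatial layout survives the flattening. The shapes and names are my own simplification, not DeepMind’s actual code.

```python
import torch
import torch.nn as nn

class EntityExtractor(nn.Module):
    """Turn an image into a set of entity vectors, one per spatial cell.

    A rough sketch of the idea from the paper, not DeepMind's code:
    a small CNN produces a feature map, and every cell of that map
    becomes one "entity" for the relational module, with normalized
    (x, y) coordinates appended.
    """

    def __init__(self, in_channels: int = 3, features: int = 24):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, features, kernel_size=2), nn.ReLU(),
            nn.Conv2d(features, features, kernel_size=2), nn.ReLU(),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        fmap = self.conv(image)                     # [B, F, H, W]
        b, f, h, w = fmap.shape
        entities = fmap.flatten(2).transpose(1, 2)  # [B, H*W, F]
        # Append each entity's (x, y) position so "where" isn't lost.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
            indexing="ij",
        )
        coords = torch.stack([xs, ys], dim=-1).reshape(1, h * w, 2)
        return torch.cat([entities, coords.expand(b, -1, -1)], dim=-1)
```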

A little bit of info on the problem it has to solve: in Box-World, the player is represented by the dark grey pixel, and the other lone colored pixel is a key. The two-colored pairs, meanwhile, are boxes: the pixel on the right shows what color of key is needed to open the box, and the pixel on the left shows what color of key you get out of it. One of the boxes contains a “gem”, represented by a white pixel. When you get the gem, you win the game. Some boxes are “distractors”: boxes that don’t contain the next key you need. If you open even one distractor, the game is lost, so the order in which you open the boxes matters. And just to make things slightly harder, the number of boxes you have to go through to reach the solution is randomized. How does the relational module help pre-AlphaStar solve this?
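Before we get to the answer, it helps to see what the underlying problem looks like with the pixels stripped away. Here is a toy, self-contained formulation of my own (not DeepMind’s environment code): each box is a (lock, key) pair, and winning means finding the one chain of boxes that ends at the gem.

```python
# A toy version of Box-World's causal structure (my own formulation,
# not DeepMind's environment). Each box is a (lock, key) pair: you
# need a key of color `lock` to open it, and you get a key of color
# `key` out of it. The gem's box has its own lock color.

def solve(start_key, boxes, gem_lock):
    """Return the order in which to open the boxes, or None."""
    def search(key, remaining, path):
        if key == gem_lock:              # our key opens the gem's box
            return path
        for box in remaining:
            lock, contents = box
            if lock == key:              # this box is openable now
                found = search(contents, remaining - {box}, path + [box])
                if found is not None:
                    return found
        return None                      # every branch was a distractor
    # The agent gets no do-overs, but a solver may backtrack in simulation.
    return search(start_key, frozenset(boxes), [])

boxes = {("red", "blue"), ("blue", "green"),   # the winning chain
         ("red", "grey")}                      # a distractor box
print(solve("red", boxes, gem_lock="green"))
# -> [('red', 'blue'), ('blue', 'green')]
```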

Relational reasoning in Box-World.

Using a so-called relational attention mechanism, pre-AlphaStar identifies the relationships between the different objects, and between those objects and itself. The heavier the line, the more attention is paid to a particular relationship; for example, pre-AlphaStar realizes that the relationship between itself and the key is extremely important. Just above the pictures of the game you can see a distillation of the same information, the underlying graph: this is pre-AlphaStar’s identification of the winning path, the exact order in which it must open the boxes, as well as all the dead ends it must avoid. Clever bot.
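Mechanically, this “relational attention” is dot-product self-attention of the kind popularized by Transformers: every entity computes a query, a key, and a value, and the query-key dot products decide how strongly each entity attends to every other. A bare-bones, single-head sketch (the paper uses a multi-head variant with extra normalization):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationalBlock(nn.Module):
    """Single-head dot-product self-attention over a set of entities.

    A bare-bones sketch of the relational module's core operation; the
    heavy lines in the figure correspond to large attention weights.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, entities: torch.Tensor) -> torch.Tensor:
        # entities: [batch, num_entities, dim], e.g. from EntityExtractor
        q, k, v = self.query(entities), self.key(entities), self.value(entities)
        # Query-key dot products say how relevant entity j is to entity i.
        scores = q @ k.transpose(1, 2) / (entities.shape[-1] ** 0.5)
        weights = F.softmax(scores, dim=-1)   # each row sums to 1
        return entities + weights @ v         # residual update per entity
```

Stacking a few of these blocks lets the agent chain inferences: attend first to the key, then to the box that key opens, and so on down the graph.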

Just solving the game isn’t enough, though. As good researchers, the DeepMind team wanted to know whether the relational agent could solve the same problems better than existing approaches. To that end they created a ‘baseline’ agent, identical to pre-AlphaStar except that the relational module is removed. In code terms this ablation is roughly the sketch below; the figure after it shows how the two performed.
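A hypothetical sketch of that setup, reusing the EntityExtractor and RelationalBlock sketched above (all names invented; this captures the spirit of the comparison, not DeepMind’s actual baseline architecture):

```python
import torch.nn as nn

def build_agent(relational: bool, dim: int = 26) -> nn.Module:
    """Ablation sketch: both agents share the CNN front-end; only the
    block in the middle differs."""
    core = (RelationalBlock(dim) if relational        # entities attend to each other
            else nn.Sequential(nn.Linear(dim, dim),   # baseline: per-entity MLP,
                               nn.ReLU()))            # no entity-to-entity links
    return nn.Sequential(EntityExtractor(features=dim - 2), core)

agent = build_agent(relational=True)       # pre-AlphaStar
baseline = build_agent(relational=False)   # the control agent
```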

Control test: solving Box-World with and without the relational module.

As you can see, performance suffers significantly without the relational module. Moreover, the relational agent took a much smaller hit to performance when solving longer, more complex rooms. This is exactly what we should expect from an AI with stronger causal reasoning ability that’s solving problems where causal reasoning matters. And speaking of causal reasoning challenges, on to StarCraft II.

StarCraft II mini-games.

StarCraft II is an excellent testbed for causal reasoning abilities. As a complex strategy game, it poses a lot of difficult and interesting cognitive challenges: planning ahead, thinking on your feet, managing hundreds of game pieces simultaneously, and guessing at what your unseen opponent is doing are just a few of the things players must do to succeed. In short, it’s a perfect fit. How does pre-AlphaStar fare against these challenges? Here are scores from the various mini-games that pre-AlphaStar was tested in.

Performance results versus four other AIs and two high-skill humans.

These graphs show somewhat less impressive results. Pre-AlphaStar may beat human grandmaster players in 4 out of 7 mini-games, but it is usually within the margin of error of the control agent’s performance. The one clear exception is the Defeat Zerglings and Banelings mini-game, seen below.

Pre-AlphaStar early in its training. Ouch.

Pre-AlphaStar commands the marines on the left as it battles the zerglings and banelings on the right. Zerglings and banelings move faster and hit harder than marines; the marines’ only advantage is their range. But as the clip shows, range alone isn’t enough to win this fight, at least not without some added cleverness.

Once again — clever bot.

Pre-AlphaStar eventually learns a very clever solution: moving one or two marines out to satellite positions and using them as bait. This distracts the zerglings and banelings, buying extra time for the remaining marines to chew through the enemy with their ranged weapons. This is the sort of strategic complexity that pre-AlphaStar is capable of, to say nothing of AlphaStar proper.

That said, AlphaStar’s successes have not gone uncriticized. On the machine learning subreddit, multiple users posted detailed analyses showing that AlphaStar executes in-game actions with superhuman speed and precision. DeepMind has since added clarifying edits to its original blog post acknowledging as much.

The implication is that we must be careful in how we assess AlphaStar’s intelligence. Its superhuman mechanical abilities can act as a crutch, masking its actual ability to out-think opponents. And the consensus within the StarCraft II community is that AlphaStar does exactly that: its playstyle tends to involve aggressive pushes with inferior military units, which sets up a perfect opportunity to exploit a humanly impossible capability by telling each and every unit exactly how to move in every fraction of a second. Optimal execution of a sub-optimal plan, in other words.
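That criticism is easy to make concrete. Averaged over a whole game, an action rate can look human while short bursts are superhuman, and it is the bursts that enable humanly impossible micro. Here is a small, self-contained sketch (my own illustration, not DeepMind’s or the community’s measurement code) of how you might check an action log for such bursts:

```python
from bisect import bisect_left

def peak_apm(action_times, window=5.0):
    """Highest actions-per-minute rate over any sliding window.

    `action_times` must be sorted timestamps in seconds. Peaks over a
    few seconds are where superhuman bursts show up, even when the
    game-wide average looks human.
    """
    best = 0
    for i, t in enumerate(action_times):
        # Count the actions falling in the window [t, t + window).
        j = bisect_left(action_times, t + window)
        best = max(best, j - i)
    return best * (60.0 / window)    # convert a window count to per-minute

# A 20-action burst in one second inside an otherwise calm minute:
log = sorted([0.05 * k for k in range(20)] + [10.0 + s for s in range(50)])
print(peak_apm(log))   # the one-second burst dominates the peak rate
```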

These criticisms are valid, but they don’t invalidate AlphaStar’s progress and successes. Many systems before AlphaStar boasted far greater mechanical advantages and still failed to come even close to the best human players, let alone defeat them flawlessly.

Conclusions

AlphaStar is one of the most intelligent systems humans have ever created, and it is likely that many future systems will draw from some of the techniques utilized by AlphaStar. AlphaStar may not be quite at the level of artificial general intelligence, but it’s much closer than the overwhelming majority of systems that have come before. Overall, AlphaStar represents one large step in a sprint towards the creation of AGI.

If you want to learn more about AGI, my earlier blog, Nick Bostrom’s Superintelligence, and John Brockman’s Possible Minds are a few great places to start. Thanks for reading!
