Is Alphastar really impressive?

You might go online and find headlines like “Alphastar goes 10:1 against human pros”, “Alphastar mastering real-time strategy game Starcraft II” or “Human StarCraft II e-athletes crushed by neural net”.

I personally find all these headlines misleading. I consider myself a Starcraft fan and I played at an average level for several years. I’m also a data engineer with some exposure to data science and machine learning, so I feel I can understand the general concepts of both worlds.

Let’s start with Starcraft. Starcraft II is a real-time strategy game developed and published by Blizzard Entertainment. The title has a single-player campaign but, more importantly, an e-sports scene where professional players face each other in multiple tournaments throughout the year, with the chance to win considerable prizes. To give you an idea of how much a Starcraft II player can earn in his career, have a look here.

In Starcraft II almost all the tournaments are 1vs1 matches played on different maps. The objective is to force the other player to surrender using your military forces.

This sounds pretty easy, but Starcraft II demands near-perfect mechanics. You have to constantly spend all the resources you are collecting, manage multiple armies in which every unit might have special abilities, defend multiple outposts and make the right decisions. This involves extreme multitasking, and one wrong decision might lead to your defeat.

To cope with all these actions, a Starcraft player looks like a pianist at a keyboard. The number of actions per minute (APM) that he has to perform is crazy, even if only a few of them are effective (effective actions per minute, EPM).
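
To make the difference between APM and EPM concrete, here is a minimal Python sketch. The `Action` record and the rule for what counts as "effective" are my own assumptions for illustration (a repeat of the same command on the same target within half a second is treated as spam); real replay tools may define EPM differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    time_s: float  # seconds since the start of the game
    command: str   # e.g. "move", "attack" (hypothetical labels)
    target: str    # hypothetical identifier for a unit or location


def apm(actions, game_length_s):
    """Raw actions per minute: every click and key press counts."""
    return len(actions) / (game_length_s / 60.0)


def epm(actions, game_length_s, spam_window_s=0.5):
    """Effective actions per minute under an assumed spam rule:
    repeating the same command on the same target within
    spam_window_s seconds does not count again."""
    effective = 0
    last_seen = {}  # (command, target) -> time of the most recent occurrence
    for action in sorted(actions, key=lambda a: a.time_s):
        key = (action.command, action.target)
        previous = last_seen.get(key)
        if previous is None or action.time_s - previous > spam_window_s:
            effective += 1
        last_seen[key] = action.time_s
    return effective / (game_length_s / 60.0)


# One minute of play: three spammed move commands and one attack.
log = [
    Action(0.0, "move", "ramp"),
    Action(0.1, "move", "ramp"),
    Action(0.2, "move", "ramp"),
    Action(10.0, "attack", "probe"),
]
print(apm(log, 60.0), epm(log, 60.0))
```

In this toy log the player performs four actions, but only two of them change anything in the game, which is why APM alone says little about how well someone is actually playing.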

Last but not least, Starcraft II has 3 different playable races, Protoss, Terran, and Zerg, that have very little in common. For this reason, a professional player normally plays only one race, and he has to prepare not only for his specific opponent but also for the general matchup against each race, since his strategy will change drastically.

This is Starcraft II in a nutshell and hopefully what I’m going to write now will make more sense.

A few days ago Google Deepmind’s Alphastar challenged two Starcraft II pros, Dario “TLO” Wünsch (Zerg) and Grzegorz “MaNa” Komincz (Protoss). As you probably know, the aggregate result is 10:1 in favor of Alphastar.

Alphastar is a deep neural network that, according to Deepmind, has been trained for the equivalent of 200 years of play to master Starcraft II.
It is important to say that Starcraft II is a game of imperfect information. Unlike games like chess or Go where players see everything, crucial information is hidden from a StarCraft player and must be actively discovered by “scouting”.

How has Alphastar been trained?
According to Google Deepmind: “AlphaStar’s behavior is generated by a deep neural network that receives input data from the raw game interface (a list of units and their properties), and outputs a sequence of instructions that constitute an action within the game. More specifically, the neural network architecture applies a transformer torso to the units (similar to relational deep reinforcement learning), combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralized value baseline.”

“AlphaStar also uses a novel multi-agent learning algorithm. The neural network was initially trained by supervised learning from anonymized human games released by Blizzard. This allowed AlphaStar to learn, by imitation, the basic micro and macro-strategies used by players on the StarCraft ladder. This initial agent defeated the built-in “Elite” level AI — around the gold level for a human player — in 95% of games.”

“The AlphaStar league. Agents are initially trained from human game replays and then trained against other competitors in the league. At each iteration, new competitors are branched, original competitors are frozen, and the matchmaking probabilities and hyperparameters determining the learning objective for each agent may be adapted, increasing the difficulty while preserving diversity. The parameters of the agent are updated by reinforcement learning from the game outcomes against competitors. The final agent is sampled (without replacement) from the Nash distribution of the league.”
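
To get an intuition for the branch, freeze and matchmake loop described in the quote, here is a toy Python sketch. It is not Deepmind's algorithm: each agent is reduced to a single hypothetical `skill` number, the win probability is an assumed logistic curve, and the update rule is only a placeholder for the reinforcement learning step. The final Nash sampling is left out entirely.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    skill: float          # toy stand-in for the network parameters
    frozen: bool = False  # frozen agents stay in the league but stop learning


def win_probability(a, b):
    # Logistic curve on the skill gap: an illustrative assumption.
    return 1.0 / (1.0 + 10 ** (b.skill - a.skill))


def league_iteration(league, rng, generation, games=20, step=0.1):
    learners = [a for a in league if not a.frozen]
    for learner in learners:
        # Fall back to self-play if the league has no other agent.
        opponents = [a for a in league if a is not learner] or [learner]
        for _ in range(games):
            # Matchmaking: opponents the learner tends to lose to are
            # picked more often, which keeps the games difficult.
            weights = [1.05 - win_probability(learner, o) for o in opponents]
            opponent = rng.choices(opponents, weights=weights, k=1)[0]
            expected = win_probability(learner, opponent)
            won = 1.0 if rng.random() < expected else 0.0
            # Placeholder for the reinforcement learning update: a
            # surprising outcome moves the toy skill more than an expected one.
            learner.skill += step * abs(won - expected)
    # Freeze the agents that just trained and branch a new learner from each.
    for i, learner in enumerate(learners):
        learner.frozen = True
        league.append(Agent(f"gen{generation}-{i}", learner.skill))
    return league


rng = random.Random(0)
league = [Agent("seed", 0.0, frozen=True), Agent("gen0-0", 0.0)]
for generation in range(1, 4):
    league_iteration(league, rng, generation)
for agent in league:
    print(agent.name, round(agent.skill, 2), "frozen" if agent.frozen else "learning")
```

The point of the sketch is only the structure: old versions are never thrown away, so every new learner has to keep beating the whole history of the league and not just its latest rival.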

Distribution of agents in the Alphastar league and comparison with TLO and Mana (Deepmind infographic)

The first opponent of Alphastar was TLO, who is a Zerg player but agreed to play Protoss for this series of matches. This has to be taken into consideration, since playing an off-race is obviously a handicap. TLO was beaten 5:0, but I personally think the real test is against Mana, who is one of the strongest active Protoss players.

Alphastar beat Mana 5:1, and if you are interested you can see the matches here.

In the first match, Alphastar executes a proxy. For those who are not familiar with Starcraft II, this means constructing buildings close to the opponent’s base in order to execute a fast attack, which normally commits the player heavily. If you don’t win or cause a good amount of damage with your proxy, your game is basically over.

Mana reads Alphastar’s strategy correctly and spots the proxy. This is normally a clear advantage, since once you know your opponent’s strategy the only thing left is to counter it with the right approach. But Alphastar does something unusual. Even though it has researched the warpgate technology, it keeps producing units the regular way instead of using the instant warp-in. This might look like a poor choice, but within a short window of time it allows the AI to have more Stalkers (a type of military unit) than Mana.

Mana sets up a good defense, and since Alphastar’s strategy has been scouted, a human player would not normally commit his army without knowing what is waiting in the opponent’s base.
Alphastar instead “gambles” and goes up the ramp into Mana’s base.

The agent shows incredible micromanagement of individual units and beats Mana’s army pretty easily.

In games 2 and 3, Alphastar once again shows non-human micromanagement, engaging Mana’s army perfectly. The AI loses very few units, trading them in the most cost-efficient way. Alphastar also shows that oversaturating a base with probes, despite what the Starcraft II community might think, pays off in certain situations, challenging strategies that have been around for years at every level.

Game 4 is more interesting. This is when I started to realize something was wrong.

Alphastar engages Mana’s army from multiple angles, too many angles at the same time. Alphastar doesn’t perceive the game the way a player does: the AI sees the map as a whole, while Mana, despite his APM, is restricted to what fits on his screen. For this reason, Alphastar can manage an engagement from 4 angles while taking care of its economy across multiple bases.
Normally every bot has better micromanagement than a human player, so I was not surprised by that. But here we are talking about an army that should win against Alphastar’s unit composition 99% of the time, and Mana loses simply because he cannot cope with the crazy multitasking that Alphastar is showing.
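
The gap between seeing the whole map and seeing one screen can be sketched in a few lines of Python. Everything here is a simplified assumption of mine: the `Unit` and `Camera` types, the coordinates and the screen size are hypothetical, and fog of war and the minimap are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    x: float
    y: float


@dataclass(frozen=True)
class Camera:
    center_x: float
    center_y: float
    width: float
    height: float

    def contains(self, unit):
        # A unit is on screen when it lies inside the camera rectangle.
        return (abs(unit.x - self.center_x) <= self.width / 2
                and abs(unit.y - self.center_y) <= self.height / 2)


def observable_units(units, camera=None):
    """Units an agent can give orders to right now: all of them when
    there is no camera, only the ones on screen otherwise."""
    if camera is None:
        return list(units)
    return [u for u in units if camera.contains(u)]


army = [
    Unit("stalker-north", 40.0, 90.0),
    Unit("stalker-south", 42.0, 20.0),
    Unit("stalker-east", 95.0, 55.0),
    Unit("probe-home", 10.0, 10.0),
]
screen = Camera(center_x=40.0, center_y=88.0, width=24.0, height=14.0)

print(len(observable_units(army)))          # whole-map view
print(len(observable_units(army, screen)))  # one screen at a time
```

With the whole-map view all four units can receive orders in the same instant, while the camera-bound player has to move the screen before touching the other three, and every camera move costs time and actions.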

The last game is probably the most interesting of the series. Alphastar has been restricted to using only the in-game camera. This simulates the visual information that a human player, limited to a PC screen, can collect.
Alphastar starts with strong decision making: it sacrifices an oracle to kill two sentries. Because of how neural networks work, I don’t think it knows the value of the individual units, but it probably knows that making that trade will increase its probability of victory.

Mana, on the other hand, shows that he is learning from Alphastar! He decides to oversaturate his base with probes and is probably starting to question the assumption that 16 probes per base is the optimal number.

Alphastar dominates the early phases of the game, building an advantage over its opponent, and it looks like, thanks to its perfect micromanagement, the last game will finish like the previous ones.

At this point Alphastar adds a third base to its economy, increasing the number of things it has to control, while Mana scouts perfectly and decides to stay on two bases.

Mana’s offense does some damage, but because of his inferior economy he needs to make something happen soon in order to close out the game.

He goes for a warp prism drop with immortals on Alphastar’s main base while the AI is organizing an offense, and there something goes wrong.

Any player would stop this type of harassment by splitting his army into two groups or by producing an air unit that can prevent the prism from dropping further units. Alphastar instead decides to pull back its entire army, and it does so multiple times, while Mana simply inflicts some damage, picks up the immortals with the prism and leaves for a few seconds.

Alphastar looks like it is stuck in a kind of loop and starts making a series of poor decisions that allow Mana to buy more time and build an army that, despite Alphastar’s superior micromanagement, simply cannot be beaten.

The final score is 5:1, but in my mind it is 0:1, and I’m eager to see other matches where the AI is put under the same conditions as a human player.

Is Alphastar really impressive? Yes, it’s far better than every Starcraft II AI that I’ve seen and the AI is actually challenging the meta of the game.

But I don’t think that, at this stage, Alphastar is far better than one of the top 5 Protoss players in the world.
Don’t get me wrong: Alphastar will start to consistently beat every opponent, even when put under the same conditions as a human player (limited vision and capped EPM). But for now, after a strong early game, Alphastar starts to show a few limitations. Perfect micromanagement cannot replace the need to make complex decisions, and the longer the games are, the more complex they become.