AlphaStar — what AI research can achieve with unlimited resources

Mattias Appelgren
6 min read · Jan 25, 2019


As you have no doubt heard, Google DeepMind have taken another leap forward in AI research. They have developed a set of agents that have beaten professional StarCraft 2 players at the game (https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/). StarCraft 2 is a real-time strategy computer game (other games in the genre include Age of Empires and Red Alert). The game provides a particular challenge for AI for two reasons:
1. it is played in real time
2. the game state is only partially observable
This separates it from games like Go, which DeepMind previously conquered with AlphaGo (https://deepmind.com/research/alphago/), where players take turns. To beat the best human Go player, AlphaGo had an average of 3 minutes per move, and the game is completely defined by what can be observed on the board. A full game of StarCraft might be over in 15 minutes, and you can’t see what your opponent is doing most of the time.

On the surface, this achievement looks like it could bring hope to AI researchers such as myself: if DeepMind can solve a problem as complex as StarCraft, surely I can solve my comparatively simple problems! One of the main obstacles may be that I am not backed by Google’s bottomless pockets. You see, I don’t have 7 million dollars lying around.

StarCraft 2 | Blizzard Entertainment

To understand why I might struggle, I’m going to walk you through how AlphaStar was created, how it works, and how much money you would need to replicate the system.

Step one was to train an agent that can imitate human play. This was done through supervised training on a large set of games released by Blizzard (https://github.com/Blizzard/s2client-proto/tree/master/samples/replay-api). This agent already managed to beat the game’s own “Hard AI” 95% of the time (the “Hard AI” plays at about Gold level). However, much more needed to be done before the agent could stand up to professional players.
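To give a rough sense of what this step involves, here is a minimal sketch of supervised imitation learning in Python (PyTorch). It assumes we already have a policy network and a dataset of (observation, action) pairs extracted from the replays; both are hypothetical stand-ins, and AlphaStar’s real architecture and replay pipeline are far more elaborate.

```python
# Minimal sketch of the imitation step: train a policy to predict the
# action a human player took at each step of a replay. `policy` and
# `replay_dataset` are assumed to exist; they stand in for AlphaStar's
# much larger network and replay-processing pipeline.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_imitation(policy: nn.Module, replay_dataset, epochs: int = 10):
    loader = DataLoader(replay_dataset, batch_size=512, shuffle=True)
    optimiser = torch.optim.Adam(policy.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for observation, human_action in loader:
            logits = policy(observation)          # predicted action scores
            loss = loss_fn(logits, human_action)  # penalise disagreement with the human
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
    return policy
```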

Step two was to create the “AlphaStar League”. This league consisted of over 600 AlphaStar agents competing against each other. Each agent is given a slightly different training objective, such as “beat agents 2, 5, and 9” or “beat as many opponents as possible while prioritising building stalkers”. This way of training meant the different agents developed diverse strategies. This is important in StarCraft 2 because there is not necessarily one strategy which dominates all others. Instead, each strategy has strengths and weaknesses that can be exploited. It is similar to a game of “rock paper scissors”. Playing rock will not win every time, especially if your opponent knows you will play rock. Except in this version, once rock and scissors are chosen they need to perform a ritualised fight in space and scissors might just win, even though it is at a disadvantage.
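To make the structure of the league concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: agents are reduced to a name, an objective, and a rating, and the “match” is a coin flip weighted by rating rather than an actual game of StarCraft. The point is only to show a population in which each member is rewarded for a slightly different objective.

```python
# Toy sketch of the AlphaStar League idea: a population of agents, each with
# its own objective, repeatedly matched against one another. The match logic
# is a placeholder, not the real StarCraft environment or training update.
import random
from dataclasses import dataclass

@dataclass
class LeagueAgent:
    name: str
    objective: set           # names of rivals this agent is rewarded for beating
    rating: float = 1000.0   # toy skill rating updated from match results

def play_match(a: LeagueAgent, b: LeagueAgent) -> LeagueAgent:
    # Placeholder for "play a game and return the winner": the better-rated
    # agent simply wins more often.
    p_a = a.rating / (a.rating + b.rating)
    return a if random.random() < p_a else b

def run_league(agents, rounds: int = 10_000):
    for _ in range(rounds):
        a, b = random.sample(agents, 2)
        winner = play_match(a, b)
        loser = b if winner is a else a
        # Reward is larger when the win matches the agent's own objective,
        # which is what pushes different agents toward different strategies.
        bonus = 2.0 if loser.name in winner.objective else 1.0
        winner.rating += bonus
        loser.rating -= 1.0

agents = [LeagueAgent(f"agent_{i}", objective={f"agent_{(i + 1) % 600}"})
          for i in range(600)]
run_league(agents)
```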

My understanding is that each AlphaStar agent represents one preferred strategy. For instance, one may prefer attacking early, whereas another may play it slow and build a strong economy. Since no single strategy beats all others, DeepMind makes the final agent a mixture of the best agents they trained. Before each game they randomly select one of the agents to play. This means TLO and MaNa did not play one agent, but 5 different agents each. This is an important point to make, because it means that training all of the agents is necessary for the success of the AI; training just one would not have been enough.
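The per-game selection itself is simple; here is a sketch, assuming a uniform draw over five hypothetical strategy labels (the real mixture may well be weighted differently):

```python
# Sketch of the "mixture of agents" idea: before each show match, one of the
# final agents is drawn at random, so the human opponent faces a different
# strategy each game. The strategy names and the uniform draw are assumptions.
import random

final_agents = ["aggressive_rush", "stalker_heavy", "macro_economy",
                "air_focused", "defensive_turtle"]

def pick_agent_for_game():
    return random.choice(final_agents)

series = [pick_agent_for_game() for _ in range(5)]
print(series)  # e.g. a different strategy each game, unknown to the opponent
```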

This brings us to the million-dollar question: how much did this cost? To train the agents they ran the AlphaStar League for 14 days. During this time each agent played thousands of StarCraft 2 games simultaneously, experiencing up to 200 years’ worth of real-time StarCraft 2. Each agent ran on 16 of Google’s Tensor Processing Units (TPUs), the custom hardware built for deep learning. Since there are over 600 separate agents, this means around 9,600 TPUs were used and over 60,000 years of StarCraft 2 were played. Yes, sixty thousand years.

Let’s say we are not Google and want to try to achieve the same feat. How difficult would it be? The easiest way to do the model training would be to access Google’s TPUs through their cloud computing service (https://cloud.google.com/tpu/docs/pricing). For the TPU v3, which they used for AlphaStar, there are two prices listed: either $8.00 or $2.40 per TPU per hour. The higher price gives you full access to the TPUs you pay for, while the lower price gives you “preemptible TPUs”: at this price Google reserves the right to interrupt your use of the TPU in case they require it for some other purpose. To run the AlphaStar League we would probably need the full price, but for the sake of argument, let us say we can achieve it at the cheaper price point (which is probably closer to the cost for DeepMind). So, 9,600 TPUs run for 14 days (336 hours) would require a mere $7,741,440… Note that we would need nearly $26 million if we went for dedicated, uninterruptible TPUs.
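Here is that back-of-the-envelope calculation as a short snippet, using only the figures quoted above:

```python
# Back-of-the-envelope reproduction cost, from the figures quoted above and
# Google's listed TPU v3 prices (preemptible vs on-demand). Pure arithmetic,
# not an official quote.
NUM_AGENTS = 600
TPUS_PER_AGENT = 16
TRAINING_HOURS = 14 * 24          # 14 days = 336 hours

PRICE_PREEMPTIBLE = 2.40          # USD per TPU per hour
PRICE_ON_DEMAND = 8.00

num_tpus = NUM_AGENTS * TPUS_PER_AGENT     # 9,600 TPUs
tpu_hours = num_tpus * TRAINING_HOURS      # 3,225,600 TPU-hours

print(f"Preemptible: ${tpu_hours * PRICE_PREEMPTIBLE:,.0f}")  # $7,741,440
print(f"On-demand:   ${tpu_hours * PRICE_ON_DEMAND:,.0f}")    # $25,804,800
```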

And after all that, we should note that the agent has some fairly serious limitations. First, AlphaStar was trained to play just one of the many possible modes of the game: it plays an excellent game of Protoss vs Protoss on a single map. StarCraft 2 is a game with 3 player types (factions): Protoss, Zerg, and Terran. Each one functions very differently; each has a different set of actions, because each has different buildings and units. Furthermore, each match-up (e.g. Zerg vs Protoss or Protoss vs Protoss) requires a very different strategy; as it stands, AlphaStar would lose terribly if it were simply matched up against a different foe! There are also at least 11 different maps used at tournament level, each of which would also require a change of strategy.

This poses a major problem for AlphaStar. The agents learn from playing other agents, which means that creating a Protoss agent which can play against all the other factions would also mean creating Zerg and Terran players. To create an unconstrained agent (which could play anything against anything), agents would have to be trained for all 9 match-ups. Given the way they are currently training the system, this would most likely mean training 9 times as many agents (since each agent essentially corresponds to one strategy type). It is also possible that even more agents would need to be trained to play on each map; in the worst case this might mean another 11 times as many agents. If we take this pessimistic estimate, that would mean 99 times more resources: millions of years of game play, around a million TPUs, and over 700 million dollars.
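And the pessimistic scaling estimate, again as plain arithmetic from the figures above:

```python
# The pessimistic scaling estimate from the paragraph above: nine faction
# match-ups times eleven tournament maps, i.e. roughly 99 times the resources
# of the single Protoss-vs-Protoss setup. Very rough, as the text says.
MATCHUPS = 3 * 3         # Protoss/Zerg/Terran vs Protoss/Zerg/Terran
MAPS = 11
SCALE = MATCHUPS * MAPS  # 99

base_tpus = 9_600
base_cost = 7_741_440    # USD, preemptible price from the previous estimate

print(f"TPUs:  {base_tpus * SCALE:,}")   # 950,400 -- roughly a million TPUs
print(f"Cost: ${base_cost * SCALE:,}")   # $766,402,560 -- north of $700 million
```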

DeepMind’s achievement is certainly great, and the quality of their agents is truly impressive, but I do think the sheer amount of resources required puts things in perspective. In my back-of-the-envelope calculations, I have only considered the cost of training the final system, not what was spent developing the method, nor the man hours that went into this amazing feat of engineering.

I hope this project can inspire us by showing what can be achieved with unlimited resources. However, I also hope we can all acknowledge that it is worth thinking about how we can make AI cheaper and more accessible to those of us who don’t have $7 million lying around to spend on computing.
