Conditions for StarCraft supremacy and fair play for AI

Daniel Estrada
13 min read · Jan 31, 2019

A visualization of AlphaStar’s analysis and decision making in Game 2 vs MaNa. Source.

Last week, DeepMind presented a series of high-level StarCraft II games from its deep learning agent AlphaStar playing against human professional gamers. It was the first demonstration of pro-level AI StarCraft play. A selection of games with commentary from Artosis and Rotterdam can be viewed below:

The presentation is the latest in a series of impressive performances from DeepMind, which has beaten top human players in go and top-performing software in chess, shogi, and go over the last few years. Over this same period, the AI community has been eyeing StarCraft as the next game for demonstrating artificial supremacy, and several big tech firms, including Facebook, have been experimenting with the game. For anyone watching this space, DeepMind’s announcement of demonstration-worthy results was exciting but not particularly surprising. We all knew it was just a matter of time before this game fell, and this announcement was basically on schedule.

The broadcast showed a selection from 10 pre-recorded games between AlphaStar and two professional players, LiquidTLO and LiquidMaNa. AlphaStar won all ten games. The broadcast concluded with a single live match in which MaNa beat AlphaStar by exploiting a glitch in its play.

There has been a lot of commentary on the games themselves from the SC community, for instance WinterStarcraft’s commentary here and brownbear’s critical commentary here. Artosis and MaNa have also put their own commentaries online. There has been somewhat less commentary from the AI community (outside of DeepMind) about the significance of these games as a measure of AI progress.

In particular, there’s no clear consensus on the precise conditions for achieving “StarCraft supremacy” and officially declaring an artificial player the best in the world. Winning 5–0 against each of two top-level pros is impressive, but a comparable record would be nowhere close to sufficient for establishing a human as the top player in the world. The title of “best” comes up for dispute every year amid changes to both the game itself and the “meta”, the styles that trend each season in response to successful strategies from the previous season. The best players don’t just win games or tournaments; they stay on top of a changing meta for years. The player arguably considered the best in the world today is Serral, who not only won the most recent Global Finals but has a string of 1st-place showings at major tournaments going back to 2016. And even Serral is not a close contender for greatest of all time.

To their credit, DeepMind developers have not yet claimed to have achieved StarCraft supremacy, despite the flood of clickbait headlines. DeepMind’s claims are more modest: that they’ve demonstrated a system capable of high level decision making at both a macro and micro level. Even these modest claims have been disputed by brownbear and others, who have argued that AlphaStar’s win is a product of stronger unit control rather than stronger decision making. MaNa’s win during the live broadcast clearly demonstrated AlphaStar’s weak decision making, and was a pretty embarrassing showing for DeepMind. MaNa did not outplay AlphaStar so much as exploit an unfortunate weakness in its response to normal warp prism harass. A human player making a similar mistake would have been weeded out of a high-level tournament in the very early rounds.

So it should be stressed that as of late January 2019 the status of AI StarCraft play remains unsettled. We’re currently in the brief window of time where AI can claim to be legitimately competitive with top human professionals, but where humans can still beat AI with predictable exploits and a deeper understanding of the game. This window won’t be open for very long.

In this essay, I want to discuss more precisely the conditions for closing the window and declaring an artificial agent to be superior to the best human play at StarCraft. I also want to echo brownbear’s dispute of analysis from DeepMind’s devs about AlphaStar’s play during the broadcast. I’ll end by thinking more carefully about what constitutes fair play for humans vs machines in StarCraft.

I should pause to introduce myself and my qualifications for the following analysis. I do research in AI ethics and philosophy of mind, and I write about AI and robotics actively online. I have also played a few thousand games of StarCraft II, bouncing between platinum and diamond on the ladder around 2010–13. I’ve attended live tournaments and for a while watched them regularly on Twitch. I don’t mean to signal my expertise in StarCraft (diamond league is mediocre play) so much as my familiarity with the game as a player, spectator, and community member.

Back in 2012 I wrote a blogpost titled “StarCraft 2 is brutally honest”, defending another gamer named patrick who wrote:

Starcraft 2 is brutally honest… You can study the graphs and the replays, you can watch professional streams and the Day[9] Daily and read all the forum threads you want–at pretty much no point does the game not reward you for doing an infinite amount of homework–but at the end of the day, you have to click that Find Match button again, play another game, and inevitably lose if you want to get better. As professional player Aleksey “White-Ra” Krupnik puts it, “More GG, more skill.”

There are plenty of games that are competitive. You can play Call of Duty online and get your balls e-stomped by lots of folks. The difference is that Starcraft 2 don’t fuck around. There are no teammates or lucky shots. There is no respawning. There are no unlockables or pay-to-win mechanics. The only difference between you and the guy who won is that the guy who won has trained harder and worked more so he was capable of outplaying you and sending you back to the Lose screen that helpfully reminds you that you’re ranked in the bottom 20th percentile in the world. It is cruel, almost.

I sympathized with patrick, and argued that StarCraft provided a model for both fair competition and distributed embodied intelligence. I mentioned that AI researchers had just started developing competitive StarCraft AI, and I discussed the demands of the game from the perspective of intelligently managing an attention economy. So I’ve been thinking and writing about StarCraft and AI for nearly as long as anyone in the field.

More specifically, I’ve been thinking about why StarCraft might be considered a genuinely neutral or “fair” competitive framework for evaluating AI, on par with traditional AI challenges like chess and go. On the blog I wrote:

The only way to cheat at Starcraft is to hack the technical infrastructure; otherwise, it’s all fair game… Like chess, but unlike just about every other sport, the game of Starcraft allows opponents to engage in full out, no-holds-barred competition. And in some deeply meaningful way, for patrick and thousands of other gamers, this competition is fair. Cruel, perhaps, but ultimately just.

My claim was not that StarCraft was a perfectly balanced game, since the balance might change from patch to patch. Rather, I argued that from within the game it was impossible to gain an unfair advantage. The software and map design set the limits of what is possible, and short of hacking the game those limits constrain both players alike. Within those limits, everything was permitted. As in chess, StarCraft players could be as aggressive or defensive as their pieces and nerves permitted. Ultimately, the only criterion for evaluating good play was whether it scored wins against good opponents.

As early as 1947 Alan Turing was thinking of chess as a good demonstration of machine intelligence, because it allowed for what he called “fair play for machines”. In a game like chess, the machine interacts with the same pieces and board position, according to the same rules, as the human. Any strategy available to the human is in principle available to a clever enough machine. Turing experimented with his own chess-playing programs, but they were never very good. It was over forty years after Turing’s early death that IBM’s supercomputer Deep Blue, evaluating over 200 million positions a second, beat Garry Kasparov at chess. It took another 20 years of technological progress and theoretical innovation for AlphaGo to beat Ke Jie (and a number of other 9-dan players) at go.

AI’ll do that to ya

These demonstrations are culturally and historically significant not only because they signal a technological revolution, but also because they represent machines “beating us at our own games”, so to speak. For thousands of years humanity has studied these games for lessons in deep thought and strategic insight. Chess and go have, to some extent, been a benchmark by which we’ve historically evaluated and ranked our own intelligence. So when the machines beat our top players at these games, to some extent it strikes a blow against humanity. We’ve selected the best representatives of our best games and they are defeated in legitimate battle by the mechanical upstarts. It doesn’t take mindless tech optimism to recognize these as epic, legendary events.

While StarCraft has been around for a fraction of the time of chess and go, in some ways it holds the same revered status in gaming communities today. StarCraft isn’t the most popular game from either a player or spectator perspective, but there’s a broad recognition across the big tent gaming community that top professional StarCraft play requires a degree of control, speed, and decision making that is intimidating even when compared to other esports. This is partly by virtue of StarCraft being one of the oldest and most professionalized games in the community.

Street Fighter has been around longer.

It is often remarked in these discussions that StarCraft is a game of imperfect information, since a fog of war hides your opponent’s actions outside your units’ vision. This is unlike chess and go, which are perfect-information games in which both players can see everything happening on the board. However, imperfect information doesn’t substantively change the “fairness” of the game, since vision works the same (and predictably) for each player, and can therefore be anticipated just like any other variable in the game. Nothing in StarCraft can be blamed on chance, despite imperfect information. Everything develops directly from the decisions and interactions of the opponents.

Given these considerations of fairness and the historic importance of human vs machine competition, the fact that AlphaStar operated on a full-map view is significant, in that it shatters the appearance of fair play. AlphaStar’s devs justified their approach by explaining that AlphaStar’s APM (actions per minute), SPM (screens per minute), and reaction times were actually lower than average professional play. Therefore, they argued, the advantage AlphaStar gains in the game can’t simply be attributed to AlphaStar’s speed. Instead, AlphaStar’s win should be attributed to its superior decision making at both the micro and macro level. But this argument stumbles in light of the live game after AlphaStar’s interface was changed to require more traditional screen management. MaNa easily defeated AlphaStar because AlphaStar was not making good unit decisions in the face of prism harass. The results suggest that AlphaStar’s advantage was not a product of good decision-making, and that DeepMind was overstating the strength of AlphaStar’s play.

I made these arguments in the comments of my Facebook thread during the broadcast last week:

Brownbear develops a similar argument in the video below.

Particularly interesting for me was how this dispute cast a new light on prior DeepMind victories. For instance, consider the famous Move 37 from game 2 of AlphaGo’s first public matches against Lee Sedol. In the clip below, the commentators initially suggest the move was a mistake, a “click-o” that in online play could justify a take-back, given the popular consensus that it was a bad move. But within the span of the clip, the commentators convince themselves that the move is actually very strong. AlphaGo eventually wins this game, and Move 37 has become mythologized as a symbol of how AlphaGo’s play is not only better than human play, but wildly counter to human intuitions about what good play looks like.

AlphaGo’s legendary Move 37.

Similarly, the casters during AlphaStar’s games comment several times that the games are unlike those you’d see in professional play, especially strategies like the mass disruptors in the TLO games or mass stalkers in the games against MaNa. The devs at DeepMind repeatedly argue that AlphaStar’s strategy of over-saturating probes in the natural might be a lesson to professional players who tend to disfavor this strategy. The suggestion is that AlphaStar might be showing us superior play styles that top human players have overlooked or underestimated for whatever reasons.

This argument only works on the assumption that the game is played fairly, and that humans and machines have sufficiently equal footing within the game that strategies can be shared between us. If, in fact, either player is using techniques or abilities that are off-limits to the other for whatever reason, the very capacity of the game to function as a fair measure of performance has been compromised. AlphaStar can afford to build an all-stalker composition against double-robo immortals because it can blink its stalkers perfectly. It can afford to overmake probes or disruptors because it has enough unit control to compensate for whatever risks these mistakes engender. The machine evaluates the risks of the game differently from a human because it is fundamentally interfacing with the game in a different way than the human.

Chimps would be fantastic at StarCraft

Some standards of fair play for competitive StarCraft

There have been plenty of words and videos produced in the last week analyzing the games in greater detail. What I want to do instead is lay out some guidelines for securing StarCraft Supremacy for AI.

  • Competitive human-machine tournaments should enforce interface parity. Mouse and keyboard should be the only inputs from the player; audio and the (uninterpreted) display window should be the only outputs to the player. The software should be altered as little as possible from regular tournament play.
  • Actions per minute (APM), event reaction times, mouse movement speed, eye movement speed, and other cognitive and practical limitations on average human performance should find parity in the machine. This could include forcing the machine to make some number of “misclicks” or other control mistakes at rates comparable to top human play.
  • Short of entering AI into the GSL, I think a single king-of-the-hill style tournament with a handful of top pros, on the scale of HomeStory Cup, would be the best way to establish StarCraft Supremacy. In this setting the AI would compete against multiple representatives of the best players from each race. A decisive victory with few losses at such a tournament would, in my opinion, sufficiently establish AI supremacy.
  • AlphaStar’s multi-agent learning model resulted in many different high-performing agents, each with slightly different preferences and dispositions. Multi-agent systems are fine (we’re MASs, after all), but we should consider the task of deciding which agent to use in which match to be part of the task being evaluated at a tournament. In other words, it should be an autonomous decision made during the tournament (possibly in light of viewing other matches at the tournament), and should not require the intervention of the machine’s developers.
  • AI supremacy should be available for dispute from both humans and other machines. An AI that secures a tournament victory should be available at least occasionally on the ladder for other players to challenge. This may require a temporarily combined human vs machine ladder to complement the existing all-human ladder and the AI ladder.

These five guidelines set up conditions that I think are more in line with the norms of competitive StarCraft play, and which the community would recognize as the requirements of fair play. They are the same conditions any human player would commit to in a fair professional tournament. I think that under these conditions, an AI victory at a serious tournament would carry the weight and historical significance of the chess and go victories of the last few years.
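To make the parity conditions in the second guideline concrete, here is a minimal sketch of how a tournament harness might throttle an agent to human-like action rates. This is purely illustrative: the class, its parameters, and its defaults are hypothetical assumptions of mine, not part of any actual AlphaStar, PySC2, or Blizzard interface. It caps APM with a token bucket, delays each action by a sampled reaction time, and occasionally drops an action to simulate a misclick.

```python
import collections
import random


class HumanParityGate:
    """Hypothetical throttle enforcing human-like limits on an agent's actions."""

    def __init__(self, max_apm=300, reaction_ms=(180, 350), misclick_rate=0.02, seed=0):
        self.capacity = max_apm / 60.0      # tokens replenished per second
        self.tokens = self.capacity         # start with a full bucket
        self.reaction_ms = reaction_ms      # (min, max) reaction delay in milliseconds
        self.misclick_rate = misclick_rate  # fraction of actions lost to "misclicks"
        self.rng = random.Random(seed)
        self.queue = collections.deque()    # pending (release_time, action) pairs

    def submit(self, action, now):
        """Agent requests an action at time `now` (seconds); returns True if accepted."""
        if self.tokens < 1.0:
            return False                    # over the APM cap: action rejected
        self.tokens -= 1.0
        if self.rng.random() < self.misclick_rate:
            return False                    # simulated misclick: action lost
        delay = self.rng.uniform(*self.reaction_ms) / 1000.0
        self.queue.append((now + delay, action))
        return True

    def tick(self, now, dt):
        """Replenish tokens for elapsed time `dt` and release due actions in order."""
        self.tokens = min(self.capacity, self.tokens + self.capacity * dt)
        released = []
        while self.queue and self.queue[0][0] <= now:
            released.append(self.queue.popleft()[1])
        return released
```

A harness like this would sit between the agent and the game client, so that superhuman burst speed or instantaneous reactions are simply impossible rather than merely discouraged. Of course, choosing realistic values for the caps is itself the hard problem, which is exactly the point of the second guideline.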

It’s possible that AI takes a while to master the game, and that humans are able to stay competitive against machines by shifting the meta in unpredictable ways. It’s also possible that machines quickly overtake humans, but that machine vs machine StarCraft is still exciting enough to be a spectator sport in its own right. There are many possibilities for how this brief moment in history develops, while humans and AI are competitive at StarCraft. I just hope we get to enjoy it as long as possible.

It’s also possible that whether AI is competitive with humans is ultimately just a function of what we take the “average cognitive limitations on humans” to be. If we set those limitations too strictly, the AI can’t compete. If we set them too loosely, humans always get crushed. In that case, a careful mapping of our own cognitive limitations as we play StarCraft would be a more compelling result of this research than a computer that plays StarCraft, and the game (and perhaps video games generally) would lose its value as a competitive benchmark comparing human and machine performance.

In my opinion, the grand finale of human vs machine competitions will happen when robots are competitive with humans on a soccer field under the normal competitive rules of international soccer. If robots can keep up with the strategy and stamina of top human athletes, I’m not sure there remain meaningful benchmarks by which to signal humanity’s otherwise unique capacities for embodied intelligence. At that point we humans might as well give up the pretense of competition. We’re decades away from that goal.

Until then, StarCraft is one of the last remaining fields of friendly competition between humans and machines. After StarCraft, the generally accepted benchmarks of AI progress are more ethereal and difficult to assess competitively, like AI generating top 40 pop songs or best-selling novels. Those benchmarks won’t just take innovative technologies; they’ll take radical changes to culture and to our attitudes towards machines. Playing games with machines, treating machines as legitimate opponents and participants in a shared gaming culture, is how we nudge these changes along.

To quote Bruno Latour: “the more non-humans share existence with humans, the more humane a collective is.”
