Alpha Zero — The Robots can’t even Beat us like Men

This is the coldest of takes and I wouldn’t have it any other way.

In January of this year, DeepMind released its inspiring pair of StarCraft II exhibition matches: Terran on Terran, human on AI (AlphaZero, to be specific). The first match was against a mere professional player playing an off-role and the second against one of the best that ever was, playing his main role. In both cases (spoiler, but you really need to be more Internet-fearing these days), AlphaZero cleanly swept the competition: 5–0 and 5–0, for 10–0 overall. But it wasn’t quite curtains for humanity, not yet. Oh no, there was a surprise exhibition game, which the human player ended up winning handily. 10–1! We cheated that time, but we were cheating all along. Or was it the AI that cheated? It’s just so hard to tell. Let’s take a closer look.

A Closer Look

It’s worth looking at the setup for the first 10 games. To ease your reading, we’ll resort to bullet points:

  • Terran v Terran
    AlphaZero — taught to play Terran against Terran and only that
    Humans — play with and against one of three races, each with different units and abilities
  • Input method
    AlphaZero — made commands via API requests, not clicks and clacks.
    Humans — clicked and clacked, of course.
  • APM capping
    AlphaZero — artificially limited to a lower number of actions taken per minute
    Humans — only limited by their meat-imposed speed limits, which tend to average 2–3x what AlphaZero’s cap was set to.
  • Reaction time capping
    AlphaZero — made to wait nearly half a second before reacting to anything on the screen
    Humans — reacted in about a tenth of a second.
  • Omnisight
    AlphaZero — used hawk eyes; didn’t have to zoom in or pan to see and act
    Humans — were of course limited to focusing on very small areas. (However, even though humans can only see one thing at a time, they can monitor movements and colors in their peripheries.)
  • Multitasking
    AlphaZero — can do many things at once without a hitch.
    Humans — did their best slow-motion DC’s The Flash cloning impression, which is to say that they shuffled back and forth slowly to effect the illusion of being in two places at once
  • Multiple personalities
    AlphaZero — is actually a pile of different AI winners chosen from a Mad Max Thunderdome-like fighting pit. Every game was played by a different AlphaZero winner.
    Humans — just two, though they had the ability to leverage games from the series for on-the-spot adjustments
  • Aim
    AlphaZero — has perfect aim
    Humans — missed some unacceptably human percentage of the time
  • Focus
    AlphaZero — couldn’t do anything that isn’t playing StarCraft matches
    Humans — were 50% likely to have Britney Spears’ Toxic running through their heads at any given point in time.
  • Winning mentality
    AlphaZero — bred and studded and selected winners (more on that below). Also, cheats and doesn’t feel bad about it.
    Humans — love defeat, or it would appear so, on account of the results. Also, would cheat if they could get away with it.

So it’s clear that AlphaZero has just about everything working in its favor here. It gets super-sight, super-precision, limited matchups, and random but battle-hardened, grizzled veteran strategies. Everything except the APM and reaction capping is great news for the eggheads’ monster. And there is of course human ingenuity and heart, which it is sorely lacking.

That reads like sarcasm, but it isn’t. If you watch the games, it’s clear that AlphaZero is a soulless monster devoid of many of our favorite human traits, foibles as they may be. AlphaZero is fragile and full of holes, but really good at what it does better than us. And it turns out that’s probably good enough and we should get used to losing to it and its soulless friends.

I was Promised a Rant about Fairness

AIs — and more broadly, machines — have long been recognized as superior but stupid but ultimately useful laborers, upon which we’ll burden everything we don’t want to or could never do. But it’s taking a lot longer than our fanciful scribes have imagined, largely because engineering generalizable things is a grind.

Modern AI is finally cracking some of the generalization code, but we could well be a long way from something remotely as versatile as the average 2-year-old.

Unfortunately, we’re really damn optimistic about some of our breakthroughs and have started to over-correct when things don’t happen as quickly or as well as we’d like. Instead of being amazed that an AI could actually win a game of StarCraft II against a professional, something that is beyond Go levels of hard, we’re annoyed that it cheated.

Yes, it cheated. It’s bloody perfect at aiming and it doesn’t get tired and it can switch personalities like Arya can slap on new faces. And it’s playing literal hundreds of years’ worth of matches in days and sees and knows all.

But we limited its action rate and reaction times!

Humans mostly make adjustment actions. Their effective actions per minute (eAPM) are significantly lower, under half their raw APM. Taking this into consideration, AlphaZero was faster than any human has ever been. As for reaction times, while it is slow, being able to react to thousands of things at once far more than makes up for the lag.
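To put rough numbers on the eAPM argument, here is a minimal Python sketch. Every number in it is a made-up assumption for illustration (the cap of 100, the effective fractions), not a measurement from the matches. The only things taken from the setup above are that humans average 2–3x the AI's capped APM, that under half of human actions are effective, and that the AI wastes essentially none of its actions.

```python
def effective_apm(raw_apm: float, effective_fraction: float) -> float:
    """Actions per minute that actually change the game state."""
    return raw_apm * effective_fraction


# Hypothetical cap on the AI's raw APM; the real value is not given here.
AI_CAP = 100.0

# Illustrative scenarios only. The fractions are assumptions chosen to be
# "under half", as the argument requires.
scenarios = {
    "human at 2x the cap, 40% effective": effective_apm(2 * AI_CAP, 0.40),
    "human at 3x the cap, 30% effective": effective_apm(3 * AI_CAP, 0.30),
    "AI at the cap, 100% effective": effective_apm(AI_CAP, 1.0),
}

for label, eapm in scenarios.items():
    print(f"{label}: {eapm:.0f} eAPM")
```

Under these assumptions the capped AI still out-acts both humans. Note how much the conclusion leans on the effective fraction: a human at 3x the cap with a full half of their actions effective would come out ahead, so "under half" is doing real work in the argument.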

Let’s do it again, but make it (more) fair

This might be an interesting exercise and I’m sure this will happen, but it’s worth making a broader point here about fairness.

AI will never be fair unless we make it fair. And generally speaking, we probably shouldn’t bother making it fair unless we’re trying to learn something from it or make it function as a proxy for human behavior. Our machines will be better than us at whatever we train them to do, which is why we’re doing this whole thing in the first place. It seems obvious when said that way, but we do seem to waste a whole lot of wind talking about fairness when discussing the impending and ongoing robotic replacement revolution.

I imagine some of this comes from our beautiful but silly need to anthropomorphize every living and unliving thing that creepeth upon the earth. Since robots are going to be acting like us, we’ll be helplessly bound to fit their actions within our rules. But the apparent truth behind this all is that the forces animating our robots are other humans and the market forces driven by their actions. The man behind the curtain is and always will be the same kind of man as the one in front of the curtain.

But what about AI alignment?

Still important, so long as we’re clear that we’re aligning the AI with our interest, not our actual behavior. We’re not trying to make the AI behave exactly as we would behave — or at least we shouldn’t. That would be enormously stupid, to put it kindly. If we want an AI to behave as we would, we don’t need an AI for that, except in menial labor. If we want an AI to behave as we would if we had AI powers like AlphaZero, it’s unlikely we’ll approve of its actions.

It’s easier to evaluate an outcome than it is to mathematically reproduce the insane complexity you find in a concept like equality or justice. We know a good dog when we see it, at least some of the time. This will get more difficult as we go along.

AlphaZero even hinted at this kind of thing just seconds into the first game. Normally, players max out their low-level factory workers to the point where there’s a precipitous diminishing returns drop-off. Adding one more isn’t worth the cost of the new unit. Or so it seems when you look at the numbers. But the AI may have actually stumbled upon something almost approximating wisdom. Games are often lost by blitzes on the low-level workers to stunt the economy and draw away player attention, but throwing a few extra warm factory worker bodies on the field greatly protects against that strategy. The commentators were baffled, as were the players, though not for long: it made sense once everyone saw it.

A few more observations in this, the coldest of takes

Another bit of intrigue was how DeepMind went about training AlphaZero’s competitors. As mentioned above, they created a gauntlet of agents and tasked them with thrashing a whole bunch of other agents. Bloody and unbroken, a few victors emerged. And the results against humans proved them winners. Even with their built-in advantages, they still had to beat really good humans. Surprised and confused humans, but amazing players.
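For a feel of what a fighting pit like this might look like, here is a toy Python sketch of a round-robin gauntlet that keeps the top scorers. It is emphatically not DeepMind's training procedure: the agent names, the single "skill" number per agent, and the win-probability rule are all invented for illustration.

```python
import random
from itertools import combinations


def round_robin_winners(skills, keep, games_per_pair=20, seed=0):
    """Play every agent against every other and keep the top scorers.

    `skills` maps a made-up agent name to a made-up strength number.
    Assumption: an agent beats an opponent with probability
    skill / (skill + opponent_skill). Real agents are not this simple.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    wins = {name: 0 for name in skills}
    for a, b in combinations(skills, 2):
        p_a = skills[a] / (skills[a] + skills[b])
        for _ in range(games_per_pair):
            winner = a if rng.random() < p_a else b
            wins[winner] += 1
    # Rank by total wins, best first, and keep only the survivors.
    ranked = sorted(wins, key=wins.get, reverse=True)
    return ranked[:keep], wins


# Hypothetical agents and strengths, purely for illustration.
agents = {"rusher": 1.0, "turtle": 2.0, "all_rounder": 4.0, "gimmick": 0.5}
survivors, record = round_robin_winners(agents, keep=2)
print("survivors:", survivors)
print("win counts:", record)
```

The point of the toy is only that the survivors are whoever wins most often against the whole field, which rewards broadly reliable play over clever one-off counters.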

That said, if you watch each game, you notice some strange things are afoot. For one, the AI can be baited into stupid plays. It acts clueless when it’s not executing its strategies. This is most apparent in the game it lost, where the human player managed to get in a good base attack and AlphaZero almost literally reacted by spinning in circles and feeding itself into a grinder. Which begs the question: why? Since you’re begging to know, I think the answer lies in how the AI is trained: to win. Full stop. Nothing else.

We won’t go full game theory here, but one of the limitations of the game theory model is that it represents a 1 to 1 back and forth, where a single player wins at the expense of the losing player. The zero-sum game. And StarCraft II is on a lot of levels a zero-sum game at heart. You get resources and grind the other player out of contention. But just like micro- and macroeconomics, the ideal strategy for winning at a game is figuring out how to connect the micro and macro worlds. (And to extend the comparison, the same holds true for understanding humans.)

At a macro level, AlphaZero learned how to beat other AlphaZero bots using a consistently useful strategy against an unknown opponent. DeepMind hid the weakness of its agents by creating a different one for each game. Were the human player allowed to rematch against the same agent, that person would win every time, because the agents are one-trick robot ponies. And the fact that only generally useful strategies could be winners in the robot thunderdome means that novel and interesting and contextually superior strategies are unlikely to emerge. (Though the flipside is that every agent that emerges unscathed could be telling you some objective truth about winning strategies in StarCraft, which is fascinating in and of itself.) Lastly, AlphaZero’s macro play was limited by its understanding of micro play. Since it’s perfect at certain things and completely incompetent at others, many of its strategies were biased by playing other bots with the same traits. It knows its own strengths, which are bizarre compared to human ones. And it assumes that humans have the same bizarre skills.

At a micro level, AlphaZero is a strange beast. As mentioned before, it has perfect aim. It does not make pointless moves. And it can be in a million places at once. And this is with the shackles on. We could turn its eAPM to 10,000 and suddenly the macro world wouldn’t matter all that much because it’s so colossally good at everything on the ground that superior human strategists couldn’t do enough to compensate. To use a fun pop sci-fi reference, it’s similar to the fights in Ender’s Game, except with role reversals. We would be the Hive Queen and the AI would be the humans. And (sorry, dammit but I warned you about the Internet and spoilers) humans end up handily winning all the fair fights because they could split their attention. Now imagine the humans are also perfect at aiming and react instantly to the enemy and to commands, and you’re starting to see why this part will never be fair.

At the end of the day, what DeepMind wants out of the StarCraft version of AlphaZero is not to show off how much better AI is at micro tasks, but to show that it is capable of macro strategies that are intelligent in the way we think about intelligence. I think it did an okay job of demonstrating that, though we’ve uncovered many ways in which it’s clear we’re not really getting a fair fight in the end. And I’m not bothered by that part, though I wish DeepMind had made it clearer. AI is going to be better than us at games and at money and at labor and so on; we’re not even sure there is something it can’t be better than us at. And that includes love, because we all know robots are going to be amazing lovers.

We’re the result of undirected mutations over a long period of time. It’s a secular miracle we exist at all and that we can build something like AI. But AI will never be us and we need to be comfortable with that. I think it’s going to be extremely difficult with regard to our intuitions. We have hardware limitations and struggle mightily within our own species. AI is far stranger than our intuitions are prepared to cope with, so we might be stuck trying to fit it into our metaphors for the foreseeable future.

Last note: I should take a second to recognize just how cool AlphaZero’s StarCraft demonstration is. It’s awesome. I’m eagerly awaiting the next iteration and you should be too. If you’re not, maybe I’ll write about why you should be and then you will be.

Thanks for reading! :^)



Joshua Clingo

Hello, this is me. So who is me? Me is a Cognitive Scientist who happens to like writing. I study meaning in life, happiness, and so on and so forth, forever.