Lessons from Alphago: Storytelling, bias and program management

Over the past few days, Alphago has taken the world by storm once again. Over a week in Wuzhen, it beat the world’s best player, Ke Jie, three times, defeated a team of Chinese professionals, and finally lost a game (unavoidably, since it played on both sides of a human pair-go match). While creating Alphago is a great feat in itself, the results of the games and the underlying techniques are already covered in depth elsewhere. But not much has been said yet about what Alphago can teach human go players, and what its emergence as the best go player ever tells us about human go.

One of the more interesting phenomena in the set of matches that Alphago played, starting with the games against Fan Hui (late 2015) and Lee Sedol (2016), is how it came to look like Alphago was playing like a human professional player, only a little better. Even after the first set of self-play games was released a year or so ago, the games looked surprisingly human.

In the most recent set of matches against Ke Jie, the games themselves really weren’t that exciting. It just seemed too unlikely that Ke Jie would have a chance to win.

In fact, the most interesting reveal happened only after the match, when DeepMind released the first set of self-play games in which Alphago played itself (similar to how it plays itself during training to improve the AI). Those games were surprisingly non-human, so much so that it is not clear at a glance whether the average human go player can learn anything from them. Professional go players will certainly be able to learn something, but even among professionals it’s not yet clear what impact this will have on the level of tournament play.

Storytelling

When playing go, human players need a way to rationalize why they played where they did. Intuition is a key element that helps distill the complexity into something more manageable, even for a professional player. Michael Redmond explained this well in one of his commentaries on the Alphago games: “It plays good moves, with a meaning, so we can go back and analyze and explain the meaning behind the moves.”

Ke Jie (Black) vs Alphago, Game 1: The beginning of a story we can rationalize. While most human go players would not be able to find many of these moves, they all fit together in natural sequences and exchanges, such as the upper left part of the board highlighted here, where white takes the corner and black captures four stones on the outside.

The issue is, we now know that Alphago’s moves are stronger than the moves of any human player. In fact, across all recently released games from Alphago, the only moves we have seen that could be classified as mistakes are mis-clicks by its human operators. So the explanations we create for the moves are often unable to truly capture the depth of analysis that lies behind them: even if we can describe an “intent” behind a move (an emergent behavior), we are often not able to truly understand it.

Just like a story is built of paragraphs and chapters that come together to form a whole, a game of go with a human master (or two) will often look like a masterpiece. As with a written story, we can follow the plot, relate to the way the stones are placed across the board, and identify patterns. Part of this is related to context. Human go players rely on a corpus of games built up over hundreds of years, and the “written corpus” (games that have been recorded officially) contains at most on the order of one million games. Like an author studying other master writers, a human player is limited in how many games they can study over the course of a lifetime.

Alphago vs. Alphago, Game 1: The beginning of a story we can’t yet fully understand. While it’s possible to rationalize each move, the way the stones fit together on the board does not form a fully coherent story. Looking at white’s moves, the stones are too far away from each other for a human player to understand how the stones will coordinate, or to analyze the other options for black to respond to the moves.

Bias

Human players need a way to explain how their moves fit together. That leads to the emergence of a number of fixed sequences in the games. Professional players know that these fixed sequences represent just one of many ways to play, but ultimately most professional games contain such sequences (called joseki), which represent the collective human go knowledge accumulated over hundreds of years.

Human bias. The sequence from 1 to 6 was extremely popular even in professional go up until 10 or so years ago. Human players eventually identified move 5 as slack (not putting enough pressure on the opponent). Alphago rarely or never plays this sequence.

The fact that games are built up around fixed sequences also creates a bias. It shows most clearly among amateur players, who tend to imitate the sequences, but it is almost as visible among professionals (in hindsight, now that we have seen the games Alphago plays).

Alphago vs Alphago, Game 8: Exposing a human bias. It was assumed to be bad to invade on the 3–3 point early in the game, as it allows the opponent to build up a strong outward-facing position. Alphago showed that human players had got this wrong.

From an AI research perspective, the apparent lack of human bias is one of the most interesting results, since it shows that DeepMind was able to bootstrap its program with human data and then gradually remove the biases present in human play. It’s impossible to know how well they have succeeded at that, but surpassing human level to the extent they have is already a huge step.
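The bootstrap-then-debias idea can be sketched with a toy example. Nothing here is DeepMind’s actual pipeline: the “game” is a single move with two options, the win rates are invented, and the update rule is a crude stand-in for the policy-gradient reinforcement learning the Alphago papers describe. It only illustrates the shape of the argument: imitation learning bakes a human bias into the policy, and learning from game outcomes gradually removes it.

```python
import random

random.seed(0)

# Toy setup (hypothetical numbers): human games prefer move "a",
# but move "b" actually wins more often.
MOVES = ["a", "b"]
WIN_RATE = {"a": 0.45, "b": 0.55}        # assumed "true" values
human_games = ["a"] * 90 + ["b"] * 10    # biased human corpus

# Phase 1: supervised bootstrap -- imitate human move frequencies.
policy = {m: human_games.count(m) / len(human_games) for m in MOVES}
assert policy["a"] > policy["b"]         # the human bias is baked in

# Phase 2: reinforcement from game outcomes -- nudge probability
# toward moves that win, a stand-in for policy-gradient updates.
lr = 0.01
for _ in range(2000):
    move = random.choices(MOVES, weights=[policy[m] for m in MOVES])[0]
    won = random.random() < WIN_RATE[move]
    policy[move] = min(max(policy[move] + (lr if won else -lr), 0.01), 0.99)
    other = "b" if move == "a" else "a"
    policy[other] = 1.0 - policy[move]   # keep it a probability distribution

# After training, the stronger move dominates despite the biased start.
print(policy)
```

The point of the sketch is only that the second phase is driven by outcomes rather than by what humans happened to play, which is why the bias present in the bootstrap data can wash out.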

Program management

DeepMind has made a PR success out of Alphago, despite the lack of Chinese media coverage of the last game series against Ke Jie. Articles appeared in all the top newspapers and magazines, more than 25,000 players watched the official broadcasts of the most recent set of games, and probably many more watched the earlier match against Lee Sedol.

It’s interesting to see how DeepMind fully exploited the characteristics of their program, and the heritage of go in Asia when setting up these matches. Setting up matches against the top players in Korea and China allowed the human players to be part of the story that was being written. In turn, it made it feasible for other human professional players to comment on the games and to understand the emergent intelligence from Alphago one step at a time. Arranging the games within the system of professional go generated interest among the go associations and community in China, Korea and Japan.

At the same time, there are good reasons why DeepMind did not release more self-play games from the most recent version of Alphago earlier. It would have been too early a reveal, decreasing interest in any future matches, like a spoiler for the ending of a movie you are in the middle of watching.

When looking at the set of games that Alphago has released now in the context of the previous games against human players, it’s obvious that Alphago is much stronger than the best human players today. And, as is true for any AI, it doesn’t get tired (although it might still make mistakes for other reasons, such as in the first round of games against Lee Sedol). Having seen the first set of games, my guess is that Alphago would not lose a single game in 1000 against any of the best human players today.

It’s also interesting to think about the complexity of DeepMind releasing a version of Alphago to the public (something they have said they will not do). Two key factors weigh against a release: (1) DeepMind’s mission is to develop general AI, not an AI for go alone; (2) while DeepMind may want to reveal its general approach to spur interest in the community (and interest in Google, TensorFlow and working for DeepMind), it doesn’t want to reveal its secret sauce. The implication is that if Google were to release Alphago to the public, they would be stuck with the maintenance costs (both human and machine) for years ahead, with the go community asking for additional features and support.

By releasing a teaching tool instead (e.g. one that analyzes and comments on all recent professional games), as they have promised to do, they contribute to the go community within a narrower scope, one that won’t require much work beyond what they have already put into building Alphago. And the techniques they have published, which describe at a high level what Alphago does, will lead to the emergence of a number of go programs stronger than the best human players, probably within the next 1–2 years, while not taking away the advantage DeepMind holds in general AI.

It will be exciting to see what steps DeepMind takes next with Alphago, and how this success will contribute to the level of human go!