How Does AlphaZero Learn Chess?

Gayan Samuditha · Published in Expo-MAS
Dec 11, 2021 · 7 min read

DeepMind and Google Brain researchers and former World Chess Champion Vladimir Kramnik explore how human knowledge is acquired and how chess concepts are represented in the AlphaZero neural network via concept probing, behavioral analysis, and an examination of its activations.


The world has quietly crowned a new chess champion. While it has now been over two decades since a human has been honored with that title, the latest victor represents a breakthrough in another significant way: It’s an algorithm that can be generalized to other learning tasks.

It gets crazier. AlphaZero, the new reigning champion, acquired all its chess know-how in a mere four hours. AlphaZero is almost as different from its fellow AI chess competitors as Deep Blue was from Garry Kasparov, back when the latter first faced off against a supercomputer in 1996. And what’s more, AlphaZero stands to upend not merely the world of chess, but the whole realm of strategic decision-making. If that doesn’t give you pause, it probably should.


Deep neural networks are known to learn opaque representations that lie beyond the grasp of human understanding. From both scientific and practical viewpoints, it is therefore intriguing to explore what is actually being learned, and how, in the case of a superhuman self-taught agent such as AlphaZero.

In the new paper **Acquisition of Chess Knowledge in AlphaZero,** DeepMind and Google Brain researchers and former World Chess Champion Vladimir Kramnik explore how and to what extent human knowledge is acquired by AlphaZero and how chess concepts are represented in its network. They do this via comprehensive concept probing, behavioral analysis, and examination of AlphaZero’s activations.

RESEARCH:

- Acquisition of Chess Knowledge in AlphaZero -

ABSTRACT

What is learned by sophisticated neural network agents such as AlphaZero? This question is of both scientific and practical interest. If the representations of strong neural networks bear no resemblance to human concepts, our ability to understand faithful explanations of their decisions will be restricted, ultimately limiting what we can achieve with neural network interpretability. In this work, we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess. By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network. We also provide a behavioral analysis focusing on the opening play, including qualitative analysis from chess Grandmaster Vladimir Kramnik. Finally, we carry out a preliminary investigation looking at the low-level details of AlphaZero’s representations and make the resulting behavioral and representational analyses available online.

Download here: arXiv.

The team aims their study at an improved understanding of:

  1. Encoding of human knowledge.
  2. Acquisition of knowledge during training.
  3. Reinterpreting the value function via the encoded chess concepts.
  4. Comparison of AlphaZero’s evolution to that of human history.
  5. Evolution of AlphaZero’s candidate move preferences.
  6. Proof of concept towards unsupervised concept discovery.


The researchers premise their study with the idea that if the representations of strong neural networks like AlphaZero bear no resemblance to human concepts, our ability to understand faithful explanations of their decisions will be restricted, ultimately limiting what we can achieve with neural network interpretability.


The team detects human concepts from network activations on a large dataset of inputs, probing every concept at every block and over many checkpoints during AlphaZero’s chess self-play training process. This enables them to build up a picture of what is learned, when during training it is learned, and where in the network it is computed.
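To make the probing setup concrete, here is a minimal sketch of a sparse linear probe, assuming you have already extracted activations for a set of positions from one block at one checkpoint and computed a concept label (for example, a Stockfish-style material score) for each position. The arrays below are random placeholders, not the paper’s data.

```python
# Minimal sketch of concept probing on stored activations.
# `activations`: hypothetical (n_positions, d) array from one residual
# block at one training checkpoint; `concept_values`: hypothetical
# per-position concept scores (e.g. material balance).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(5000, 256))        # placeholder activations
concept_values = activations[:, :8].sum(axis=1)   # placeholder concept labels

X_train, X_test, y_train, y_test = train_test_split(
    activations, concept_values, test_size=0.2, random_state=0)

probe = Lasso(alpha=0.01)   # sparse linear probe: most weights driven to zero
probe.fit(X_train, y_train)
print("probe R^2 on held-out positions:", probe.score(X_test, y_test))
```

A high held-out regression score for a given (block, checkpoint) pair is then taken as evidence that the concept is linearly decodable there.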

Figure: What-when-where plots for a selection of Stockfish 8 and custom concepts (each ResNet block is counted as a layer, following the paper’s Figure 1).

Figure: Evidence of patterns in regression residuals.

The team examines how chess knowledge is progressively acquired and represented using a sparse linear probing methodology to identify how AlphaZero represents a wide range of human chess concepts. They visualize this acquisition of conceptual knowledge by illustrating which concept is learned, when in training it is learned, and where in the network it is represented, in “what-when-where plots.”
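A what-when-where plot can be sketched as a heatmap indexed by network block (where) and training checkpoint (when) for a single concept (what). The `probe_scores` grid below is a hypothetical placeholder standing in for the probe regression scores produced by the probing step above.

```python
# Sketch of a "what-when-where" plot for one concept: rows are network
# blocks, columns are training checkpoints, cell values are the probe's
# regression score at that (block, checkpoint) pair.
import numpy as np
import matplotlib.pyplot as plt

n_blocks, n_checkpoints = 20, 30
probe_scores = np.random.default_rng(0).random((n_blocks, n_checkpoints))  # placeholder

plt.imshow(probe_scores, aspect="auto", origin="lower", cmap="viridis")
plt.xlabel("training checkpoint (when)")
plt.ylabel("network block (where)")
plt.title("probe score for one concept (what)")
plt.colorbar(label="regression score")
plt.show()
```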

Following the study of how internal representations change over time, the team then investigates how these changing representations give rise to changing behaviors: by measuring changes in move probability on a curated set of chess positions, and by comparing the evolution of move choices during self-play training to their evolution in top-level human play.
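As a rough illustration (not DeepMind’s code), tracking this behavioral change amounts to querying the policy prior for a curated position at successive checkpoints. `policy_at_checkpoint` below is a hypothetical stand-in for running a checkpoint’s policy head; it fakes a preference that shifts with training progress.

```python
from typing import Dict

def policy_at_checkpoint(step: int, fen: str) -> Dict[str, float]:
    # Hypothetical stand-in for evaluating a checkpoint's policy head on
    # the position given as a FEN string; here it just fakes a shifting
    # preference so the loop below has something to print.
    p_a6 = min(0.10 + step / 2_000_000, 0.60)
    return {"a7a6": p_a6, "g8f6": 0.70 - p_a6}

# Ruy Lopez position after 1. e4 e5 2. Nf3 Nc6 3. Bb5
RUY_LOPEZ_FEN = "r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3"
CANDIDATES = ["a7a6", "g8f6"]   # 3...a6 and 3...Nf6

for step in (1_000, 10_000, 100_000, 1_000_000):
    priors = policy_at_checkpoint(step, RUY_LOPEZ_FEN)
    row = ", ".join(f"{m}: {priors.get(m, 0.0):.2f}" for m in CANDIDATES)
    print(f"step {step:>9,}: {row}")
```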

AlphaZero’s activations used to predict human concepts

Finally, having established that AlphaZero’s activations can be used to predict human concepts, the team investigates these activations directly, using non-negative matrix factorization (NMF) to decompose AlphaZero’s representations into multiple factors and obtain a complementary view of what the network is computing.
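Here is a minimal sketch of that kind of decomposition, assuming a hypothetical matrix of non-negative (post-ReLU) activations; scikit-learn’s NMF is used here purely as a stand-in, not as the authors’ actual implementation.

```python
# Decompose a matrix of non-negative activations into a small number of
# factors: W gives per-position weights over factors, H gives each
# factor's pattern over channels.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
activations = rng.random((5000, 256))   # placeholder, non-negative post-ReLU values

nmf = NMF(n_components=10, init="nndsvda", random_state=0, max_iter=500)
W = nmf.fit_transform(activations)   # (5000, 10) per-position factor weights
H = nmf.components_                  # (10, 256) factor patterns over channels

print("W:", W.shape, "H:", H.shape)
```

Inspecting which board positions load heavily on each factor is then one way to interpret what an individual factor responds to.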


This is indeed what we see in the following figure, which compares the evolution of human opening preferences over chess history with the evolution of AlphaZero’s preferences during training.

If different versions of AlphaZero are trained, the resulting chess players can have different preferences. There are versions of AlphaZero that will play the Berlin defense to the Ruy Lopez (3… Nf6), while other versions prefer the equally good classical response 3… a6. This is interesting because it means there is no single “unique” good chess player! The following table shows the preferences of four different AlphaZero neural networks:


The AlphaZero prior network preferences after 1. e4 e5 2. Nf3 Nc6 3. Bb5, for four different training runs of the system (four different versions of AlphaZero). The prior is given after 1 million training steps. Sometimes AlphaZero converges to become a player that prefers 3… a6, and sometimes AlphaZero converges to become a player that prefers to respond with 3… Nf6.


But how does AlphaZero really think?

How does AlphaZero evaluate positions?

AlphaZero’s neural network evaluation function doesn’t have the same level of structure as Stockfish’s evaluation function: the Stockfish function breaks down a position into a range of concepts (for example king safety, mobility, and material) and combines these concepts to reach an overall evaluation of the position. AlphaZero, on the other hand, outputs a value function ranging from -1 (defeat is certain) to +1 (victory is guaranteed) with no explicitly stated intermediate steps. Although the neural network evaluation function is computing something, it’s not clear what. In order to get some idea of what’s being computed, the DeepMind and Google Brain researchers used the Stockfish concept values to try to predict AlphaZero’s position evaluation function (similarly to the way piece values can be obtained by predicting a game’s outcome).
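A hedged sketch of that idea: regress AlphaZero’s value output on a handful of Stockfish-style concept scores and read the fitted weights as a rough picture of what the value head emphasizes. The concept names, the synthetic data, and the plain linear model below are illustrative assumptions, not the paper’s actual setup.

```python
# Predict a (placeholder) AlphaZero value from concept scores and
# inspect the regression weights as proxy "importances".
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_positions = 10_000
concept_names = ["material", "mobility", "king_safety"]
concepts = rng.normal(size=(n_positions, len(concept_names)))          # placeholder features
alphazero_value = np.tanh(concepts @ np.array([1.0, 0.3, 0.5]))        # placeholder value target

reg = LinearRegression().fit(concepts, alphazero_value)
for name, w in zip(concept_names, reg.coef_):
    print(f"{name:>12}: weight {w:+.3f}")
print("R^2:", reg.score(concepts, alphazero_value))
```

Repeating this fit at successive training checkpoints is what lets one track how much of the value function each concept explains over the course of training.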

This approach allowed the researchers to get estimates of what AlphaZero values in a position, and how this evaluation evolved as self-play training progressed. As this analysis shows, material emerges early as an important factor in AlphaZero’s position evaluation but decreases in importance later in training as more sophisticated concepts such as king safety rise in importance. This evolution is surprisingly human-like: early in the process of learning chess, we evaluate positions simply by counting pieces, before developing a richer understanding of other aspects of a position as we learn more. Interestingly, the rapid jump in the importance of material around 32,000 training steps matches the point in training at which opening theory begins to evolve, suggesting that this is a critical period for AlphaZero’s understanding of chess.

CONCLUSION:

The team’s study of the progression of AlphaZero’s neural network from initialization to the end of training yields the following insights:

  1. Many human concepts can be found in the AlphaZero network.
  2. A detailed picture of knowledge acquisition during training emerges via the “what-when-where plots”.
  3. The use of concepts and their relative value evolves over time: AlphaZero initially focuses primarily on material, with more complex and subtle concepts emerging as important predictors of the value function only relatively late in training.
  4. Comparison to historical human play reveals notable differences in how human play has developed, but also striking similarities with regard to the evolution of AlphaZero’s self-play policy.


Gayan Samuditha
Expo-MAS

Software Engineer, Biologist, Techie. Trying to save human lives by combining medical informatics and AI.