Reconstructing chess positions

In December 2018 I started a series of articles about chess programming showing how to write a simple program and how to improve its performance.

In this post I will investigate how a chess board can be stored in limited space and then reconstructed. I will use a tensor representation of the board and autoencoders to experiment with different types of positions, comparing positions from real chess games with positions in which the pieces are placed at random.

Motivation

Beginning in the sixties, researchers studied how chess players “see” a chess position. De Groot [1] found that playing strength is not related to the number of moves a player can calculate, but that stronger players are good at finding and calculating the right moves. In interesting experiments, De Groot and later Chase and Simon [2] found differences between masters and weaker players in memory tasks. Masters are able to reconstruct a chess position almost perfectly after viewing it for only 5 seconds. But this is only true if the position comes from a real chess game and not if it was constructed by placing some pieces randomly on the board.

In the recent past, major advances in computer chess were made with self-learning chess engines such as those of Silver et al., [3] and Lai, [4], which learned to evaluate positions and select moves through self-play. These programs learn to find good representations and evaluations of chess positions.

In the domain of text analytics, so-called word embeddings such as “word2vec” (Mikolov et al., [5]) are very useful for representing the semantic meaning of words as vectors. In the same spirit, we try to find good vector representations of chess boards.

Representations of chess boards

David et al., [6] used “pos2vec” representations in their supervised approach “DeepChess”. They used autoencoders with dense layers to represent two board positions before training a fully connected net to distinguish between good and bad chess positions. Each position was represented by a binary string of length 773. The autoencoder compressed the representation of the board to a string of length 100 using several fully connected layers. It was the first end-to-end machine learning chess engine to reach grandmaster-level play.

Silver et al., [3] used an 8x8x119 tensor representation of the board, with 6 planes per side for the piece types (with entries 0 and 1) for white and black, an 8-step history, and 9 additional planes for information like side to move, castling rights, repetitions and move counts.

In my experiments I use a simplified version of this tensor representation: only the current position plus one additional plane for the side to move. The planes for the pieces are stored with 1 for white and -1 for black, so I get an 8x8x7 tensor. I do not take castling rights, repetitions or move counts into account in these experiments.

This function converts an array of chess positions in FEN notation to a tensor as described above.
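A minimal sketch of such a function could look like this; the plane ordering (pawn, knight, bishop, rook, queen, king) and the ±1 encoding of the side-to-move plane are my own choices, consistent with the description above:

import numpy as np

# Plane index per piece type; this ordering is an assumption of the sketch.
PIECE_PLANES = {'P': 0, 'N': 1, 'B': 2, 'R': 3, 'Q': 4, 'K': 5}

def fens_to_tensor(fens):
    """Convert a list of FEN strings into an (N, 8, 8, 7) tensor.

    Planes 0-5 hold the piece types (+1 for white, -1 for black),
    plane 6 encodes the side to move (+1 white, -1 black).
    The first FEN row (rank 8) is mapped to row index 0.
    """
    out = np.zeros((len(fens), 8, 8, 7), dtype=np.float32)
    for i, fen in enumerate(fens):
        placement, side_to_move = fen.split()[:2]
        for row, rank_str in enumerate(placement.split('/')):
            col = 0
            for ch in rank_str:
                if ch.isdigit():
                    col += int(ch)                      # skip empty squares
                else:
                    value = 1.0 if ch.isupper() else -1.0
                    out[i, row, col, PIECE_PLANES[ch.upper()]] = value
                    col += 1
        out[i, :, :, 6] = 1.0 if side_to_move == 'w' else -1.0
    return out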

Datasets

For the chess positions from real chess games I used 29,169 games played on the Free Internet Chess Server (FICS) in January 2018 by players with an average Elo rating greater than 2000. I downloaded the games as a PGN file from the FICS Games Database.

Each position in the games was labeled with its FEN description using the “pgn-extract” command line tool. This gives a set of 2,253,350 positions, which I divided into a training set of 2,048,000 and a test set of 205,350 positions. You can download the data from:

For the dataset of random chess positions I used a modified version of a script that generates them in FEN format (a sketch follows the list below). The placement of the pieces on the board is not completely random; it follows some rules so that the positions are almost legal:

  • there is one and only one king of each color
  • the kings must not be placed on adjacent squares
  • there cannot be any pawn on a promotion rank (first or eighth rank)
  • including the kings, up to 32 pieces in total can be placed
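As a sketch of such a generator, the following uses the python-chess package; the exact piece-count distribution of the original script is not reproduced here, so treat the numbers as assumptions:

import random
import chess

def random_position_fen(max_pieces=32):
    """Generate an 'almost legal' random position as a FEN string."""
    board = chess.Board(None)  # start from an empty board

    # Place the two kings first, not on the same or adjacent squares.
    wk = random.randrange(64)
    bk = random.choice([s for s in range(64) if chess.square_distance(s, wk) > 1])
    board.set_piece_at(wk, chess.Piece(chess.KING, chess.WHITE))
    board.set_piece_at(bk, chess.Piece(chess.KING, chess.BLACK))

    # Fill some of the remaining squares with random pieces.
    n_extra = random.randint(0, max_pieces - 2)
    free = [s for s in range(64) if board.piece_at(s) is None]
    for square in random.sample(free, n_extra):
        piece_type = random.choice([chess.PAWN, chess.KNIGHT, chess.BISHOP,
                                    chess.ROOK, chess.QUEEN])
        color = random.choice([chess.WHITE, chess.BLACK])
        # No pawns on the first or eighth rank.
        if piece_type == chess.PAWN and chess.square_rank(square) in (0, 7):
            continue
        board.set_piece_at(square, chess.Piece(piece_type, color))

    board.turn = random.choice([chess.WHITE, chess.BLACK])
    return board.fen()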

I generated a set of random positions of the same size as the set of positions from the real games.

Example of a random position (FEN string and board diagram)

The Models

To find the vectors which represent the chess positions I use autoencoders and train them on the position data. The models were built with Keras on top of TensorFlow.

I tried two different deep neural net architectures. The first one consists of dense layers, where I flattened the position tensor to a vector of 448 values and then reduced the number of neurons layer by layer.

The following model was used (a Keras sketch is given after the list):

  • flat input of size 448
  • stacked dense layers with 300, 200, 150 and 42 neurons per layer for encoding
  • stacked dense layers with 150, 200, 300 and 448 neurons per layer for decoding
  • “tanh” as activation function
  • the model has 463,190 trainable parameters
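
A Keras sketch of this dense autoencoder follows; the layer sizes are taken from the list above and reproduce the stated 463,190 trainable parameters, but the exact original code may differ:

from keras.models import Model
from keras.layers import Input, Dense

def build_dense_autoencoder(encoding_dim=42):
    """Dense autoencoder over the flattened 8*8*7 = 448 board vector."""
    inp = Input(shape=(448,))
    # Encoder: 448 -> 300 -> 200 -> 150 -> encoding_dim
    x = Dense(300, activation='tanh')(inp)
    x = Dense(200, activation='tanh')(x)
    x = Dense(150, activation='tanh')(x)
    code = Dense(encoding_dim, activation='tanh')(x)
    # Decoder: encoding_dim -> 150 -> 200 -> 300 -> 448
    x = Dense(150, activation='tanh')(code)
    x = Dense(200, activation='tanh')(x)
    x = Dense(300, activation='tanh')(x)
    out = Dense(448, activation='tanh')(x)
    return Model(inp, out)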

As a second model I built a more complex deep convolutional net (sketched after the list):

  • an input of size (8,8,7)
  • 3 two-dimensional convolution layers with max-pooling and batch normalization for encoding
  • 3 two-dimensional convolution layers with up-sampling and batch normalization for decoding
  • filter size of (3,3) for convolution
  • strides of (2,2) for pooling
  • 64*7, 32*7, 16*7 filters for encoding
  • 16*7, 32*7, 32*7 filters for decoding
  • same padding for all layers
  • “tanh” as activation function
  • the model has 2,428,839 trainable parameters
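
A possible Keras sketch of the convolutional model; the final 7-filter output layer and the exact ordering of convolution, pooling and batch normalization within each block are my assumptions:

from keras.models import Model
from keras.layers import (Input, Conv2D, MaxPooling2D, UpSampling2D,
                          BatchNormalization)

def build_conv_autoencoder():
    """Convolutional autoencoder over the 8x8x7 board tensor."""
    inp = Input(shape=(8, 8, 7))
    # Encoder: three conv blocks with max pooling and batch normalization.
    x = inp
    for filters in (64 * 7, 32 * 7, 16 * 7):
        x = Conv2D(filters, (3, 3), activation='tanh', padding='same')(x)
        x = MaxPooling2D((2, 2), padding='same')(x)
        x = BatchNormalization()(x)
    # Decoder: three conv blocks with up-sampling and batch normalization.
    for filters in (16 * 7, 32 * 7, 32 * 7):
        x = Conv2D(filters, (3, 3), activation='tanh', padding='same')(x)
        x = UpSampling2D((2, 2))(x)
        x = BatchNormalization()(x)
    # Map back to the 7 input planes.
    out = Conv2D(7, (3, 3), activation='tanh', padding='same')(x)
    return Model(inp, out)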

Training

Mean squared error is used as the loss function in both cases, optimized with the AMSGrad variant (Reddi et al., [7]) of the “Adam” optimizer with a batch size of 128 and the parameters lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0.
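A sketch of how the dense model from above can be compiled and trained with these settings; the variable train_tensors and the validation split are assumptions, and older Keras versions accept lr and epsilon=None as shown (newer ones use learning_rate):

from keras.optimizers import Adam

# AMSGrad variant of Adam with the parameters stated above.
optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None,
                 decay=0.0, amsgrad=True)

autoencoder = build_dense_autoencoder(encoding_dim=42)
autoencoder.compile(optimizer=optimizer, loss='mean_squared_error')

# `train_tensors` is assumed to hold the training positions as 8x8x7 tensors.
history = autoencoder.fit(train_tensors.reshape(-1, 448),
                          train_tensors.reshape(-1, 448),
                          validation_split=0.05,
                          epochs=100,
                          batch_size=128)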

To see how the training works, we plot the learning curves for the training and validation data.
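For example, with the history object returned by fit() above:

import matplotlib.pyplot as plt

# Plot training and validation loss per epoch.
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('mean squared error')
plt.legend()
plt.show()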

Testing

To test the model I encoded and reconstructed all positions in the test set and counted the number of errors made. For example, the position:

Original position

was reconstructed to

reconstructed position

with two errors (missing pieces). The figures were plotted with the python-chess package.

First we need a helper function to convert a position in tensor form back to FEN notation:
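A minimal sketch of such a helper, consistent with the fens_to_tensor sketch above (the 0.5 occupancy threshold is my choice, and castling rights and move counters are filled with placeholders):

PIECE_LETTERS = 'PNBRQK'

def tensor_to_fen(t, threshold=0.5):
    """Convert one 8x8x7 tensor back to a FEN string.

    Each square gets the piece plane with the largest absolute activation,
    if it exceeds `threshold`; the sign of that activation gives the color.
    """
    rows = []
    for rank in range(8):
        row, empty = '', 0
        for file in range(8):
            planes = t[rank, file, :6]
            plane = int(abs(planes).argmax())
            if abs(planes[plane]) < threshold:
                empty += 1
                continue
            if empty:
                row += str(empty)
                empty = 0
            letter = PIECE_LETTERS[plane]
            row += letter if planes[plane] > 0 else letter.lower()
        if empty:
            row += str(empty)
        rows.append(row)
    side = 'w' if t[:, :, 6].mean() > 0 else 'b'
    # Castling rights, en passant and move counters are placeholders.
    return '/'.join(rows) + ' ' + side + ' - - 0 1'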

We need the encoder part of the trained autoencoder model:
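One way to obtain it is to cut the trained model at the bottleneck layer; here I assume the dense autoencoder from above, where the 42-neuron layer is the layer with index 4:

from keras.models import Model

# Take the trained autoencoder up to the 42-dimensional bottleneck layer.
encoder = Model(inputs=autoencoder.input,
                outputs=autoencoder.layers[4].output)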

And the decoder:
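The decoder can be rebuilt by feeding a new 42-dimensional input through the remaining layers of the trained model (again assuming the dense architecture from above):

from keras.models import Model
from keras.layers import Input

# Reuse the decoding layers (index 5 onwards) of the trained autoencoder.
code_input = Input(shape=(42,))
x = code_input
for layer in autoencoder.layers[5:]:
    x = layer(x)
decoder = Model(inputs=code_input, outputs=x)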

For the whole test set (“testinput”) I encode and decode the positions and count the number of errors in the decoded positions compared to the original positions.
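A possible implementation, assuming testinput holds the 8x8x7 test tensors and encoder and decoder are the models defined above:

import numpy as np

def square_labels(t, threshold=0.5):
    """Per-square piece label for an (N, 8, 8, 7) tensor batch.

    0 means an empty square; otherwise ±(plane index + 1), with the sign
    encoding the color, as in the input representation.
    """
    planes = t[..., :6]
    idx = np.abs(planes).argmax(axis=-1)                             # (N, 8, 8)
    val = np.take_along_axis(planes, idx[..., None], axis=-1)[..., 0]
    return np.where(np.abs(val) > threshold, np.sign(val) * (idx + 1), 0)

# Encode and decode the whole test set (flattened to length 448 for the dense model).
codes = encoder.predict(testinput.reshape(-1, 448), batch_size=128)
decoded = decoder.predict(codes, batch_size=128).reshape(-1, 8, 8, 7)

# Count, per position, the squares whose piece content was reconstructed wrongly.
errors = (square_labels(testinput) != square_labels(decoded)).sum(axis=(1, 2))
print('mean errors per position:', errors.mean())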

Results

We trained and evaluated the two models on both datasets (real and random positions). After 100 epochs of training, the dense model with an encoding dimension of 42 reached a mean error rate of 2.8 per reconstructed position on the dataset of real chess positions; the histogram of errors is shown below:

On the dataset of random positions the model with encoding dimension 42 reached a mean error rate of 15.6.

It seems to be more difficult for the autoencoder to find good embeddings for random positions than to learn the patterns and features of real positions.

We varied the size of the encodings and looked at the mean error rate for real and random positions:

For small encoding dimensions it is much easier for an autoencoder to reconstruct positions from real games than random positions. It seems that the encoder can exploit the typical patterns in real positions to compress the data better.

References

[1] Adriaan D De Groot. 2014. Thought and choice in chess, volume 4. Walter de Gruyter GmbH & Co KG.

[2] William G Chase and Herbert A Simon. 1973. Perception in chess. Cognitive Psychology 4(1):55–81.

[3] David Silver, et al. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815

[4] Matthew Lai. 2015. Giraffe: Using deep reinforcement learning to play chess. arXiv preprint arXiv:1509.01549

[5] Tomas Mikolov, et al. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. pages 3111–3119

[6] Omid E David, et al. 2016. DeepChess: End-to-end deep neural network for automatic learning in chess. In International Conference on Artificial Neural Networks. Springer, pages 88–96.

[7] Sashank J Reddi, Satyen Kale, and Sanjiv Kumar. 2018. On the convergence of adam and beyond. In International Conference on Learning Representations.