Reconstructing chess positions
In this post I will investigate how the board representation can be stored in limited space and then reconstructed. I will use a tensor representation of the board and autoencoders to experiment with different types of positions. I will compare positions from real chess games with positions generated with random positions for the pieces.
The booming growth in the Machine Learning industry has brought renewed interest in people about Artificial…go.datadriveninvestor.com
Beginning in the sixties it was studied how chess layers “see” a chess position. De Groot  found that playing strength is not related to the number of moves a player can calculate, but stronger players are good at finding and calculate the right moves. In interesting experiments De Groot and Chase and Simon  found difference between masters and weaker players in memory experiments. Masters have the ability to reconstruct a chess position almost perfectly after viewing it for only 5 sec. But this is only true if the position comes from a real chess game and not if it was constructed by placing some pieces randomly at the board.
In the recent past, major advances in computer chess were made with self learning chess engines like Silver et al.,  and Lai, , which learned to evaluate positions and select moves by auto-play. This programs learn to find good representations and evaluations of chess positions.
In the domain of text analytics so called “word2vec”s (Mikolov et al., ) are very useful in the presentation of the semantic meaning of words as vectors. We try to find good representations for chess boards as vectors.
Representations of chess boards
David et al.,  used “pos2vec” representations in their supervised approach “DeepChess”. They used auto-encoders with dense layers to represent two board positions before training a fully connected net to distinguish between good and bad chess position. Each position was represented by a binary string of length 773. The auto-encoder compressed the representation of the board to a string of length 100 using several fully connected layers. It was the first end-to-end machine learning chess engine reaching grandmaster-level chess.
Silver et al.,  used a 8x8x119 tensor representation of the board, where 6 planes were used for each type of piece (with 0 and 1) for white and black, an 8 step history and additional 9 planes for information like side to move, castling rights, repetitions and move counts.
In my experiments I use a simplified version of this tensor representation as I use only the actual position and one additional plane for the side to move. The planes for the pieces are stored as 1 for white and -1 for black. So I get a 8x8x7 tensor. I do not take care about castling rights, repetitions and move counts in this experiments.
This function converts an array of chess positions in FEN notation to a tensor as described above.
For the chess positions from real chess games I used 29.169 games played on Free Internet Chess Server (FICS) by players with average ELO rating greater 2000 in January 2018. I downloaded the games as PGN-file from the FICS Games Database.
Each position in the games was labeled with its FEN description with the “pgn-extract” command line tool. This gives a set of 2,253,350 positions which I divide into a trainings set of 2,048,000 and a test set of 205,350 positions. You can download the data from:
For the dataset of random chess positions I used a modified version of a script to generate them in FEN format . The positions of the pieces on the board are not completely random they follow some rules to have almost legal chess positions:
- there is one and only one king of each color
- the kings must not be placed on adjacent squares
- there can not be any pawn in the promotion square
- including the kings, up to 32 pieces of either color can be placed
I generate a set of random positions of the same number as the set of positions from the real games.
I tried two different architectures of deep neural nets, one with some dense layers were I flattened the position tensor to a list of 448 values and then reduced the number of neurons layer by layer.
The following model was used:
- flat input of size 448
- stacked dense layers with 300, 200, 150 and 42 neurons per layer for encoding
- stacked dense layers with 150, 200, 300 and 448 neurons per layer for decoding
- “tanh” as activation function
- the model has 463,190 trainable parameters
As a second model I built a more complex deep convolutional net:
- an input of size (8,8,7)
- 3 two-dimensional convolution layers with max-pooling and batch normalization for encoding
- 3 two-dimensional convolution layers with up-sampling and batch normalization for decoding
- filter size of (3,3) for convolution
- strides of (2,2) for pooling
- 64*7, 32*7, 16*7 filters for encoding
- 16*7, 32*7, 32*7 filters for decoding
- same padding for all layers
- “tanh” as activation function
- the model has 2,428,839 trainable parameters
As loss function mean squared error is used in both cases, which was optimized with the AMS-Grad variant (Reddi et al., ) of “Adam” optimizer and a batch size of 128. With the parameters lr=0.001, beta1=0.9, beta2=0.999, epsilon=None, decay=0.0.
To see how the training works, we plot the learning curves for the training and validation data.
To test the model I encoded and reconstructed all positions in the test set and counted the number of errors made. For example the position:
was reconstructed to
with two errors of missing pieces. The figures were plotted with the Python chess package.
First we need a helper function to convert a position in tensor form back to FEN notation:
We need the encoder part of the trained autoencoder model:
And the decoder:
For the whole testset (“testinput”) I encode and decode the positions and count the number of errors in the decoded positions compared to the original positions.
We trained and evaluated the two models with the datasets (real and random positions). After 100 epochs of training the dense model with an encoding dimension of 42 on the dataset of real chess positions reached an mean error rate in the reconstructed positions of 2.8 and the histogram of errors is given by:
On the dataset of random positions the model with encoding dimension 42 reached a mean error rate of 15.6.
It seems to be more difficult for the autoencoder to find good embeddings for random positions and easier to learn the patterns and features of real positions.
We varied the size of the encodings and looked at the mean error rate for real and random positions:
For small encoding dimensions it is much easier for a autoencoder to reconstruct positions from real games than random positions. It seems that the encoder can use the typical patterns in real positions for a better compression of the data.
 Adriaan D De Groot. 2014. Thought and choice in chess, volume 4. Walter de Gruyter GmbH & Co KG.
 William G Chase and Herbert A Simon. 1973. Perception in chess. Cognitive psychology 4(1):55–81.
 David Silver, et al. 2017. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815
 Matthew Lai. 2015. Giraffe: Using deep reinforcement learning to play chess. arXiv preprint arXiv:1509.01549
 Tomas Mikolov, et al. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. pages 3111–3119
 Omid E David, et al. 2016. Deepchess: End-to-end deep neural network for automatic learning in chess. In International Conference on Artificial Neural Networks. Springer, pages 88–96.
 Sashank J Reddi, Satyen Kale, and Sanjiv Kumar. 2018. On the convergence of adam and beyond. In International Conference on Learning Representations.