Analyzing Chess Positions with Python
When playing chess, I find my games falling into similar positions. This is by design. Focusing exclusively on a couple of different openings helps improve your games, precisely because you’re seeing the same positions over and over. But I’ve been curious, more generally, what constitutes common and uncommon positions for chess pieces. I set out to analyze a little over a year of tournament chess games with Python and chess.py in order to find out.
Link to this project’s Git repo.
An Initial Goal for Analysis
A game of chess is a sequence of board positions defined by player moves. If I take a chess game at random and stop it at a random move, the pieces are much more likely to be on some squares than others. Intuitively, a king that has wandered all the way across the board is less likely than one that has stayed safely on its own starting rank.
In chess, there are 6 piece types, 2 colors, 8 ranks, and 8 files. Multiplying these together, we get a total of 6 × 2 × 8 × 8 = 768 possible piece positions, where a piece position is one piece of one color on one square (for example, the white king on g1). As an initial goal, I wanted to find the probability that each of these 768 piece positions comes up in a random board position selected from a database of chess games.
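To make the count concrete, here is a minimal sketch that enumerates every piece position and gives each one a fixed index. The names and the ordering are my own assumptions for illustration; the linked repo may order them differently.

```python
from itertools import product

COLORS = ("white", "black")
PIECES = ("pawn", "knight", "bishop", "rook", "queen", "king")
FILES = "abcdefgh"
RANKS = "12345678"

# Every (color, piece, square) combination is one "piece position".
PIECE_POSITIONS = [
    (color, piece, file + rank)
    for color, piece, rank, file in product(COLORS, PIECES, RANKS, FILES)
]

# Map each piece position to a fixed index in 0..767.
# The ordering is arbitrary (an assumption of this sketch).
INDEX = {pp: i for i, pp in enumerate(PIECE_POSITIONS)}

print(len(PIECE_POSITIONS))  # 2 * 6 * 8 * 8 = 768
```

Any fixed ordering works, as long as the same one is used for every board.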
With a goal in hand, the next step is “get a lot of data.”
Parsing Large Numbers of Chess Games
When first starting my explorations on chess positions, I wrote my own parser for portable game notation (PGN) files. If you’d like to know the details of how parsing chess notation works, check out my article on the topic. As my work expanded, I decided to use chess.py because it provides PGN parsing as well as support for exporting board positions.
I downloaded fourteen months of chess tournament data from This Week in Chess. Our workflow will be:
1. Load PGN files.
2. Iterate through games.
3. Iterate through each move, exporting the board position.
4. Transform the game format to one that can be processed numerically.
chess.py provides out-of-the-box support for steps 1–3, with export to Forsyth-Edwards Notation (FEN). FEN is a descriptive string listing the location of each piece on the board. For analysis, we want to transform this string into a one-hot encoding of the board. A one-hot encoding is a 768-bit vector in which each bit indicates whether or not a particular piece position is present on the board (illustrated below).
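# Find the function fentohot in the linked Git repository.
import chess.pgn
import numpy as np

# Load the first game.
pgn = open('twic980.pgn')
game = chess.pgn.read_game(pgn)

# One 768-entry column per board position (allocation assumed here).
n_moves = len(list(game.mainline_moves()))
positions = np.zeros((768, n_moves))

# Iterate through the moves of the game, recording the board after each move.
board = game.board()
for i, move in enumerate(game.mainline_moves()):
    board.push(move)  # apply the move so the board actually changes
    hot = fentohot(board.fen()).reshape((768,))
    positions[:, i] = hot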
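The FEN-to-one-hot step can be sketched in plain Python. This is not the `fentohot` function from the repo; `fen_to_one_hot` and the `PIECE_ORDER` layout are assumptions made for illustration, and it returns a plain list rather than an array.

```python
# Piece letters as used in FEN: uppercase is white, lowercase is black.
# The order of the twelve 64-entry blocks is an assumption of this sketch.
PIECE_ORDER = "PNBRQKpnbrqk"

def fen_to_one_hot(fen):
    """Return a 768-entry list of 0/1 flags for the piece placement in a FEN."""
    placement = fen.split()[0]          # first FEN field: piece placement
    hot = [0] * 768
    # FEN lists ranks from 8 down to 1, separated by "/".
    for row, rank_text in enumerate(placement.split("/")):
        rank = 7 - row                  # 0-based rank: rank 1 -> 0, rank 8 -> 7
        file = 0                        # 0-based file: a -> 0, h -> 7
        for ch in rank_text:
            if ch.isdigit():
                file += int(ch)         # a digit means that many empty squares
            else:
                square = rank * 8 + file
                hot[PIECE_ORDER.index(ch) * 64 + square] = 1
                file += 1
    return hot

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
hot = fen_to_one_hot(start)
print(sum(hot))  # 32 pieces on the starting board
```

Only the first FEN field matters here; side to move, castling rights, and the move counters are ignored because the analysis is about where pieces stand.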
In the linked Git repo, I loop over all the games in 60 PGN files, which gives a little more than 14.6 million board positions to analyze.
Initial Statistics and Visualization
From this data, we can find the probability that a particular piece position occurs in a random board position. With those probabilities in hand, we can ask which piece positions are common and which are uncommon, and generate some visualizations of the results.
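The probability estimate is just a frequency: count how many boards have a given bit set and divide by the number of boards. A minimal sketch on made-up toy data (the function name is hypothetical, and the three-entry vectors stand in for real 768-entry ones):

```python
def piece_position_probabilities(boards):
    """Estimate P(piece position occupied) from a list of 0/1 board vectors."""
    n_boards = len(boards)
    n_features = len(boards[0])
    counts = [0] * n_features
    for board in boards:
        for i, flag in enumerate(board):
            counts[i] += flag
    return [count / n_boards for count in counts]

# Toy data: four "boards" with three piece positions each (not real chess data).
toy_boards = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
probs = piece_position_probabilities(toy_boards)
print(probs)  # [0.75, 0.25, 0.25]
```

With a NumPy array laid out as one column per board, as in the snippet earlier, the same estimate should be obtainable with `positions.mean(axis=1)`.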
Excluding illegal positions (pawns on the first and eighth ranks), the least common piece position is the black king on a1 (1 in 13,000 board positions), followed by the white king on a8 (1 in 9,800 board positions). The most common piece position is the black pawn on f7 (1 in 1.6 board positions, or roughly 62% of boards). The probabilities lend themselves to some nice visualization.
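As a rough sketch of how such a ranking can be produced, the snippet below filters out pawns on the first and eighth ranks and converts probabilities to the "1 in N" form. The dictionary holds only the rounded figures quoted above plus one impossible pawn position; the helper names are hypothetical.

```python
def one_in_n(probability):
    """Express a probability as the N in '1 in N'."""
    return 1.0 / probability

def is_impossible(piece, square):
    """Pawns can never stand on the first or eighth rank."""
    return piece.lower() == "p" and square[1] in "18"

# Keys are (FEN piece letter, square). Values are the rounded figures
# quoted in the text, plus a zero for an impossible pawn position.
probs = {
    ("p", "f7"): 1 / 1.6,
    ("K", "a8"): 1 / 9800,
    ("k", "a1"): 1 / 13000,
    ("P", "a8"): 0.0,
}

# Drop impossible positions first, otherwise they would always rank as rarest.
legal = {key: p for key, p in probs.items() if not is_impossible(*key)}
rarest = min(legal, key=legal.get)
commonest = max(legal, key=legal.get)

print(rarest, round(one_in_n(legal[rarest])))  # ('k', 'a1') 13000
print(commonest, one_in_n(0.625))              # ('p', 'f7') 1.6
```

Without the filter, the zero-probability pawn positions would crowd out the genuinely rare but reachable ones.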
Where I’m Going From Here
In the next stage of my work, I’m looking at position co-occurrence. In general, I’m building towards describing entire boards from the perspective of information theory. Follow if you’d like to see more!