Teaching a language model to play chess

Travis Barton
4 min read · Jul 15, 2021

The game you’re seeing below is a bot playing against itself. That alone isn’t unusual: Stockfish can do the same thing and play a game worthy of a grandmaster. The difference between Stockfish and this model is that this bot doesn’t know how to play chess. It doesn’t even know what chess is. This game was played by a generative language model called GPT-2 from OpenAI that was fine-tuned for this specific purpose. GPT-2 is meant for generating hyper-realistic text from a given corpus; it is not a game engine. So how did it accomplish this task? The trick lies in serialization.

This game was chosen from thousands of games played and is the best of them, not representative of an ‘average’ game. Usually, the bot stops making strategic moves and begins to move randomly past move 30.

Generative language models are an area of natural language processing that has seen an explosion of advancement in recent years. Given any large corpus of text, models like GPT-2 and GPT-3 can generate seemingly human text that resembles the input text. These have been used in some fun projects, like code summarization, silly article creation, and poetry bots. As long as there is a text corpus, these models can replicate it.

In this example, Bc4 means ‘Bishop moved to the square c4’.

This relates to chess via algebraic chess notation. You can describe a chess move with an alphanumeric system, and a whole game with a sequence of moves. There is no real difference between a chess game and a sentence from the perspective of GPT-2, so once enough games have been collected and used to fine-tune the model, the language model will pick up chess rules in the same abstract way that it picks up language rules. That commas belong between clauses and that bishops can only move diagonally are both rules the model can learn through context.
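To make the serialization idea concrete, here is a minimal sketch of turning a list of moves into a single line of text. The function name `serialize_game` and the numbered "1. e4 e5" layout are assumptions for illustration; the exact format of the training file may differ.

```python
def serialize_game(moves, numbered=True):
    """Join moves in algebraic notation into one line of text.

    Hypothetical helper: the numbered layout is an assumed format,
    not necessarily the one used in the training data.
    """
    if not numbered:
        return " ".join(moves)
    parts = []
    # Walk the list two plies at a time: White's move, then Black's reply.
    for i in range(0, len(moves), 2):
        pair = " ".join(moves[i:i + 2])
        parts.append(f"{i // 2 + 1}. {pair}")
    return " ".join(parts)


game = ["e4", "e5", "Nf3", "Nc6", "Bc4"]
print(serialize_game(game))  # 1. e4 e5 2. Nf3 Nc6 3. Bc4
```

To the language model, the resulting line is just another sentence: each move is a "word" whose plausibility depends on the words before it.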

This article isn’t meant to be a code demo, so I’ll limit the code examples to their bare minimum and point towards my GitHub if you want to try it on your own. Before I dive into details, though, I want to take a moment to shout out Max Woolf and his amazing Python package gpt_2_simple, as it is the cornerstone of this application. His tool makes any generative text application easy and convenient!

I obtained my data from a mixture of Kaggle challenges and various repositories of chess openings that I put together through online searches. I have aggregated those games into the ‘big_chess_set.txt’ file on my GitHub.

At its core, this bot is simple. First, I used the big_chess_set file to fine-tune a GPT-2 model using Max Woolf’s gpt_2_simple package:

# make sure you have tensorflow 1.15
import gpt_2_simple as gpt2
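gpt2.download_gpt2()  # fetch the base GPT-2 weights (package default model)
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              'big_chess_set.txt',  # assumed: the aggregated games file described above
              run_name='lets_play_chess',  # assumed: the run name used when loading the model
              steps=10000)  # steps is max number of training steps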

This will take a long time to run. It took me a week on my machine; with several GPUs it will run much more quickly, but they are not necessary. Since this task only needs to be run once, you can start it and walk away. Unfortunately, due to GitHub’s file size limits, I cannot upload my models. When you’re done, you can generate moves or whole games with the following commands:

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='lets_play_chess')
gpt2.generate(sess, run_name='lets_play_chess', prefix='e4')  # prefix is an optional argument.
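The generated text is a continuation of the game so far, so the next move still has to be pulled out of it. Below is a minimal sketch of that step. The helper `next_move` is hypothetical, and it assumes the output is whitespace-separated moves with optional move numbers such as "1." or "2...".

```python
import re


def next_move(generated, moves_so_far):
    """Return the first move in `generated` that follows `moves_so_far`.

    Hypothetical helper: returns None if the text does not start with
    the moves already played, or if it contains no further move.
    """
    # Strip leading move numbers ("1.", "12...", or fused forms like "1.e4").
    tokens = [re.sub(r"^\d+\.+", "", t) for t in generated.split()]
    tokens = [t for t in tokens if t]
    played = list(moves_so_far)
    if tokens[:len(played)] != played:
        return None
    rest = tokens[len(played):]
    return rest[0] if rest else None


print(next_move("1. e4 e5 2. Nf3 Nc6", ["e4", "e5"]))  # Nf3
```

If you pass `return_as_list=True` to `gpt2.generate`, it returns the samples as a list of strings rather than printing them, which makes this kind of post-processing straightforward.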

It is essential that you provide a wrapper for the chess bot, as it will sometimes try to make illegal moves. (Remember that this bot does not know the rules of chess; it only tries to make a ‘sentence’ that makes sense to its logic.) My bot is allowed 10 attempts at a legal move before I make it move randomly.
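The retry-then-random wrapper can be sketched roughly as follows. This is an illustration, not the code from my repository: `choose_move` and `propose_move` are hypothetical names, and the list of legal moves is assumed to come from whatever chess library or rules engine you use.

```python
import random


def choose_move(propose_move, legal_moves, max_attempts=10, rng=random):
    """Ask the model for a move up to `max_attempts` times, then fall back.

    propose_move: zero-argument callable returning a candidate move string
                  (for example, a function that samples from the language model).
    legal_moves:  iterable of legal move strings for the current position.
    """
    legal = list(legal_moves)
    if not legal:
        raise ValueError("no legal moves available")
    for _ in range(max_attempts):
        candidate = propose_move()
        if candidate in legal:
            return candidate
    # The model never produced a legal move, so pick one at random.
    return rng.choice(legal)


# Stand-in for the model: always suggests a square that does not exist.
stubborn_model = lambda: "Qh9"
print(choose_move(stubborn_model, ["e4", "d4", "Nf3"]))
```

Keeping the legality check outside the model means the language model only ever has to suggest text; the wrapper decides whether that text is a move.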

It is important to note two things.

  1. The bot does not play a very good game of chess. This should not be a surprise, as it has no incentive to improve, only to make chess-like sentences. There is no cost function telling the model that some moves are better than others or that illegal moves are bad. All it is doing is pattern matching, and that makes for a pretty poor player. This could be improved with either more data or more selective training, but both would give pitiful improvements compared to a bot like Stockfish or any model whose cost function is tied to its performance in the game.
  2. The bot performs particularly poorly once it enters the mid-game. This is not a surprise either: the openings of a chess game are widely known and studied, but as the game deviates from a known line, the bot has fewer and fewer points of reference and tends to simply make any given legal move.

This bot will never be the next Bobby Fischer, but it was never supposed to play chess. This model was intended to generate text. The fact that it can be so easily repurposed for this task shows the versatility of these language models. As long as something can a) be serialized into text and b) be supplied with enough examples, there is no reason that it cannot be replicated with these language models.
