ChatGPT Playing Chess Independently — From Text To Actions

Miikka Sainio
Jul 6, 2024 · 6 min read


AGI is not crushing modern chess engines at the time of writing. But if AI ever reaches that level of thinking, will we still need a human to make the actual moves on the board for it?

From my experience, LLMs provide valuable help with some tasks, coding and writing for example. But every task they are great at still needs a human to add the last touch to get the actual benefit of their work. Here we go a little further and execute actions directly from the LLM output, using chess to test it out.

Photo by Samurai Stitch on Unsplash

Language is ambiguous, Text is precise

Humans are great at interpreting language. Large Language Models try to mimic this and are getting good at it too. Humans can easily turn natural language into actions. LLMs, however, normally need an additional neural network trained specifically for the desired task to turn the "set of tokens" into actions. Here we skip the NN part and use text parsing and traditional programming to generate the actions.

Everyone who has used LLMs (like ChatGPT) knows that it is sometimes really hard to get the exact answer you want from the model. Sometimes it works 50% of the time, sometimes even less. Additionally, if you have experience in programming, you know that whether the problem is big or small, the result is the same: the program crashes. Together these traits make for a fragile system, but I have found two silver bullets for it: prompt engineering and a feedback loop.

Prompt Engineering

Consider a case where your code asks an LLM whether a password the user entered looks secure:
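The exact wording could be something like this:

Does this password look secure: {password}? Answer yes or no.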

The free-text answer itself cannot be used directly in your code to approve the input. You could write something like

if "yes" in answer:
print("Password approved!")
else:
print("Password is not secure!")

However, it is possible that the answer contains “yes” somewhere even when the actual response to the question was “no”. A slight change in the prompt may fix the problem:
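For instance, asking for a one-word answer:

Does this password look secure: {password}? Answer with one word, "Yes" or "No".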

Now your code is checking for

if answer == "No":
print("Password is not secure!")
else:
print("Password approved!")

And it still fails, because this time the model decided to add a comma to the answer. I have found that the best way is to ask the model to embed the answer inside predefined characters. Then, even if the model decides to add explanations to the answer, the parsed answer should still work.
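For example:

Does this password look secure: {password}? Give your answer like this: "||Yes||" or "||No||"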

Even with this, it is not guaranteed to work all the time, and this is why we need a backup system.

Feedback loop

You can never trust an AI response 100%, even with perfect prompt engineering and testing. This is why we need to validate the answer and give the AI a chance to correct itself. Here is a sample of how to do so:

def get_answer(prompt):
    # Give the model up to 10 chances to produce a parseable answer.
    for _ in range(10):
        answer = ask_llm(prompt)
        try:
            # The answer is expected to be embedded in "||",
            # e.g. "Sure! The answer is ||Yes|| because...".
            answer = answer.split("||")[1]
            if answer == "Yes" or answer == "No":
                return answer
        except IndexError:
            # No "||" delimiters found in the answer, ask again.
            pass
    return False

This gives the model a chance, or actually 10 chances, to correct itself. If your task is more complex, you can even add generic error handling and send the error message back to the model together with your original prompt to increase the chances of getting it right.
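A sketch of what that could look like, assuming the same ask_llm helper as above and a hypothetical apply_answer function that raises an exception whenever the answer cannot be used:

def get_answer_with_feedback(prompt, max_tries=10):
    feedback = ""
    for _ in range(max_tries):
        answer = ask_llm(prompt + feedback)
        try:
            # apply_answer is a placeholder for parsing and acting on the answer.
            return apply_answer(answer)
        except Exception as error:
            # Send the error back so the model can correct itself on the next try.
            feedback = f"\nYour previous answer caused an error: {error}. Try again."
    return None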

Chess is hard

Answering "Yes" and "No" is not enough to play chess. Or maybe it is if you ask the right questions, but we will do better. Every piece on the board has its square, denoted by file (from a to h) and rank (from 1 to 8). So you can make a move by stating two squares: one defining the piece to move and one giving the destination. The standard algebraic notation is even fancier, encoding the piece and the move together, but it is a bit more complex to decode back into actions.

A side note: the standard notation would probably be better for the LLM, since that is how it is used to seeing chess in text form.
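Decoding the two-square format back into actions is then just a few lines of string handling. A minimal sketch, assuming the move string has already been extracted from between the "||" delimiters:

def square_to_coords(square):
    # "e2" -> (4, 1): 0-based file and rank indices
    file = ord(square[0].lower()) - ord("a")
    rank = int(square[1]) - 1
    return file, rank

def parse_move(move):
    # "e2->e4" -> ((4, 1), (4, 3))
    from_square, to_square = move.split("->")
    return square_to_coords(from_square), square_to_coords(to_square)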

The Board

First, we need a way to represent the state of the board as text. How? Just write the square coordinate and the corresponding piece for both black and white. This is how the starting position looks:

White pieces:
rook at a1
knight at b1
bishop at c1
queen at d1
king at e1
bishop at f1
knight at g1
rook at h1
pawn at a2
pawn at b2
pawn at c2
pawn at d2
pawn at e2
pawn at f2
pawn at g2
pawn at h2
Black pieces:
rook at a8
knight at b8
bishop at c8
queen at d8
king at e8
bishop at f8
knight at g8
rook at h8
pawn at a7
pawn at b7
pawn at c7
pawn at d7
pawn at e7
pawn at f7
pawn at g7
pawn at h7
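A listing like this is easy to produce from almost any board representation. A minimal sketch, assuming the engine stores the board as a dictionary mapping squares to (color, piece) pairs:

def board_to_text(board):
    # board: e.g. {"a1": ("white", "rook"), "e8": ("black", "king"), ...}
    lines = []
    for color in ("White", "Black"):
        lines.append(f"{color} pieces:")
        for square, (piece_color, piece) in board.items():
            if piece_color == color.lower():
                lines.append(f"{piece} at {square}")
    return "\n".join(lines)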

The Prompt

Then we need to define a prompt that gives the desired output (at least most of the time). The base prompt is:

Given the board below, what is the best move for player {player}?\n
Give answer like this "||e3->e4||"

The board is then embedded at the end of the prompt. This worked sometimes, but it didn't really go far. First, I needed to stop it from using the example move:

Given the board below, what is the best move for player {player}?\n
Give answer like this "||e3->e4||" (just an example, make your own move)

But then it started to give illegal moves. For this, I added feedback whenever it tried something illegal:

You tried {piece}->{move} but that is illegal. Try another piece!

This already worked quite well, but the moves did not make much sense and it had to try many times. This last added line helped a lot:

Your opponent moved from {from_input} to {to_input}
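Putting these pieces together, the move-request loop could look roughly like this. This is a sketch, not the actual code base: ask_llm and is_legal_move are assumed helpers, and board_text is the text representation from earlier:

def request_move(player, board_text, last_move=None, max_tries=10):
    prompt = (
        f"Given the board below, what is the best move for player {player}?\n"
        'Give answer like this "||e3->e4||" (just an example, make your own move)\n'
    )
    if last_move:
        prompt += f"Your opponent moved from {last_move[0]} to {last_move[1]}\n"
    prompt += board_text

    feedback = ""
    for _ in range(max_tries):
        answer = ask_llm(prompt + feedback)
        try:
            from_square, to_square = answer.split("||")[1].split("->")
        except (IndexError, ValueError):
            continue  # unparseable answer, just ask again
        if is_legal_move(from_square, to_square):
            return from_square, to_square
        # Feed the illegal attempt back to the model.
        feedback = f"\nYou tried {from_square}->{to_square} but that is illegal. Try another piece!"
    return None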

For the final touch, we added a separate “commentator” feature, where the LLM was given the board state and the last move and asked to give a comment about the move.
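The commentator can reuse the same machinery with a different prompt. A sketch with made-up prompt wording, since the exact commentator prompt is a detail I am filling in:

def get_commentary(board_text, from_square, to_square):
    # Illustrative prompt wording, not the original one.
    prompt = (
        f"The last move was {from_square}->{to_square}.\n"
        "Given the board below, give a short comment about this move.\n"
        + board_text
    )
    return ask_llm(prompt)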

The Game

Let’s go straight to the action. Here is the first game where “AI” is playing against another “AI”:

Gotta love the way white jumped into action with the king, only to die. Also, the commentator's comment

And the king slides over to e2, perhaps seeking a safer haven from potential checks and attacks. It’s a small but strategic move that shows White is being cautious and mindful

shows that we humans have a lot to learn from AI chess.

Aftermath

On a more serious note, the last move should have been illegal, but the game engine wasn't perfect (it doesn't know that the king is not allowed to commit suicide). It still highlights the problem of using LLMs for tasks as specific and deterministic as gaming. Even the least skilled human chess player could see that this is not the move you want to make, be it illegal or not.

Even when LLMs and AGI reach a level where the task doesn't matter and they outperform all the more traditional machine learning models, the need for task-specific customization will remain. Generating text and images is a valuable tool for us, but it doesn't take over the world by itself. I think.

Code base:

Credits for the base of the chess game go to:

https://www.geeksforgeeks.org/create-a-chess-game-in-python/



Miikka Sainio

Data Engineer - MSc. Electrical Engineering and Machine Learning