ChatGPT is a new large language model developed by OpenAI that has recently attracted a lot of attention and hype. But just how good is ChatGPT at playing chess? This article explores its capabilities and evaluates its performance by emulating a chess training session with ChatGPT.
Chess players of any level are welcome; only a little knowledge of the rules is required.
Spoiler alert: for a language model, it’s surprisingly scary.
Session 1. Opening. Scholar’s mate
Our first session will be dedicated to Scholar’s mate. This is one of the easiest mates in chess and is usually taught when you start learning the game.
Ok. So far, so good. It’s weird, though, that ChatGPT says 3… d6 is the only move. It is easy to see that 3… Qe7, 3… Qf6, 3… d5, 3… Nh6, etc., also work. Let’s continue our session by asking about other defending moves.
There are a couple of points I want to dwell on. Of course, the claim that 3… Qe7 doesn’t prevent checkmate is incorrect.
But notice how it switched from Black’s move (3…) to White’s move (4.). Stating that 3… g6 is the strongest move is correct and was probably taken from somewhere in the chess literature.
It is quite impressive that a language model (not a chess engine) can figure out that 4… Qxf7 leads to the white queen being captured. It is equally impressive that ChatGPT follows up with a legal bishop capture of the queen (though this is still neither checkmate nor a move leading to it).
Aha! So it seems the “chat” abilities (understanding the context of the previous text) are working ok, and the explanation makes sense, but it is the 5th move, not the 4th. The rest of the text makes sense; surprisingly, multiple plagiarism checkers show it’s unique.
We’ll continue with “analyzing” the obtained position.
Ok, ChatGPT seems to drop the ball (chess piece?) when carrying a position over from one chat message to the next. A lot of the generic text is still quite correct. It is also quite interesting that 2 of the 3 moves are legal. Can we figure out why the model believes Bg5 is a valid move?
The deeper into the game, the less sense the answers make. Though it correctly states that it is Black’s 6th move to be made (6… h6), it’s wrong to talk about pins. In fact, the phrase “to move without being captured” is quite “unchessy”. The reference to FIDE is also a bit misleading, as I was asking about the validity of the Bg5 move if the pawn (e2) is in its initial position, not about the position being the initial one.
The last question to ask about the position concerns its evaluation. No surprise that ChatGPT’s “evaluation” is entirely wrong, as it’s just a language model. Despite some mistakes, it’s pretty impressive how well a language model can perform in a chess opening.
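Whether a bishop move such as Bg5 is legal comes down, among other things, to whether every square between the bishop and its destination is empty. Here is a minimal sketch of that check. The helper names are hypothetical, and I am assuming the bishop in question is White’s dark-squared bishop on its starting square c1 and that White’s pawns are all still on the second rank; the actual position from the chat may differ.

```python
# Hypothetical helpers; squares are written in algebraic notation, e.g. "c1".

def squares_between(start, end):
    """Return the squares strictly between two squares on one diagonal."""
    file_diff = ord(end[0]) - ord(start[0])
    rank_diff = int(end[1]) - int(start[1])
    if file_diff == 0 or abs(file_diff) != abs(rank_diff):
        raise ValueError(f"{start}-{end} is not a diagonal move")
    file_step = 1 if file_diff > 0 else -1
    rank_step = 1 if rank_diff > 0 else -1
    return [
        chr(ord(start[0]) + file_step * i) + str(int(start[1]) + rank_step * i)
        for i in range(1, abs(file_diff))
    ]


def blockers(start, end, occupied):
    """Return the occupied squares that stand in the bishop's way."""
    return [sq for sq in squares_between(start, end) if sq in occupied]


# Assumption: all white pawns are still on their starting squares.
white_pawns = {file + "2" for file in "abcdefgh"}

print(squares_between("c1", "g5"))        # ['d2', 'e3', 'f4']
print(blockers("c1", "g5", white_pawns))  # ['d2']
```

An empty list from `blockers` only means the path is clear; a real legality check would also have to look at the destination square and at whether the move leaves the king in check.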
Session 2. Middlegame
How about we ask a language model if it can generate a chess game? It can generate text, code, etc. Why not a random game of chess?
Quite impressive. But after checking for validity, 12. Rxe7 isn’t valid. Maybe it is just a typo?
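Checking a generated game can be split into two stages: is every move well-formed notation, and is every move legal in the position where it is played? Below is a minimal sketch of the first stage only, using a simplified regular expression for standard algebraic notation. The function names and the pattern are my own assumptions, not something ChatGPT produced, and the pattern ignores annotations such as "!" or "?".

```python
import re

# Simplified SAN pattern (an assumption, not a full PGN grammar):
# castling, piece moves with optional disambiguation, pawn moves/captures.
SAN = re.compile(
    r"^(?:O-O(?:-O)?"
    r"|[KQRBN][a-h]?[1-8]?x?[a-h][1-8]"
    r"|(?:[a-h]x)?[a-h][1-8](?:=[QRBN])?)"
    r"[+#]?$"
)

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def parse_movetext(movetext):
    """Turn '1. e4 e5 2. Nf3 ...' into a flat list of move tokens."""
    without_numbers = re.sub(r"\d+\.+", " ", movetext)
    return [tok for tok in without_numbers.split() if tok not in RESULTS]


def malformed_moves(movetext):
    """Return the tokens that do not look like valid SAN."""
    return [move for move in parse_movetext(movetext) if not SAN.match(move)]


print(malformed_moves("1. e4 e5 2. Nf3 Nc6 3. Bb5 a6"))  # []
print(malformed_moves("1. e4 e5 2. Nf9 Kc6x"))           # ['Nf9', 'Kc6x']
print(malformed_moves("12. Rxe7"))                       # []
```

Note that 12. Rxe7 passes this check: it is perfectly well-formed notation. Catching it requires the second stage, replaying the game move by move with a full move generator (for example a chess library such as python-chess), which is beyond this sketch.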
Spooky! Honestly, I had to blink twice before I started believing the response I got.
It’s the middlegame now. Let’s ask some questions regarding the position we gained.
It’s pretty hard to find any correlation between the answer and the position. Maybe ChatGPT has no information on what a pin is?
We broke ChatGPT.
But even if it had given some response, it is becoming quite obvious that it has been using general knowledge without relating it to the specific position.
I caught myself at one point forgetting that ChatGPT is not a chess engine…
Session 3. Endgame
The last session will be dedicated to a very simple endgame.
Looks like ChatGPT wasn’t able to decode the end position we described. Let’s make it more straightforward.
The position is quite simple. If it’s White to move, no moves are legal: it’s a stalemate. If it’s Black to move, a simple mate-in-one can be achieved in 4 different ways.
It would be quite naive to expect to pick up such knowledge from a language model. But it is still quite impressive that the model suggests a one-square move for the white king (although illegal) and a valid move for the black queen.
Although ChatGPT is a powerful language model, it cannot play chess. This is because ChatGPT is a text-based model and does not have the ability to understand or interpret visual information like a board game. Additionally, playing chess requires a high level of strategic thinking and decision-making, which goes beyond the scope of ChatGPT’s capabilities.
While ChatGPT may be able to generate text that describes chess moves or strategies, it is not equipped to actually play the game.
Honestly, I had much lower expectations when I started this experiment. Throughout the whole process, I had a great conversational experience, with ChatGPT dropping the narrative thread only once or twice.
From an ML perspective, the result is extremely solid. Chess-wise, the results are still better than expected.
But this is the catch. The uncanny valley can also exist for machine learning models. And when we consider using language models for knowledge generation, we might not notice the errors the model makes, because most of the generated content seems correct.