M2M Day 379: Digging a little deeper
This post is part of Month to Master, a 12-month accelerated learning project. For October, my goal is to defeat world champion Magnus Carlsen at a game of chess.
Yesterday, I was finally able to test my chess algorithm on a recent game I played, and it worked quite well. You can watch the 10-minute video demonstration here.
Today, I dug a little bit deeper into the performance of the algorithm, and the results were still good, although not perfect.
For the first 25 moves or so of any chess game, the algorithm performs more or less perfectly. It identifies good moves as good and bad moves as bad — comfortable carrying its user through the chess game’s opening and some of the middle game.
The algorithm performs less well in the late middle game and end game. In particular, during this part of the game, the algorithm’s threshold for good moves is too low: It recognizes too many inaccurate moves as good.
The algorithm does find the best line in the end games (consistently calculating these moves as good), but there is too much surrounding noise for me, as the user of the algorithm, to find this best line.
I’m not particularly surprised by this outcome: This iteration of the model was only trained on 1,500 games and about 50,000 chess positions (I used this reduced dataset so that version 1 of my model could at least finish training before the challenge ended).
The problem with such a small dataset is that it likely doesn’t have enough duplicates of later-stage chess positions to produce accurate labels for these positions.
I just took a quick look at the dataset, and most of the later-stage chess positions appear only once in the entire dataset, which distorts this part of the data.
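This kind of frequency check is easy to sketch. The snippet below uses a hypothetical miniature dataset of (position, label) pairs — the FEN-like position strings and labels are made up for illustration — and counts how often each position appears, flagging the singletons whose lone label may be noise rather than a reliable signal:

```python
from collections import Counter

# Hypothetical miniature dataset: each entry is a (position, label) pair,
# where the position is a FEN-like string and the label is the quality
# assigned when that position was observed in a game.
dataset = [
    ("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b", "good"),
    ("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b", "good"),
    ("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b", "bad"),
    ("8/5k2/8/8/8/3K4/4P3/8 w", "good"),  # a late-stage position seen once
]

# Count how many times each distinct position occurs.
counts = Counter(position for position, _ in dataset)

# Positions seen only once: their single label may be noise rather than
# a trustworthy estimate of the position's true quality.
singletons = [pos for pos, n in counts.items() if n == 1]
print(len(singletons))  # → 1
```

Run over a real dataset of processed games, the same tally would show opening positions recurring hundreds of times while most endgame positions appear exactly once.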
On the other hand, the earlier chess positions appear often enough that their true natures were correctly revealed when the dataset was created (hence the nearly perfect results during openings and early middle games).
This problem can likely be remedied, though: I just need to process many more games to create a fully undistorted dataset.
Of course, training a model on this larger dataset may take much longer, but the result, theoretically, should be significantly better performance across all parts of the chess game.
Thus, today, I started training the same model, but this time on a dataset of 100,000 games. I’m also processing more games, hoping to build a dataset of around ten million games.
Based on what I’ve seen so far, I suspect these models, with the input of much more data, will be nearly perfect in their performance. After all, the current model is incredibly accurate, and it’s only basing its performance on 1,500 games.
If anything, yesterday’s result proved to me that algorithmic chess is a legitimate and functional approach (that already works reasonably well).
What is still unclear is whether “perfect performance” at identifying good moves and bad moves leads to Magnus-level gameplay. This is still to be determined…