What is the main purpose of this?
Basically, curiosity. The Lichess database has more than 775 million games (as of August 2019). What kind of bot would we get if we used all this data? How high could the bot rank?
Where do we start?
First of all, on the Lichess database site you can download every game that was played, month by month, since January 2013.
The files are in PGN format, the standard plain-text format that programs use to read and write chess games, and these specific ones look like this:
From the PGN files we only need the moves, nothing else. They are easy to extract because of the simple format. From the “moves” line we strip the check/mate symbols, the capture symbol and the final score. We end up with a line of “pure” moves:
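The cleaning step above can be sketched in Python roughly like this. It is a minimal sketch, not the project's actual code: `clean_moves` and `RESULTS` are hypothetical names, and it assumes that the capture symbol is `x`, and that move numbers, `{...}` comments and `!`/`?` annotations should be dropped as well.

```python
import re

# Game-termination markers that can end a PGN "moves" line.
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def clean_moves(movetext):
    """Return the "pure" moves of one PGN moves line as a list of strings."""
    # Drop {...} comments (for example engine evaluations or clock times).
    text = re.sub(r"\{[^}]*\}", " ", movetext)
    moves = []
    for token in text.split():
        if token in RESULTS:
            continue  # final score
        if re.fullmatch(r"\d+\.+", token):
            continue  # move numbers such as "1." or "1..."
        # Remove check (+), mate (#), capture (x) and annotation (! ?) symbols.
        token = re.sub(r"[+#x!?]", "", token)
        if token:
            moves.append(token)
    return moves

line = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Bxc6 dxc6 5. O-O Qd4+ 1-0"
print(" ".join(clean_moves(line)))
```

For the sample line, the call prints `e4 e5 Nf3 Nc6 Bb5 a6 Bc6 dc6 O-O Qd4`.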
How do we store it?
If we did it with Object-Oriented Programming, it would be easier for us to track and store the data, BUT it would cost us a lot of space and much more time. So we will use hash tables and strings to keep it as simple and efficient as we can.
Step 1:
We create a dictionary, let’s name it A. A’s key is the current chess board position, converted to a string. Here is an illustration of the conversion:
The value of A is another dictionary, let’s call it B. B’s key is the current move (like “e4” or “Qb3”) and its value is the number of times players made that move in that position.
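Here is one way Step 1 could look in Python. This is a sketch under assumptions: `board_to_string` and `count_moves` are hypothetical names, the 64-character layout is only one possible conversion (not necessarily the one used in the project), and each game is assumed to already be a list of `(position, move)` pairs.

```python
from collections import defaultdict

def board_to_string(board):
    """Flatten an 8x8 grid of one-character squares into a 64-character key."""
    return "".join("".join(rank) for rank in board)

def count_moves(games):
    """Build A: position string -> B, where B maps move -> times played."""
    A = defaultdict(lambda: defaultdict(int))
    for game in games:
        for position, move in game:
            A[position][move] += 1
    return A

# Ranks listed from Black's back rank down to White's; "." is an empty square.
start = board_to_string([
    "rnbqkbnr", "pppppppp", "........", "........",
    "........", "........", "PPPPPPPP", "RNBQKBNR",
])
after_e4 = board_to_string([
    "rnbqkbnr", "pppppppp", "........", "........",
    "....P...", "........", "PPPP.PPP", "RNBQKBNR",
])

# Three tiny made-up games, only to show the counting.
games = [
    [(start, "e4"), (after_e4, "e5")],
    [(start, "e4"), (after_e4, "c5")],
    [(start, "d4")],
]
A = count_moves(games)
print(dict(A[start]))
```

In this toy data, the starting position maps to a B dictionary in which “e4” was played twice and “d4” once.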
Step 2:
OK, we have finished processing all the games and created a massive table-in-table database. Our current goal is to make it easier for the bot to access it while he’s playing.
Now we create a new dictionary (you get it, we will call it C). Its key is A’s key, and its value is the key of B with the highest count, that is, the most played move in that position.
Eventually our C dictionary is this:
Now, while the bot is playing, he can look up the C dictionary and play like the majority of us play!
This process takes time and space. Because we are talking about millions of games, I recommend running it on a computer with 16GB+ of RAM and at least 50GB of free disk space. Another option, which is what I did, is to run it on an AWS EC2 server.
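Step 2 and the lookup during play can be sketched like this. Again, this is only an illustration: `most_played` and `pick_move` are hypothetical names, and the position keys are short placeholders instead of real board strings.

```python
def most_played(A):
    """Build C: position string -> the move with the highest count in B."""
    return {position: max(B, key=B.get) for position, B in A.items()}

def pick_move(C, position):
    """Return the majority move, or None if the position was never seen."""
    return C.get(position)

# Placeholder data in the shape of the A dictionary from Step 1.
A = {
    "pos1": {"e4": 5, "d4": 3},
    "pos2": {"e5": 2, "c5": 7},
}
C = most_played(A)
print(pick_move(C, "pos1"))     # e4
print(pick_move(C, "unknown"))  # None
```

Storing only one move per position makes C much smaller than A, and a lookup is a single dictionary access. The `None` case is a position that never appeared in the processed games, so the bot has no move to copy there.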
To summarise:
I played against the bot and he won. He is a good player but not the best: he still runs into new moves that he (or... we) never played before, so he just gets stuck.
So far, my bot has “played” only one year of games, due to space and memory limitations.
This is the GitHub of my project. If you have a more efficient way to do it, or any other ideas, please let me know!