Analyzing the 2016 World Chess Championship
It’s no secret that the Pachyderm team loves chess! In fact, one of our first demonstrations of Pachyderm was a statistical analysis of chess blunders. Now, with the 2016 World Chess Championship decided, we have updated our analysis to see what we could learn about the games between Magnus Carlsen and Sergey Karjakin.
In the following analyses, which were implemented in a Pachyderm data pipeline, we attempted to learn:
- For each game in the Championship, what were the crucial moments that turned the tide for one player or the other, and
- Did the players noticeably fatigue throughout the Championship as evidenced by blunders?
We chose to run these analyses in Pachyderm because they required a variety of technologies (Pachyderm is language agnostic), and because Pachyderm let us quickly and easily commit new game data, which triggered automatic updates to our analyses. Read more about Pachyderm pipelines on the Pachyderm website and visit our pipeline docs to learn how to run your own analyses in Pachyderm.
(If you don’t care about the technical details of the analyses and just want to see pretty pictures, skip to the section entitled “The Results”.)
To gain some insight into crucial game moments and fatigue, we use the Stockfish chess engine to annotate each game of the championship. These annotations include played move scores and “best move” scores (i.e., the score of the best move as determined by the chess engine) in units of centipawns. We then visualize and further analyze the annotations to extract our insights.
All in all, our Pachyderm pipeline includes the following stages and utilizes the pictured technologies:
As you can see, we have a variety of languages and frameworks working together in a Pachyderm pipeline to analyze these chess games.
There are Go programs, Python programs, obscure chess engines, etc. Pachyderm allows us to put all of these things together to accomplish a common goal.
Stage 1: Annotating each game
To programmatically annotate each chess game, we create a pipeline stage that takes game files (in “PGN” format from chessgames.com) as input and outputs an annotations file for each game. This stage executes a Go program that parses the game files and uses the Stockfish chess engine to generate annotations of the games. The annotations can be found here, and include a variety of interesting data about each move of a game. However, for this analysis, we are particularly interested in:
- The “played move score” — a score (measured in centipawns) of the move actually played by the respective player, and
- The “best move score” — a score (also measured in centipawns) of the move determined to be “best” by the chess engine.
By examining the difference between these two numbers, we can determine the “goodness” or “badness” of a move. That is, we can tell whether the move played was the engine’s best move or some number of centipawns worse than it.
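To make the comparison concrete, here is a minimal Python sketch of the score difference. The function name and the conversion from centipawns to pawns (100 centipawns per pawn) are our own illustration and are not taken from the pipeline code.

```python
def score_difference(played_cp: int, best_cp: int) -> float:
    """Return played move score minus best move score, in pawns.

    Both inputs are engine scores in centipawns. A result of 0 means the
    player found the engine's best move; a negative result means the played
    move was that many pawns worse than the best move.
    """
    return (played_cp - best_cp) / 100.0


# Hypothetical scores: the best move was worth +120 centipawns, but the
# move actually played was worth only -30 centipawns.
example = score_difference(-30, 120)
```

With these made-up inputs, the played move is 1.5 pawns worse than the engine’s best move.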
Stage 2: Analyzing each game
In our analysis stage, we execute a Python program that parses the JSON game annotations for each game and calculates the difference between the played move score and best move score for each move of the game. These score differences are output as CSV files for each game and player in the analyze data repo, and, if you like, you can examine them here.
In addition to saving the raw score differences, we use the pandas and matplotlib libraries to generate a plot for each game. These plots include a visualization of each player’s score differences (played move score minus best move score) for each move of the game and a visualization of the chess engine’s indication of who is winning the game at each move. The plots for each game are also saved to the analyze data repo as PNG files.
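The JSON-to-CSV part of this stage can be sketched roughly as follows, using only the Python standard library. The annotation field names (`move`, `player`, `played_cp`, `best_cp`) and the CSV column names are assumptions made for this example; the real annotation files may be laid out differently, and the plotting step is left out.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical annotation layout: these field names and values are
# assumptions for illustration, not the schema written by the Go annotator.
SAMPLE_ANNOTATIONS = json.dumps([
    {"move": 31, "player": "white", "played_cp": 10, "best_cp": 60},
    {"move": 31, "player": "black", "played_cp": -15, "best_cp": -15},
    {"move": 35, "player": "white", "played_cp": -140, "best_cp": 20},
])


def differences_by_player(annotations_json: str) -> dict:
    """Group (move, played - best) pairs by player, in centipawns."""
    rows = defaultdict(list)
    for entry in json.loads(annotations_json):
        diff = entry["played_cp"] - entry["best_cp"]
        rows[entry["player"]].append((entry["move"], diff))
    return dict(rows)


def to_csv(rows: list) -> str:
    """Serialize one player's (move, diff) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["move", "score_diff_cp"])
    writer.writerows(rows)
    return buffer.getvalue()


per_player = differences_by_player(SAMPLE_ANNOTATIONS)
white_csv = to_csv(per_player["white"])
```

Writing to an in-memory buffer keeps the sketch self-contained; in a pipeline stage the same rows would be written to one file per game and player.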
Stage 3: Aggregating the results
Finally, in an aggregation stage, we execute another Python program that reads the CSV files for all games and counts the number of 0.5 to 1 pawn and 1+ pawn blunders for each player in each game. These counts are then visualized in a bar chart that shows the number of blunders for each player throughout the championship.
The idea with the bar chart is that, if players are fatiguing throughout the championship, we might see an upward trend in blunders towards the later games. In any event, it will be interesting to see where the worst moves were played throughout the championship.
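The blunder counting described in this stage can be sketched as a simple binning of score differences. The bucket labels and the exact boundary handling (a loss of exactly 1 pawn counts as “1+”) are assumptions made for this example.

```python
def count_blunders(score_diffs_cp):
    """Count 0.5 to 1 pawn and 1+ pawn blunders in a list of score differences.

    Each difference is played move score minus best move score in
    centipawns, so a blunder shows up as a negative number.
    """
    counts = {"0.5-1": 0, "1+": 0}
    for diff in score_diffs_cp:
        loss = -diff  # centipawns lost relative to the engine's best move
        if loss >= 100:
            counts["1+"] += 1
        elif loss >= 50:
            counts["0.5-1"] += 1
    return counts


# Hypothetical score differences for one player in one game.
game_counts = count_blunders([-50, -160, 0, 10, -99, -100])
```

Running this per player and per game yields the numbers that would feed the bar chart.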
As the results from the World Chess Championship were rolling in, we committed the new game data to a Pachyderm data repository, which triggered the above analyses to update automatically. It was tons of fun to watch the results take form, and we hope that you also enjoy some of the insights below!
Per Move/Game Insights
As mentioned, our Pachyderm pipeline generates a visualization for each game. The visualization shows each player’s score differences (played move score minus best move score) for each move of the game and the chess engine’s indication of who is winning the game at each move.
For example, for game 1 of the Championship, this visualization looks like this:
Notice the following things:
- The chess engine suggested that Magnus Carlsen might have had a slight advantage a few times during the game (a possibly negligible advantage), but the game eventually ended in a draw.
- The score differences (played move score minus best move score) for each player hovered around zero throughout the game. This means that, at least according to the chess engine, each player was playing at or near the “best” move throughout the game. Note that some deviations above zero are expected and are caused by noise in the chess engine score calculations.
Ok, maybe looking at a draw isn’t super exciting, so let’s look at the crucial game of the championship. Let’s take a look at game 8, which ended in a victory for the challenger Sergey Karjakin:
Just by visualizing the game in this way we can see the following:
- Carlsen made a small mistake at white move 31 (less than a 1 pawn mistake) that started tipping the balance in the favor of Karjakin.
- Shortly after, Carlsen made a 1.5+ pawn blunder at white move 35, which surely can’t have been good for his chances.
- Then something interesting happened at black moves 35 and 37. Karjakin made a series of smaller mistakes that momentarily gave Carlsen a better chance at regaining an advantage.
- In the end, however, Carlsen did not capitalize on this and eventually slid to defeat.
Despite this performance in game 8, Carlsen did end up winning game 10 to tie the championship, and he eventually won the rapid tie break to take the championship. Carlsen’s win in game 10 is illustrated in the visualization below:
If you want to explore the other games with this sort of visualization, you can find the rest of the plots here (including game 16 in which Carlsen made what Slate called one of the most beautiful, stunning moves in the history of the World Chess Championship).
Our second question is more of a championship-wide question. Can we learn or detect anything related to fatigue of the players throughout the championship? For example, one might expect the players to get tired and make more blunders near the end of the Championship. Is this true?
As mentioned in “The Analysis” section, we counted any 0.5 to 1 and 1+ pawn mistakes per player, per game and created a bar chart illustrating these aggregations:
First, notice that, in games 3 and 4, Karjakin made noticeably more 0.5–1 pawn blunders than Carlsen. Carlsen was unable to exploit these mistakes for victories, because he made a 1+ pawn blunder in both of those games. In fact, if you look at the per move plots for game 3 and game 4, it is clear that Carlsen was leading throughout, but he made a blunder near the end of each of those games (ultimately leading to a draw).
Game 8 is the game that Carlsen lost, and we can see that, as expected, Carlsen’s “large” blunder count (i.e., 1+ pawn blunder count) is higher for that game. However, it is interesting to see that for the “classical” games played during the championship (the last 4 games being rapid tie break games), Carlsen’s total 1+ pawn blunder count was actually greater than Karjakin’s total 1+ pawn blunder count. Whereas, when the games flipped to rapid tie break games, Karjakin’s 1+ pawn blunder count was consistently higher than Carlsen’s count.
Thus, although there might not be a clear trend in the blunders over time (i.e., one indicating fatigue), there is one clear trend: Karjakin appears to make fewer 1+ pawn blunders than Carlsen in classical games, and vice versa for rapid games. In other words,
Karjakin had the better classical game performance, whereas Carlsen had the better rapid performance (as measured by 1+ pawn blunders).
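The classical versus rapid comparison can be sketched as a split of the per-game 1+ pawn blunder counts by game number, given that the last 4 of the 16 games were rapid tie break games. The function name and input layout are assumptions for this example, and the counts below are placeholders, not the championship’s real numbers.

```python
RAPID_GAMES = range(13, 17)  # the last 4 of the 16 games were rapid tie breaks


def totals_by_format(large_blunders):
    """Sum 1+ pawn blunder counts per player, split into classical and rapid.

    large_blunders maps game number -> {player: count}.
    """
    totals = {"classical": {}, "rapid": {}}
    for game, per_player in large_blunders.items():
        fmt = "rapid" if game in RAPID_GAMES else "classical"
        for player, count in per_player.items():
            totals[fmt][player] = totals[fmt].get(player, 0) + count
    return totals


# Placeholder counts for two anonymous players, NOT real championship data.
placeholder = {
    1: {"A": 0, "B": 1},
    12: {"A": 2, "B": 0},
    13: {"A": 1, "B": 3},
}
summary = totals_by_format(placeholder)
```

Feeding the real per-game counts from the aggregation stage into a function like this would give the classical and rapid totals discussed above.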
As a follow up to this analysis of the 2016 championship, it would be natural to look at Carlsen and Karjakin’s blunder counts in other classical/rapid games to confirm this trend. Stay tuned for another follow up post.
All of the code, pipeline specs, input data, and output data for the above analysis can be found in our chess repo. To recreate the analysis, deploy Pachyderm in just a few commands and then check out our pipeline docs to learn how to run the pipelines to analyze your own chess games!