When Grandmasters Blunder

A statistical analysis of chess.

Magnus Carlsen and Viswanathan Anand playing in The World Chess Championship

Sochi, Russia: Magnus Carlsen was 26 moves into game 6 of his title defense against Viswanathan Anand when he experienced the worst feeling in chess: the realization that you’ve left one of your pieces out to dry and there’s nothing left to do but pray. Blunders like this are all too common when I play chess, but they’re incredibly rare at this level. Anand and Carlsen are some of the greatest to play the game, and they (almost) never do things like this. What followed was even more incredible. Despite his blunder, Carlsen went on to win game 6 (and the match) thanks to Anand responding immediately with a blunder of his own. After the game Carlsen described it as “a comical exchange of blunders.”

Blunders at this level are rare, but just how lucky are we to have seen a turn of patzer play from this pair? In this post we’ll take an analytic approach to this question. We’ll start by developing a computational way to classify blunders. Then we’ll gather a year’s worth of chess games and store them in a distributed file system so that we can use a cluster of machines to analyze the games with a MapReduce engine. Full disclosure: I’m one of the founders of Pachyderm, the distributed file system and MapReduce engine that we’re going to be using. However, I’m not a data scientist, so please email me if there are any mistakes: jdoliner@pachyderm.io.

A classification of all the moves played in 2014. Created using Crafty and Pachyderm.

The first thing we need to settle is “what is a blunder?” A human will tell you that a blunder is a move which substantially decreases the player’s chances of winning. Good players can classify a move as a blunder with just a few seconds’ thought, but even that’s too slow for our purposes. Instead we’re going to be using a computer chess player or “chess engine” called Crafty.

Crafty is far from the strongest engine on the market right now, but it has one very appealing feature called “annotation mode.” This mode takes an already-completed game and highlights moves which Crafty believes to be suboptimal. It doesn’t just classify moves as blunders or non-blunders; it also quantifies how bad each blunder is in units of pawns.

Crafty computed that Carlsen’s move “26. Kd2” was 2.11 pawns worse than his best move “26. Rg3”.

For example, applying Crafty to the Carlsen-Anand game shows that each player hurt his position by approximately two pawns with his blunder. This might not seem like a lot, but in high-level chess, a two-pawn deficit is almost always a loss.
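To make the classification rule concrete, here is a minimal sketch of the thresholding step. It is not the code from our image: the function names are made up for illustration, and it assumes both evaluations are expressed in pawns from the point of view of the player who moved. The two-pawn threshold is the one used later in this post.

```python
# Hypothetical helper names; only the two-pawn threshold and the 2.11 figure
# come from the post itself.
BLUNDER_THRESHOLD_PAWNS = 2.0


def pawn_loss(best_eval: float, played_eval: float) -> float:
    """Pawns given up by the played move relative to the engine's best move.

    Assumes both evaluations are from the mover's perspective.
    """
    return max(0.0, best_eval - played_eval)


def is_blunder(loss: float, threshold: float = BLUNDER_THRESHOLD_PAWNS) -> bool:
    """A move counts as a blunder if it loses at least `threshold` pawns."""
    return loss >= threshold


# Carlsen's 26. Kd2 was scored 2.11 pawns worse than 26. Rg3.
print(is_blunder(2.11))
```

Working with the loss rather than the raw evaluation means the same rule applies whether the player was already winning or losing before the move.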

Now that we have a way to classify blunders, we’ll need to bundle Crafty up in a Docker image so we can use it in Pachyderm. The source for our image is available on GitHub, or the image can be pulled directly from the Docker registry. The image contains two HTTP servers: a map server, which takes chess games in .pgn format and returns the ratings of the players and a bucketed count of Crafty’s scores of the moves, and a reduce server, which takes the results from the map server and aggregates them into buckets based on the player’s rating.

Our MapReduce job gives us a mapping from rating to a vector of blunders.
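The shape of the two steps can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the servers in our image: the function names, the half-pawn score buckets and the 100-point rating buckets are all assumptions, and the input is taken to be a player’s rating plus a list of per-move pawn losses that Crafty has already produced.

```python
from collections import Counter, defaultdict

SCORE_BUCKET_PAWNS = 0.5  # assumed width of a score bucket, in pawns
RATING_BUCKET = 100       # assumed width of a rating bucket, in rating points


def map_game(rating: int, losses: list[float]) -> tuple[int, Counter]:
    """Map step: one player's moves in one game -> (rating, bucketed counts)."""
    counts = Counter(int(loss // SCORE_BUCKET_PAWNS) for loss in losses)
    return rating, counts


def reduce_results(mapped: list[tuple[int, Counter]]) -> dict[int, Counter]:
    """Reduce step: merge the per-game counts into per-rating-bucket totals."""
    totals: dict[int, Counter] = defaultdict(Counter)
    for rating, counts in mapped:
        bucket = rating // RATING_BUCKET * RATING_BUCKET
        totals[bucket].update(counts)
    return dict(totals)


# Made-up inputs, purely to show the data flow.
mapped = [map_game(2850, [0.0, 2.11]), map_game(2790, [0.1, 0.6])]
print(reduce_results(mapped))
```

Because the map output for one game never depends on another game, the map step can run on as many machines as are available, and the reduce step only has to add up small counters.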

Next we’ll need to get a Pachyderm cluster up and running and filled with data. We took our data from chessgames.com and wrote a simple script to upload it to Pachyderm’s file system (pfs) and kick off the pipeline. The script and data are available in the repo along with more detailed instructions on how to reproduce the results yourself.

Extend the analysis by forking the repo.

Crunching all the games from 2014 took about six hours on Google Compute Engine. In total, Crafty analyzed 4,899,067 moves and found that a scant 67,175 (1.37%) were two-pawn blunders or worse. Limiting ourselves to players with ratings above 2500 (Grandmasters), that rate falls to 1.07%. If we narrow it down to players above 2775, which both Carlsen and Anand were during the championship, it falls all the way to 0.96%. Assuming Anand and Carlsen’s blunders were independent events, what we saw was roughly a 1 in 10,000 occurrence (0.96% × 0.96% ≈ 0.0092%). In other words, about 1 in every 10,000 pairs of moves exchanged by players at this level should result in a double blunder. Of course, The World Chess Championship consists of more than a single pair of moves. Assuming 12 games of about 50 moves each, we can expect to see 600 move pairs, which means seeing an exchange like this in a WCC event is more like a 1 in 20 event. So what we saw wasn’t actually that incredible, merely unlikely.
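For anyone who wants to check the arithmetic, the calculation can be reproduced in a few lines. The inputs are the figures quoted above; the independence assumption and the 12 games of 50 moves are the same simplifications made in the text.

```python
moves_analyzed = 4_899_067
blunders_found = 67_175
overall_rate = blunders_found / moves_analyzed  # about 1.37%

p_blunder_elite = 0.0096  # blunder rate for players rated above 2775

# Probability that both players blunder on the same pair of moves,
# assuming the two blunders are independent.
p_double = p_blunder_elite ** 2

# Probability of at least one double blunder somewhere in the match.
move_pairs = 12 * 50
p_match = 1 - (1 - p_double) ** move_pairs

print(f"overall blunder rate: {overall_rate:.2%}")
print(f"double blunder per move pair: 1 in {1 / p_double:,.0f}")
print(f"at least one per match: {p_match:.1%} (about 1 in {1 / p_match:.0f})")
```

The exact figures come out at about 1 in 10,850 per move pair and a little over 5% per match, which is where the rounded 1 in 10,000 and 1 in 20 come from.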

Blunders become exponentially less likely as rating increases.

The data reveals a strong correlation between blunders and rating. As we’d expect, highly-rated players blunder much less frequently than their lower-rated counterparts. Playing around with the data in Excel, we found exponential functions to be the best fit. The trendline above indicates that gaining 600 rating points halves the number of blunders a player makes. Chess, it seems, is a game of diminishing returns.
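One common way to fit an exponential trendline is ordinary least squares on the logarithm of the blunder rate, which is sketched below. This is an illustration rather than the spreadsheet we used: the function names are made up, and the data points are synthetic, generated from an assumed curve that halves every 600 points, so the 4% starting rate is not a measured value.

```python
import math


def fit_exponential(ratings: list[float], rates: list[float]) -> tuple[float, float]:
    """Least-squares fit of ln(rate) = a + b * rating; returns (a, b)."""
    n = len(ratings)
    logs = [math.log(r) for r in rates]
    mean_x = sum(ratings) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ratings)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ratings, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def halving_distance(b: float) -> float:
    """Rating points needed to halve the blunder rate, given slope b < 0."""
    return -math.log(2) / b


# SYNTHETIC data from an assumed curve, not the results of this post.
ratings = [1600, 1900, 2200, 2500, 2800]
rates = [0.04 * 0.5 ** ((r - 1600) / 600) for r in ratings]

a, b = fit_exponential(ratings, rates)
print(f"halving distance: {halving_distance(b):.0f} rating points")
```

On real, noisy data the fitted halving distance would only approximate the trend, and the fit weights low blunder rates more heavily than a fit on the raw rates would.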

There are lots of cool stories you could tell with this data. We limited ourselves to games from 2014. I’d be interested to see how blunder occurrence has changed over time. There are also a few obvious ways that our analysis could be improved. Due to cost limitations, we had to limit Crafty to 2 seconds of analysis time per move, and we only looked at a fraction of chessgames.com’s total corpus. We may look into doing an updated version of this post with a bigger budget.

Fork the jobs here: https://github.com/pachyderm/chess

Install pfs here: https://github.com/pachyderm/pfs

Thank you to: Marc Hesse, Josh Katz, Trevor Blackwell, Slava Akhmechet, Daniel Gackle, Dalton Caldwell, Lenny Levin and Yuri Sagalov for reading drafts of this post.