Percentage of times piece A captured piece B (only captures that happened in at least 5% of games)

How To Analyse Chess Games Using Graph Networks

Using Neo4j To Uncover Clash Patterns Between Pieces

Daniel Sharp
Applied Data Science
Apr 26, 2020


I’ve recently got back into chess thanks to the coronavirus lockdown, and I’ve been playing mostly on Lichess. To better understand the dynamics of the game, I thought it would be interesting to explore some data. It turns out that Lichess publishes all of its games on its website: at the time of writing, there were 1,120,599,152 of them!

I decided to analyse how captures happen during a game. To do this, I downloaded a sample of the Lichess data (from 2013, because that file was 16 MB compared with 10 GB for the most recent one) and calculated, over 20,000 games, the percentage of games in which each piece captured each other piece. A nice way to visualise this data is the following:

Left: percentage of captures by white pieces on black pieces. Right: percentage of captures by black pieces on white pieces. Data Source: Lichess Database

The overall pattern is the same for both sides, with the most common capture being between the two Queens. The Queen is also the top captor across the board, thanks to its high mobility. The second most common capture is the white e2 pawn taking the black d7 pawn, which is also expected, since it arises in common openings such as the Scandinavian Defence (1.e4 d5 2.exd5). In short, capture patterns look highly symmetrical between White and Black.
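The per-game capture statistics behind the heat maps can be sketched in a few lines of Python. This is not the author’s code (that lives on his GitHub); it is a minimal sketch assuming each game has already been parsed into a list of moves in UCI notation (e.g. "e2e4", promotions as "e7e8q", castling as "e1g1"), for instance with a PGN parser. Each piece is identified by its starting square, matching labels like “the e2 pawn”, and each captor/victim pair is counted at most once per game so the result is a percentage of games. The function names are hypothetical.

```python
from collections import Counter

FILES = "abcdefgh"


def captures_in_game(uci_moves):
    """Return the set of (captor, victim) pairs seen in one game.

    Pieces are identified by their starting square (e.g. "e2" is White's
    king pawn) and keep that ID as they move. Assumes legal UCI moves.
    """
    # Square -> ID of the piece currently standing on it.
    board = {f + r: f + r for f in FILES for r in "1278"}
    promoted = set()  # pawn IDs that have promoted (no longer move like pawns)
    captures = set()

    for move in uci_moves:
        src, dst = move[:2], move[2:4]
        piece = board.pop(src)

        # Castling is written as the king moving two files ("e1g1", "e8c8"):
        # move the matching rook as well.
        if piece in ("e1", "e8") and abs(FILES.index(src[0]) - FILES.index(dst[0])) == 2:
            rank = src[1]
            if dst[0] == "g":
                rook_src, rook_dst = "h" + rank, "f" + rank
            else:
                rook_src, rook_dst = "a" + rank, "d" + rank
            board[rook_dst] = board.pop(rook_src)

        is_pawn = piece[1] in "27" and piece not in promoted
        if is_pawn and src[0] != dst[0] and dst not in board:
            # En passant: the captured pawn stands beside the source square.
            captures.add((piece, board.pop(dst[0] + src[1])))
        elif dst in board:
            captures.add((piece, board[dst]))

        board[dst] = piece
        if len(move) == 5:  # promotion suffix such as "q"
            promoted.add(piece)

    return captures


def capture_percentages(games):
    """Percentage of games in which each captor captured each victim."""
    counts = Counter()
    for moves in games:
        counts.update(captures_in_game(moves))  # each pair at most once per game
    return {pair: 100 * n / len(games) for pair, n in counts.items()}


games = [
    ["e2e4", "d7d5", "e4d5", "d8d5"],          # 1.e4 d5 2.exd5 Qxd5
    ["e2e4", "a7a6", "e4e5", "d7d5", "e5d6"],  # 1.e4 a6 2.e5 d5 3.exd6 e.p.
]
pct = capture_percentages(games)
print(pct[("e2", "d7")])  # 100.0
print(pct[("d8", "e2")])  # 50.0
```

Tracking pieces by starting square rather than by type is what lets the heat map distinguish, say, the e2 pawn from the d2 pawn.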

To add another dimension to this analysis, I decided to use Neo4j to map these relationships (the captures) between the pieces and run some of its graph algorithms to explore the data further.

A piece’s importance

Here, I define a piece’s importance as capturing a wide range of pieces in a high percentage of games. From the heat map above we could fairly quickly conclude that the Queen is the most influential piece in the game. However, it would be difficult to quantify its importance or to compare it with other pieces. Degree centrality, which in basic terms is a count of the number of relationships a node has, turned out to be useful for this. Since each relationship here is weighted by the percentage of games in which piece A captured piece B, I used weighted degree centrality, which sums the weights of a node’s relationships instead of counting them.
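Weighted degree centrality is simple enough to sketch in plain Python. This illustrates the idea rather than reproducing Neo4j’s implementation; it assumes the score counts outgoing edges (captures made), which matches the “piece that captures” framing, and the edge weights in the example are made up, not taken from the article’s data.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Weighted (outgoing) degree centrality.

    edges: iterable of (captor, victim, weight) tuples, where weight is the
    percentage of games in which captor took victim.
    Returns {node: sum of the weights of its outgoing edges}.
    """
    scores = defaultdict(float)
    for captor, victim, weight in edges:
        scores[captor] += weight
        scores[victim] += 0.0  # pieces that are only ever captured still appear, with 0
    return dict(scores)


# Hypothetical weights, for illustration only.
edges = [("d1", "d8", 20.0), ("d1", "c8", 6.0), ("a1", "a8", 10.0), ("d8", "d1", 20.0)]
ranking = sorted(weighted_degree(edges).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
# [('d1', 26.0), ('d8', 20.0), ('a1', 10.0), ('c8', 0.0), ('a8', 0.0)]
```

An unweighted degree would score d1 as 2 here regardless of how often those captures happen; summing the weights is what rewards captures that occur in many games.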

Weighted degree centrality. Node size is scaled by the centrality score; only edges for captures that happen in at least 5% of games are shown. Data Source: Lichess Database

No surprises here: both Queens top the importance table, which is also clear from the graph visualisation, as they have arrows pointing to almost every other piece. At the other end of the chart are the Pawns on the h, a, g and b files. This again makes sense, since these pieces sit at the edges of the board and so don’t see much action.

If you’ve played chess before, you probably know that each piece type has an unofficial point value to help with decision-making during games. Under this convention a Rook is worth 5 points, while Bishops and Knights are worth 3 points each. It seems counterintuitive, then, that both Knights and Bishops rank higher than the Rooks. Well, Rooks capture other Rooks fairly often but, apart from that, they don’t seem to capture many other pieces. Their higher value is most likely due to their usefulness in setting up traps and delivering checkmate.

You have probably noticed that the King doesn’t show up in the network. This is because he can never be captured and is rarely used as an attacking piece. The point of the game is to keep him safe while you attack your opponent’s King.

Finding Communities

Looking at the graph and the heat map, there are clearly several groups of pieces that capture each other fairly regularly. The Rooks are one example: their arrows point almost exclusively at one another. I used Neo4j’s implementation of the Louvain algorithm to identify these communities. Louvain optimises modularity, which measures how much denser the links within a group of nodes are than they would be in a random network with the same node degrees.
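Louvain repeatedly moves nodes between neighbouring communities, and then merges communities into single nodes, whenever doing so increases modularity. The quantity it optimises can be computed directly; below is a minimal sketch of Newman’s modularity for an undirected weighted graph. It is not Neo4j’s code, and treating the capture graph as undirected (for example by summing the weights of A→B and B→A) is an assumption made here for simplicity. The toy weights are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: list of (u, v, weight), each undirected edge listed once, no self-loops.
    community: {node: community id}.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))**2 ], where m is the
    total edge weight, L_c the weight of edges inside c and d_c the summed
    weighted degree of the nodes in c.
    """
    m = sum(w for _, _, w in edges)
    inside = defaultdict(float)  # L_c
    degree = defaultdict(float)  # d_c
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two pairs of rooks that mostly capture each other, plus one weak cross edge.
edges = [("a1", "a8", 10.0), ("h1", "h8", 10.0), ("a1", "h8", 1.0)]
split = {"a1": 0, "a8": 0, "h1": 1, "h8": 1}
merged = {node: 0 for node in split}
print(modularity(edges, split))   # about 0.452
print(modularity(edges, merged))  # 0.0
```

Putting every node in one community always gives Q = 0, so a positive score means the grouping captures more internal links than chance would predict, which is exactly the pattern the Rook pairs show.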

Nodes are coloured by the community they were assigned to; only edges for captures in at least 4% of games are shown. Data Source: Lichess Database

The communities found are the following:

The Rooks — Community 0 (light blue) and 14 (dark purple)

These consist of the opposing Rooks that start on the same file (for example, a1 and a8), which, as we’ve said, is expected, since they capture each other in around 10% of games.

The Queens — Community 18 (red)

Then we have the Queens, which are exchanged in around 20% of games. They also have edges to many other nodes; however, the percentage of games in which those captures happen isn’t very significant.

Right side Knights — Community 30 (orange)

The fourth group contains the Knights on the right-hand side of the board. Surprisingly, there isn’t an equivalent community for the other pair of Knights. That pair has instead been grouped with the centre Pawns, which we will look at in their own section.

Queenside Pawns — Community 10

This group is interesting, since it contains only the queenside pawns of both players. A quick Google search took me to this article analysing the evolution of chess, which includes the following plot.

Credit: Randy Olson.

This helps explain why the queenside pawns were placed in the same group. Since most players castle kingside, the kingside pawns don’t move much during the game, as they’re busy protecting the king. Pawns on the other side of the board, however, are free to advance, which translates into more clashes between them.

King plus Bishops — Community 3 (yellow) and 27 (green)

The pieces. Made with Lichess board editor

These two groups contain the same pieces on opposite sides of the board: the king and the bishop beside it, together with the opponent’s bishop that moves on the same colour squares. This is expected, since bishops on different colour squares can never be in the same community: it’s impossible for them to capture each other.
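The colour argument is easy to check in code: a square’s colour depends only on the parity of its file and rank indices (a1 is dark), and a bishop never leaves the colour it starts on. The helper names below are hypothetical, and the sketch ignores bishops obtained by promotion.

```python
FILES = "abcdefgh"


def square_colour(square):
    """Colour of a square in algebraic notation, e.g. "c1" -> "dark".

    a1 is dark and colours alternate along every file and rank, so the
    colour depends only on the parity of (file index + rank index).
    """
    file_idx = FILES.index(square[0])
    rank_idx = int(square[1]) - 1
    return "dark" if (file_idx + rank_idx) % 2 == 0 else "light"


def bishops_can_meet(start_a, start_b):
    """Bishops keep their square colour, so only same-colour bishops can capture each other."""
    return square_colour(start_a) == square_colour(start_b)


for white, black in [("c1", "f8"), ("c1", "c8"), ("f1", "c8")]:
    print(white, black, bishops_can_meet(white, black))
# c1 f8 True
# c1 c8 False
# f1 c8 True
```

So White’s f1 bishop can only ever trade with Black’s c8 bishop, and White’s c1 bishop only with Black’s f8 bishop, which is why each bishop community pairs up across the board the way it does.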

Left-hand Knights and Pawns – Community 21

The pieces. Made with Lichess board editor

My guess is that this community is a consequence of common openings. One of the most popular first moves for White is to push the e2 pawn to e4, where it can then be protected by the knight moving to c3. Black follows the same steps with the knight and the d7 pawn.

The moves described above

Next steps

Although I probably won’t be revolutionising chess with this analysis anytime soon, it’s been interesting to take this new approach to the game and use graph networks to visualise and uncover some of its key aspects. As always, I’ve uploaded the code to my GitHub. An interesting extension to this analysis would be to explore whether these trends vary with the players’ Elo ratings. Maybe more experienced players stray from the ‘traditional’ openings, in which case this analysis could produce completely different results.

Applied Data Science Partners is a London-based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website. Follow us on LinkedIn for more AI and data science stories!
