M2M Day 341: I made a big mistake, but it’s a good big mistake

Max Deutsch
6 min read · Oct 8, 2017


This post is part of Month to Master, a 12-month accelerated learning project. For October, my goal is to defeat world champion Magnus Carlsen at a game of chess.

Last night, I climbed into bed, ready to go to sleep. As soon as I closed my eyes, I realized that I made a major mistake in my calculations from yesterday.

Well, technically I didn’t make a mistake. The chessboard evaluation algorithm I described yesterday does require 3,168,255 math operations and 1,589,248 memory operations. However, I failed to recognize that most of these operations are irrelevant.

Let me explain…

If you haven’t yet read yesterday’s post, it would be very helpful for you to read that first. Read it here.

Addressing my mistake

Yesterday, I introduced bitboard representation, which is a way to completely describe the state of a chessboard via 773 1’s and 0’s. Bitboard representation is the input vector to my evaluation function.

When calculating the number of math operations this evaluation function would require, I overlooked the fact that only a maximum of 37 digits out of the 773 in bitboard representation can be 1’s, and the rest are 0’s.

This is important for two reasons:

  1. The function f¹(x), which converts the input bitboard representation x into the hidden vector h, doesn’t actually require any multiplication operations: either I’m multiplying by 0 (in which case the term can be ignored entirely), or I’m multiplying by 1 (in which case the original term can be used unchanged). In this way, all 1,583,104 multiplication operations that I estimated yesterday are no longer necessary.
  2. f¹(x) only requires addition operations between the 1’s in the bitboard representation, which we’ve now correctly capped at 37. Therefore, instead of the 1,581,056 addition operations estimated yesterday, the algorithm only requires 36 x 2048 = 73,728 addition operations.
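
To make this concrete, here’s a minimal Python sketch of the sparse version of f¹. The weight matrix W, bias b, and the particular active bits are hypothetical stand-ins I’ve made up for illustration; the point is that summing the weight rows of the active bits produces the same hidden vector as the full multiply.

```python
import numpy as np

INPUT_BITS, HIDDEN = 773, 2048   # sizes from the bitboard representation

rng = np.random.default_rng(0)
W = rng.standard_normal((INPUT_BITS, HIDDEN))  # hypothetical f¹ weights
b = rng.standard_normal(HIDDEN)                # hypothetical f¹ bias

def f1_dense(x):
    """Naive f¹: the full matrix multiply that yesterday's count assumed."""
    return x @ W + b

def f1_sparse(active_bits):
    """Sparse f¹: multiplying by 1 just means adding that weight row,
    and multiplying by 0 means skipping it entirely."""
    h = b.copy()
    for i in active_bits:
        h += W[i]
    return h

# At most 37 of the 773 bits are ever set (32 pieces plus the
# side-to-move and castling bits), so pick 37 at random.
active = rng.choice(INPUT_BITS, size=37, replace=False)
x = np.zeros(INPUT_BITS)
x[active] = 1.0

assert np.allclose(f1_dense(x), f1_sparse(active))
```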

f² still requires the same 4095 operations, which means that, in total, the evaluation algorithm requires only 73,728 + 4095 = 77,823 math operations.

This is a 40x improvement from yesterday.

Additionally, the memory operations scale down in the same way: for f¹, I only need (37 x 2048) + 2048 = 77,824 read operations and 2048 write operations, for a total of 79,872 memory operations.

Therefore, in total, the evaluation algorithm described yesterday actually only requires 157,695 total operations.

Computing the new time requirements

If I again assume that I can perform one mathematical operation in 3 seconds and one memory operation in 1 second, I would be able to evaluate one chess position in 313,341 seconds or 3.6 days, which is down considerably from yesterday’s four months.
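
The arithmetic above can be double-checked in a few lines of Python (the op counts mirror the post’s assumptions):

```python
SECONDS_PER_MATH_OP = 3
SECONDS_PER_MEM_OP = 1

f1_adds = 36 * 2048              # additions among the at-most-37 active bits
f2_ops = 4095                    # f² is unchanged from yesterday
math_ops = f1_adds + f2_ops      # → 77823

f1_reads = (37 * 2048) + 2048    # → 77824
f1_writes = 2048
mem_ops = f1_reads + f1_writes   # → 79872

total_ops = math_ops + mem_ops   # → 157695
seconds = math_ops * SECONDS_PER_MATH_OP + mem_ops * SECONDS_PER_MEM_OP
days = seconds / 86400           # seconds → 313341, days ≈ 3.6
```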

Still, if we assume that I would need to execute 500 evaluations per game, I would need 3.6 days x 500 evaluations = 5 years to play one game.

5 years is still too long, but at least now I can complete one game of (computationally beautiful) chess in my lifetime.

But, I think I can do better…

Using “Threshold Search”

Yesterday, I assumed that I would need to evaluate 10–15 options per move in order to find the optimal option. In other words, I would perform 15 evaluations, determine which move led to the greatest evaluation score, and then play that move in the game.

However, what if it’s unnecessary to find the absolute best move? What if it’s only necessary to find a move with an evaluation above a certain threshold?

Then, for a given move, I can stop evaluating once I find the first board position that surpasses this threshold. Theoretically, I could pick this board position on my first try (although I couldn’t do this consistently; otherwise, I’d already be the world’s greatest chess player).

Still, I estimate that I could find a threshold-passing move in 2.5 evaluations on average.

Therefore, if I’m able to implement this thresholding into my evaluation algorithm, I can reduce the number of evaluations from 500 to 100.
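
Here’s a minimal sketch of what this thresholding could look like (the move names, scores, and threshold value are invented for illustration):

```python
def pick_move(candidate_moves, evaluate, threshold):
    """Threshold search: stop at the first move whose evaluation
    clears the threshold, instead of scoring every candidate."""
    evaluations = 0
    best_move, best_score = None, float("-inf")
    for move in candidate_moves:
        score = evaluate(move)
        evaluations += 1
        if score >= threshold:
            return move, evaluations   # good enough; stop early
        if score > best_score:         # remember the best so far,
            best_move, best_score = move, score  # in case nothing clears the bar
    return best_move, evaluations

# Invented example scores, purely illustrative.
scores = {"e4": 0.1, "d4": 0.6, "Nf3": 0.3, "c4": 0.2}
move, n = pick_move(["e4", "d4", "Nf3", "c4"], scores.get, threshold=0.5)
# move == "d4", n == 2: the search stopped after 2 of 4 evaluations
```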

But, I can still do better…

Playing the opening

On Chess.com, I’m able to run my games through a real chess computer that analyzes each of my moves. The computer categorizes the moves into “Excellent” (for when I play the absolute best move), “Good” (for when I play a move above the threshold, but not the best move), and “Inaccuracy”, “Mistake”, and “Blunder” (for when I screw up).

In one of my recent games, 8 out of my 49 moves were below the threshold (I was playing White).

Interestingly, though, going through all of my games, I seem to play moves exclusively in the Excellent and Good categories until about moves 12–15, at which point the game becomes too complex and I get into trouble.

I’ve also found that towards the end of the game, I make very few below-threshold moves.

Based on this data, let’s say that I only need to perform my mental evaluations for 24 out of the 40 moves found in an average game.

Thus, in one game, I’d only require 60 total evaluations, and at 3.6 days per evaluation, I could complete a full game in 7 months.

This is getting closer, but I can still do better…

Finding reductions via “Update Operations”

Right now, every time I perform an evaluation, I start the computation over from scratch, which isn’t actually necessary and is far from the most efficient approach.

In particular, the hidden layer h, the output of f¹ (the most computationally costly step of the algorithm), is nearly identical to the h computed in the previous evaluation.

In fact, if the new evaluation is the first for a particular move, then h only differs by the move the opposing color made, and the move under consideration. (Keep in mind, because I’m using the thresholding approach, the move corresponding to the last evaluation I computed, and thus, the last h I computed, was actually executed in the game, so I have an accurate starting point in my working memory.)

Thus, in this case, each of the 2048 values of h needs to be updated with four operations: one corresponding to the move made by the opposition, one to cancel out where the opposition’s piece was prior to the move, one corresponding to the move under consideration, and one to cancel out where this piece was prior to consideration.

Thus, to update h, only 4 x 2048 = 8192 math operations (additions) are necessary.
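
A small sketch of the update idea, again with made-up weights and placeholder bit indices: instead of re-summing all active rows, add the rows for bits that turned on and subtract the rows for bits that turned off.

```python
import numpy as np

INPUT_BITS, HIDDEN = 773, 2048
rng = np.random.default_rng(1)
W = rng.standard_normal((INPUT_BITS, HIDDEN))  # hypothetical f¹ weights
b = rng.standard_normal(HIDDEN)                # hypothetical f¹ bias

def full_h(active_bits):
    """Compute h from scratch by summing all active weight rows."""
    return b + W[sorted(active_bits)].sum(axis=0)

def update_h(h, bits_on, bits_off):
    """Incrementally update h: one row of additions per flipped bit."""
    for i in bits_on:
        h = h + W[i]   # a piece arrived on this square
    for i in bits_off:
        h = h - W[i]   # cancel out where a piece used to be
    return h

# A quiet move flips 4 bits: the opponent's piece leaves one square and
# lands on another, and so does the piece under consideration.
# (These bit indices are arbitrary placeholders.)
old_bits = {10, 200, 405, 600}
new_bits = {10, 250, 405, 650}   # 200 → 250 and 600 → 650

h_updated = update_h(full_h(old_bits), new_bits - old_bits, old_bits - new_bits)
assert np.allclose(h_updated, full_h(new_bits))
```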

If I prepare a certain opening as White, and Magnus responds with the theoretically determined moves, I can come into the game having pre-memorized the first h vector I’ll need, thus only ever having to perform updates.

Of course, if the game veers off script, I’ll need to execute a full evaluation for the first move, but never after that.

Thus, in total, since 4095 operations are still required to convert h to the output y, 8192 + 4095 = 12,287 math operations are needed for an update evaluation.

The memory capacity required also scales down with this update approach, so that only 8192 x 2 + 4096 = 20,480 memory operations are needed per evaluation.

If the evaluation is the second, third, etc. for a particular move, then h only differs by the move previously under consideration and the move currently under consideration. Thus, again, each of the 2048 values of h needs to be updated with four operations.

So, this kind of update also requires 12,287 math operations and 20,480 memory operations.

Again, assuming that I can perform one mathematical operation in 3 seconds and one memory operation in 1 second, this updating approach lets me evaluate one chess position in 57,341 seconds, or about 16 hours.
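
These per-evaluation numbers check out in a couple of lines (the op counts mirror the ones derived above):

```python
update_math_ops = 4 * 2048 + 4095   # 8192 update additions plus f²'s 4095 ops
update_mem_ops = 8192 * 2 + 4096    # update reads/writes plus f²'s memory traffic
seconds = update_math_ops * 3 + update_mem_ops * 1
hours = seconds / 3600              # seconds → 57341, hours ≈ 15.9 (about 16)
```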

This means that I could complete one full game of chess in 16 hours x 60 evaluations = 40 days.

Clearly, more optimizations are still required, but things are looking better.

In fact, we are starting to get into the range where I could do a David Blaine-style stunt, where I spend 40 days in a jail cell computing chess moves for a one-game extended exhibition.

I think I’d prefer to continue optimizing my algorithm instead, but I likely have my price if a very generous sponsor came along. After all, it would be a pretty fascinating human experiment.

Still, tomorrow, I’ll continue to look for ways to improve my computation time.

Hopefully, in the next two days, I can finish all the theoretical planning, have a clear plan in mind, and begin the computer work necessary to generate the actual algorithm.

Read the next post. Read the previous post.

Max Deutsch is an obsessive learner, product builder, and guinea pig for Month to Master.

If you want to follow along with Max’s year-long accelerated learning project, make sure to follow this Medium account.
