M2M Day 342: Is this even humanly possible?

Max Deutsch
4 min read · Oct 8, 2017


This post is part of Month to Master, a 12-month accelerated learning project. For October, my goal is to defeat world champion Magnus Carlsen at a game of chess.

Yesterday, through some further optimizations, I was able to decrease the run time of my chess evaluation algorithm to 16 hours per evaluation, thus requiring about 40 days for an entire game.

While I’m excited about where this is headed, there’s a problem: Even if this algorithm can be executed in a reasonable time, learning it would be nearly impossible, requiring that I pre-memorize about 1.6 million algorithm parameters (i.e. 1.6 million random-looking numbers) before I even complete my first evaluation.

That’s just too many parameters to try to memorize.

The good news is that it’s probably not necessary to have so many parameters. In fact, the self-driving car algorithm that I built during May, which was based on NVIDIA’s Autopilot model, only required 250,000 parameters.

With this in mind, I suspect that a good chess algorithm only requires about 20,000-25,000 parameters, an order of magnitude less than the self-driving car, given that a chess position is much more structured and much more contained than the data a self-driving car needs to make sense of.

Assuming this is sufficient, I’m completely prepared to pre-memorize 20,000 parameters.

To put this in perspective, in October 2015, Suresh Kumar Sharma memorized and recited 70,030 digits of the number pi.

Thus, memorizing 20,000 random numbers is well within the range of human possibility. (I will be truncating the parameters, which span values from 0 to 1, at two decimal digits, which should maintain the integrity of the function while also capping the number of digits I need to memorize at 40,000.)
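
For the curious, here’s a sketch of what this truncation looks like in code (assuming, purely for illustration, that the trained parameters are sitting in a NumPy array):

```python
import numpy as np

# Stand-in for the trained parameters, which span values from 0 to 1.
weights = np.random.rand(20_000)

# Truncate (not round) each parameter to two digits after the decimal.
truncated = np.floor(weights * 100) / 100

# Two digits per parameter = 40,000 digits to memorize.
print(truncated.size * 2)  # 40000
```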

Reducing the required number of parameters

In my algorithm from yesterday, most of the parameters came from the multiplication of the 773 digits of the bitboard representation with the 2048 rows of the hidden vector h, which requires one parameter for every digit-row pair.
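
As a quick sanity check on that 1.6 million figure, here’s the arithmetic, assuming yesterday’s single 2048-wide hidden layer plus its output weights:

```python
# Rough parameter count for yesterday's architecture.
input_digits, hidden_width = 773, 2048
hidden_params = input_digits * hidden_width  # 1,583,104
output_params = hidden_width                 # 2,048
print(hidden_params + output_params)         # 1,585,152, i.e. about 1.6 million
```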

Thus, in order to reduce the number of necessary parameters, I can either condense the bitboard representation or choose a smaller width for the hidden layer.

At first, my instinct was to condense the bitboard representation down to 37 digits, where the positions in the vector correspond to particular chess pieces and the numbers at each of these spots correspond to particular squares on the board. For example, the first spot in the vector could correspond to the White king, and the value in this spot could span from 1 to 64.
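
Concretely, the encoding might look something like the sketch below (the slot labels and ordering are just illustrative guesses, not a fixed design):

```python
# Hypothetical 37-slot encoding: one slot per piece, holding that piece's
# square (1-64), or 0 if the piece has been captured. The final 5 slots
# would presumably carry the same side-to-move and castling flags as the
# 5 non-square digits of the 773-digit bitboard representation.
PIECES = ["king", "queen", "rook_a", "rook_h", "bishop_c", "bishop_f",
          "knight_b", "knight_g"] + [f"pawn_{i}" for i in range(8)]
SLOTS = [f"white_{p}" for p in PIECES] + [f"black_{p}" for p in PIECES]

def encode_position(piece_squares, flags):
    """piece_squares maps a piece label to its square (1-64);
    flags is a list of 5 binary game-state digits."""
    return [piece_squares.get(piece, 0) for piece in SLOTS] + list(flags)
```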

I think an idea like this is worth experimenting with (in the future), but my instinct is that this particular representation would create too many dependencies/correlations between variables, resulting in a much less accurate evaluation function.

Thus, my best option is to reduce the width of the hidden layer.

I’ve been using 2048 as the width of the hidden layer, which I realized from the beginning is quite large, but I tried to carry it through for as long as possible as a way to force myself to find other large optimizations.

I hypothesize I can create a chess algorithm that is good enough with two hidden layers of width 16, which would require that I memorize 12,640 parameters.

Of course, I need to validate this hypothesis in practice by creating and running the algorithm, but, for now, let’s assume it will work. Even if it works, it may not be time-effective, in which case it’s not worth actually building.

So, let’s validate this first…

The new hypothesized algorithm

Unlike the algorithm from the past two days, my hypothesized algorithm has not one, but two hidden layers, which should provide our model with one extra level of abstraction to work with.

This algorithm takes in the 773 digits of the bitboard representation, converts these 773 digits into 16 digits (tuned by 12,368 parameters), converts these 16 digits into another set of 16 digits (tuned by 256 parameters), and outputs an evaluation (tuned by 16 parameters), for 12,368 + 256 + 16 = 12,640 parameters in total.
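
Here’s a minimal sketch of this network in code, with random stand-in weights (the real values would be the memorized parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.random((16, 773))  # 773 digits -> 16 digits: 12,368 parameters
W2 = rng.random((16, 16))   # 16 digits -> 16 digits:     256 parameters
w3 = rng.random(16)         # 16 digits -> evaluation:     16 parameters
assert W1.size + W2.size + w3.size == 12_640

def evaluate(bitboard):
    """bitboard: length-773 vector of 0s and 1s describing the position.
    The operation counts below don't include an activation step, so this
    sketch is pure multiply-and-sum."""
    h1 = W1 @ bitboard
    h2 = W2 @ h1
    return w3 @ h2
```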

This algorithm would require (36 x 16) + (16 x 16) + (15 x 16) + 16 + 15 = 1,103 math operations, (36 x 16) + (16 x 16) + 16 = 848 memory read operations, and 16 + 16 = 32 memory write operations.

Thus, still assuming a 3-second execution time per math operation and a 1-second execution time per memory operation, one evaluation would require (3 x 1,103) + (848 + 32) = 4,189 seconds = 1.2 hours to execute.
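
For reference, the same timing arithmetic as a quick script:

```python
# One full evaluation: 3 seconds per hand-computed math operation,
# 1 second per memory operation.
math_ops = (36 * 16) + (16 * 16) + (15 * 16) + 16 + 15  # 1,103
mem_ops = (36 * 16) + (16 * 16) + 16 + (16 + 16)        # 848 reads + 32 writes

seconds = 3 * math_ops + 1 * mem_ops
print(seconds, seconds / 3600)  # 4189 seconds, about 1.2 hours
```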

Of course, as explained yesterday, most evaluations would only be updates on previous evaluations. Using this new algorithm, an update evaluation would require (4 x 16) + (16 x 16) + (15 x 16) + (16 + 15) = 591 math operations, 16 + (4 x 16) + 16 + (16 x 16) + 16 = 368 memory read operations, and 16 + 16 = 32 memory write operations.

Therefore, an update evaluation would require (3 x 591) + (368 + 32) = 2,173 seconds = 36 minutes to execute.

So, a full game can be played in 36 minutes x 60 evaluations = 36 hours = 1.5 days.
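
And the same back-of-the-envelope math for updates and a full game:

```python
# One update evaluation, plus the full-game total at 60 evaluations.
math_ops = (4 * 16) + (16 * 16) + (15 * 16) + (16 + 15)    # 591
mem_ops = 16 + (4 * 16) + 16 + (16 * 16) + 16 + (16 + 16)  # 368 reads + 32 writes

seconds = 3 * math_ops + 1 * mem_ops
print(seconds / 60)          # about 36 minutes per update evaluation
print(60 * seconds / 86400)  # about 1.5 days for a 60-evaluation game
```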

This is still a little long for complete use during a standard chess match, but I could imagine a chess grandmaster using one or two evaluations of this type during a game to calculate a particularly challenging position.

This new algorithm has an execution time of 1.5 days per game, which is almost acceptable, and requires memorizing 12,640 parameters, which is very much in the human range.

In other words, with a few more optimizations, this algorithm is actually playable, assuming that the structure has enough built-in connections to properly evaluate chess positions.

If it does, I might have a chance. If not, I’m likely in trouble.

Time to start testing this assumption…

Read the next post. Read the previous post.

Max Deutsch is an obsessive learner, product builder, and guinea pig for Month to Master.

If you want to follow along with Max’s year-long accelerated learning project, make sure to follow this Medium account.
