M2M Day 356: Fully dissecting machine learning code, line by line

Max Deutsch
7 min read · Oct 23, 2017


This post is part of Month to Master, a 12-month accelerated learning project. For October, my goal is to defeat world champion Magnus Carlsen at a game of chess.

Two days ago, I found some code on GitHub that I should be able to modify to create my chess algorithm. Today, I will dissect the code line by line to ensure I fully understand it, preparing me to create the best plan for moving forward.

Again, the code that I found is designed to take in an input image (of size 28 x 28 pixels, as shown below) and output a prediction of the numerical digit handwritten in the image.

Here’s the code in its entirety:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import sys
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

FLAGS = None

def main(_):
  # Import data
  mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

  # Create the model
  x = tf.placeholder(tf.float32, [None, 784])
  W = tf.Variable(tf.zeros([784, 10]))
  b = tf.Variable(tf.zeros([10]))
  y = tf.matmul(x, W) + b

  # Define loss and optimizer
  y_ = tf.placeholder(tf.float32, [None, 10])

  # The raw formulation of cross-entropy
  cross_entropy = tf.reduce_mean(
      tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
  train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

  sess = tf.InteractiveSession()
  tf.global_variables_initializer().run()

  # Train
  for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

  # Test trained model
  correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
  accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
  print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                      y_: mnist.test.labels}))

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--data_dir', type=str, default='/tmp/tensorflow/mnist/input_data',
                      help='Directory for storing input data')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

Part 1: Importing the necessary libraries and helper functions

The first seven lines of code import the necessary libraries and helper functions required to build the machine learning model.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import sys
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

Most importantly, line 6 (“from tensorflow.examples.tutorials.mnist import input_data”) imports the helper used to load the dataset, and line 7 (“import tensorflow as tf”) imports the TensorFlow machine learning framework.

I can keep these seven lines as is, except for line 6, which will depend on the format of my chess data.

Part 2: Reading the dataset

The next four lines are used to read the dataset. In other words, these lines convert the data into a format that can be used within the rest of the program.

FLAGS = None

def main(_):
  # Import data
  mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

This is the part of the code that is most intimidating to me, but I have a few ideas.

Firstly, based on the documentation that accompanies the GitHub code, it seems that the data is prepared in a fairly straightforward way, contained within two matrices:

  1. The first matrix has dimensions of 55,000 by 784, where 55,000 represents the number of images in the dataset and 784 represents the number of pixels in each image.
  2. The second matrix has dimensions of 55,000 by 10, where 10 represents the possible labels (i.e. digit names) for each image, and again 55,000 represents the number of images.

I should be able to prepare two similar matrices for my chess data, even if I don’t do it in a particularly fancy way. In fact, I might construct these matrices in a Python format and then just copy and paste them into the same file as the rest of the code, so I don’t have to worry about actually reading the data into the program, etc.

This sounds a little hacky, but should do the trick.
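
To make this concrete, here’s a rough sketch of what that hard-coded data might look like. The variable names and values are placeholders I’m making up, not anything from the code I found:

import numpy as np

# Placeholder data, hard-coded in the same file (the hacky plan described above).
# Each row of train_positions is one chessboard encoded as 773 bits;
# each row of train_labels is the matching label (2 possible labels, described below).
train_positions = np.zeros((4, 773), dtype=np.float32)  # 4 example positions, all zeros for now
train_labels = np.zeros((4, 2), dtype=np.float32)       # 4 example labels

print(train_positions.shape)  # (4, 773)
print(train_labels.shape)     # (4, 2)

The point is just that the data ends up as two nested arrays with one row per position, mirroring the two MNIST matrices.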

In fact, to confirm the shape of the data, I modified the program, asking it simply to print out the array representing the first image:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import sys
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

FLAGS = None

def main(_):
  # Import data
  mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

  print(mnist.test.images[0])

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--data_dir', type=str, default='/tmp/tensorflow/mnist/input_data',
                      help='Directory for storing input data')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

And here’s what it printed:

It looks like each of these image arrays is then nested inside of another, larger array. In particular, here’s the output if I ask the program to print all of the image arrays:

In other words, I need to prepare my data as a set of 773-element arrays (one array for each chessboard configuration) nested inside of a larger array.

For the labels, in my case “good move” and “bad move”, I need to nest 2-element arrays (one array for each chessboard label) inside of a larger array.

In this case, [1, 0] = good move and [0, 1] = bad move. This kind of structure matches the “one_hot=True” setting in the original program.

The original program likely spreads the labels out this way, rather than encoding each label as a single number, to indicate that the labels aren’t correlated with each other.

In the chess case, the goodness or badness of a move is technically correlated, but I’ll stick with the one-hot structure for now.
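
If my raw labels start out as a simple list of good/bad judgments, turning them into this one-hot format only takes a couple of lines. This is just my own sketch, not part of the original program:

# Turn a plain list of move judgments into one-hot label arrays:
# 'good' -> [1, 0], 'bad' -> [0, 1]
raw_labels = ['good', 'bad', 'bad', 'good']  # made-up example judgments
one_hot_labels = [[1, 0] if label == 'good' else [0, 1] for label in raw_labels]
print(one_hot_labels)  # [[1, 0], [0, 1], [0, 1], [1, 0]]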

Part 3: Create the model

The next five lines of code are used to define the shape of the model.

# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b

In this case, there are no hidden layers. The function is simply mapping inputs directly to outputs, with no intermediate steps.

In the above example, 784 represents the size of an image and 10 represents the number of possible labels.

In the same way, for my chess program, 773 represents the size of a chessboard representation, and 2 represents the number of possible labels.

So, I can update the code, for my purposes, in the following way:

# Create the model
x = tf.placeholder(tf.float32, [None, 773])
W = tf.Variable(tf.zeros([773, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.matmul(x, W) + b

Of course, I’m skeptical that a model this simplistic will play chess at a sufficiently high level. So, I can modify the code to support my more sophisticated model, which has two hidden layers: the 773 bits of the chessboard are first mapped to 16 intermediate values, those 16 values are mapped to another 16 intermediate values, and those are finally mapped to the output array.

# Create the model
x = tf.placeholder(tf.float32, [None, 773])
W1 = tf.Variable(tf.zeros([773, 16]))
b1 = tf.Variable(tf.zeros([16]))
h1 = tf.matmul(x, W1) + b1
W2 = tf.Variable(tf.zeros([16, 16]))
b2 = tf.Variable(tf.zeros([16]))
h2 = tf.matmul(h1, W2) + b2
W3 = tf.Variable(tf.zeros([16, 2]))
b3 = tf.Variable(tf.zeros([2]))
y = tf.matmul(h2, W3) + b3
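
One thing I’ll probably have to revisit here: with every weight initialized to zero and no activation function between the layers, the three matrix multiplications compose into a single linear map (no more expressive than the simple model above), and starting all the weights at the same value means the hidden units can’t differentiate from each other during training. If that turns out to be a problem, a version with small random initial weights and a ReLU nonlinearity between layers might look something like this (my own adjustment, not from the code I found):

# Same two-hidden-layer shape, but with random initialization and ReLU activations
x = tf.placeholder(tf.float32, [None, 773])
W1 = tf.Variable(tf.truncated_normal([773, 16], stddev=0.1))
b1 = tf.Variable(tf.zeros([16]))
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)
W2 = tf.Variable(tf.truncated_normal([16, 16], stddev=0.1))
b2 = tf.Variable(tf.zeros([16]))
h2 = tf.nn.relu(tf.matmul(h1, W2) + b2)
W3 = tf.Variable(tf.truncated_normal([16, 2], stddev=0.1))
b3 = tf.Variable(tf.zeros([2]))
y = tf.matmul(h2, W3) + b3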

Part 4: Training the model

Now that we have the framework for our model set up, we need to actually train it.

In other words, we need to tell the program how to recognize if a model is good or bad. A good model does a good job approximating the function that correctly maps chess positions to evaluations, while a bad model does not.

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])
# The raw formulation of cross-entropy,
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

For our purposes, we define a function called the Cross Entropy, which measures how far the model’s predictions are from the true values; we use it to gauge the quality of the model during training. For as long as the model is still bad, we use a mathematical technique called Gradient Descent to reduce the Cross Entropy until it falls below an acceptably small amount.
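
To make that concrete with my two labels: for a single position whose true label is [1, 0] (“good move”) and whose raw model outputs (logits) are, say, [2.0, 0.5], the Cross Entropy works out like this. This is a hand-rolled illustration of what softmax_cross_entropy_with_logits computes, with made-up numbers, not code from the program:

import math

logits = [2.0, 0.5]   # made-up raw outputs of the model for one position
label = [1, 0]        # true label: good move

# Softmax turns the raw outputs into probabilities that sum to 1
exps = [math.exp(v) for v in logits]
probs = [v / sum(exps) for v in exps]   # roughly [0.82, 0.18]

# Cross entropy penalizes low probability on the true label
cross_entropy = -sum(t * math.log(p) for t, p in zip(label, probs))
print(cross_entropy)  # about 0.20; more confidence in the right answer means lower loss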

For implementation purposes, it’s not important to understand the math underlying either Cross Entropy or Gradient Descent. For my purposes, this is both good and bad: it’s bad because I’m much stronger on the theoretical, mathematical side of machine learning than on the implementation side. It’s good because I’m forced to improve my abilities on the implementation side.

Part 5: Test the trained model

Once the model is trained, we want to test how well the model actually performs by comparing what the model predicts against the true labels in the dataset.

# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))

When the model makes a prediction, it outputs a 10-element array of scores, one score per possible digit. This array is fed into the function tf.argmax(y, 1), which returns the position of the largest score in the array, i.e. the label the model considers most likely.
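
In my two-label case the logic is the same, just with 2-element arrays. Here’s a quick illustration with made-up numbers (not part of the program):

import numpy as np

predictions = np.array([[0.9, 0.1],    # model leans "good move"
                        [0.3, 0.7]])   # model leans "bad move"
true_labels = np.array([[1, 0],        # actually a good move
                        [0, 1]])       # actually a bad move

# argmax picks the position of the largest value in each row
correct = np.argmax(predictions, axis=1) == np.argmax(true_labels, axis=1)
print(correct.mean())  # 1.0 -> both predictions match the true labels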

Part 6. Run the program

Finally, there’s some code that’s needed to make the program actually run:

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--data_dir', type=str, default='/tmp/tensorflow/mnist/input_data',
                      help='Directory for storing input data')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

Now that I’ve digested the entire program and convinced myself that I understand what’s going on, I’ll start playing around with it tomorrow and see if I can output any initial results.

Read the next post. Read the previous post.

Max Deutsch is an obsessive learner, product builder, and guinea pig for Month to Master.

If you want to follow along with Max’s year-long accelerated learning project, make sure to follow this Medium account.
