Implementing VGG13 for MNIST dataset in TensorFlow

Amir Hossein
Feb 23, 2018
  • This is a new implementation of the VGG13 convolutional neural network using the newer TensorFlow APIs (TensorFlow > v1.4).

In this project we are going to:

  1. Prepare the environment (e.g., importing the required libraries)
  2. Load a dataset to train on (MNIST dataset)
  3. Implement a fully functioning ConvNet using TensorFlow
  4. Train the implemented model on the dataset
  5. Analyze the results

1. Preparing the environment

We import NumPy and TensorFlow, along with matplotlib for plotting.

2. Load the MNIST dataset:

The MNIST dataset contains handwritten digits in grayscale. Some sample images are shown below:

MNIST dataset

This dataset ships with TensorFlow. Using the TF APIs, we can easily load the train and eval/test MNIST data:

To check if the dataset has been loaded properly, you can plot a random index from the train set using the code snippet below:

This shows the image below, which is a 0:

Now, let’s see some statistics about the dataset we just loaded:

number of training examples = 55000
number of evaluation examples = 10000
X_train shape: (55000, 784)
Y_train shape: (55000,)
X_test shape: (10000, 784)
Y_test shape: (10000,)

To summarize, we have 55K training images, each of size 28 x 28. Each image is flattened when the dataset is loaded, meaning that each row of the training data holds a single image as a long vector of size 28*28 = 784. Later in the implementation we will see that the input to our network is defined as [28, 28]: when defining the input tensor, each flattened image is reshaped back to [28, 28] so that the convolution operations can be carried out. The same applies to our eval/test dataset, which contains 10K images.
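The relationship between the flattened vectors and the 28 x 28 images can be sketched in NumPy. This is only an illustration: the array below is a small zero-filled stand-in rather than the real X_train, and the trailing channel dimension of 1 is an assumption about how grayscale images are usually fed to a 2D convolution.

```python
import numpy as np

# Stand-in for a few rows of X_train: 4 flattened images of length 784.
# (Hypothetical data, used only to demonstrate the shapes.)
x_flat = np.zeros((4, 784), dtype=np.float32)

# Recover the 28 x 28 layout of every image; -1 lets NumPy infer the batch size.
x_images = x_flat.reshape(-1, 28, 28)

# Assumption: a convolution layer expects an explicit channel axis,
# which has size 1 for grayscale images.
x_input = x_images.reshape(-1, 28, 28, 1)

print(x_flat.shape)    # (4, 784)
print(x_images.shape)  # (4, 28, 28)
print(x_input.shape)   # (4, 28, 28, 1)
```

Using -1 for the first dimension keeps the same code working for any batch size.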

3. Implement the full VGG13 CNN:

To keep this tutorial simple, we implement VGG13. Following the same logic, you can easily implement VGG16 and VGG19. The original VGG paper describes the VGG13 architecture, along with the others, in a table:

VGG13 is model B in the above table. To give a better picture of the network, I have drawn it as a block diagram:

As you can see, the model follows the sequence of operations below:

2 x CONV2D(64) -> RELU -> MAXPOOL -> 2 x CONV2D(128) -> RELU -> MAXPOOL -> 2 x CONV2D(256) -> RELU -> MAXPOOL -> 2 x CONV2D(512) -> RELU -> MAXPOOL -> 2 x CONV2D(512) -> RELU -> MAXPOOL -> FLATTEN -> FC1(4096) -> FC2(4096) -> FC3(1000) -> output

Notes:

  • Each CONV2D is a 3x3 filter with stride = 1 and same padding.
  • Each max pooling layer is a 2x2 kernel with stride = 2 and same padding.
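To see what these settings mean for a 28 x 28 MNIST image, here is a small sketch that tracks the spatial size through the five blocks. It assumes the usual rule for same padding, where the output size is the input size divided by the stride, rounded up. The helper name `same_padding_output` is hypothetical.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size of a layer that uses 'same' padding."""
    return math.ceil(size / stride)

# Filters per block, taken from the sequence above (two conv layers per block).
block_filters = [64, 128, 256, 512, 512]

size = 28  # MNIST images are 28 x 28
shapes = []
for filters in block_filters:
    # Two 3x3 convolutions with stride 1 and same padding keep the spatial size.
    size = same_padding_output(size, stride=1)
    size = same_padding_output(size, stride=1)
    # 2x2 max pooling with stride 2 and same padding halves it, rounding up.
    size = same_padding_output(size, stride=2)
    shapes.append((size, size, filters))

for i, shape in enumerate(shapes, start=1):
    print(f"after block {i}: {shape}")

# Length of the vector that reaches FC1 after flattening.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print("flattened length:", flattened)
```

Under these assumptions the spatial size goes 28, 14, 7, 4, 2, 1, so the tensor entering the fully connected layers holds 512 values per image.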

We implement the VGG13 model in a function called cnn_model_fn(features, labels, mode) and later pass it to our estimator object for training.

In TF, there are built-in functions that carry out the convolution, max pooling, and FC steps:

  • tf.layers.conv2d(input, num_filters, filter_size, padding='same', activation=tf.nn.relu): given an input and a group of filters, this function convolves the filters over the input.
  • tf.layers.max_pooling2d(input, pool_size, strides, padding='same'): given an input, this function slides a window of size pool_size over it with the given strides and takes the maximum of each window.
  • tf.contrib.layers.flatten(P): given an input P, this function flattens each example into a 1D vector while maintaining the batch size. It returns a flattened tensor with shape [batch_size, k].
  • tf.layers.dense(input, units, activation=tf.nn.relu): given the flattened input, it returns the output computed by a fully connected layer. The `units` argument determines the number of neurons in the FC layer.
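To make the pooling and flattening steps concrete, here is a rough NumPy sketch of what they compute. It is not how TensorFlow implements them: `max_pool_2x2_same` is a hypothetical helper that handles a single-channel [H, W] array, and it assumes that same padding adds any extra row or column at the bottom/right and that padded cells never win the max.

```python
import numpy as np

def max_pool_2x2_same(x):
    """2x2 max pooling with stride 2 and 'same' padding on a [H, W] array."""
    h, w = x.shape
    # Pad odd dimensions on the bottom/right with -inf so padding never wins.
    padded = np.pad(x, ((0, h % 2), (0, w % 2)), constant_values=-np.inf)
    ph, pw = padded.shape
    # Group the array into non-overlapping 2x2 windows and take each maximum.
    windows = padded.reshape(ph // 2, 2, pw // 2, 2)
    return windows.max(axis=(1, 3))

x = np.arange(9, dtype=np.float32).reshape(3, 3)
pooled = max_pool_2x2_same(x)
print(pooled)
# [[4. 5.]
#  [7. 8.]]

# Flattening keeps the batch dimension and merges everything else.
batch = np.stack([pooled, pooled])        # shape [2, 2, 2]
flat = batch.reshape(batch.shape[0], -1)  # shape [2, 4]
print(flat.shape)
```

The 3 x 3 input becomes 2 x 2 after pooling, which is the same rounding-up behaviour that turns a 7 x 7 feature map into 4 x 4.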

Now let’s get our hands dirty:

Note: the output of each layer is the input for the next layer!

4. Training:

First, we create an estimator object and name it mnist_classifier. We pass our model function cnn_model_fn to this estimator and define a path for saving training checkpoints, in case something happens and you want to resume the training.

Next, we define training parameters and hyperparameters for our network including inputs, labels, batch size, and number of epochs. We also shuffle the data before training.
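Batch size and number of epochs together determine how many training steps are run. The sketch below shows the arithmetic. The batch size and epoch count used here are hypothetical values chosen for illustration, since this article does not state them; they are one combination that yields 55000 steps, the step count at which the final checkpoint is saved in the training log.

```python
import math

num_examples = 55000  # size of the training set reported earlier

# Hypothetical hyperparameters, not taken from the article.
batch_size = 100
num_epochs = 100

# One step processes one batch; one epoch is one pass over the training set.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch)  # 550
print(total_steps)      # 55000
```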

Then we train the estimator using the input function and settings we just defined.

For testing, we define an evaluation input function and its properties, and use our mnist_classifier estimator object in evaluation mode.

Here is the result of training:

INFO:tensorflow:loss = 2.3024457, step = 1
INFO:tensorflow:global_step/sec: 33.3624
INFO:tensorflow:loss = 2.3024359, step = 101 (2.999 sec)
INFO:tensorflow:global_step/sec: 34.6312
INFO:tensorflow:loss = 2.3019745, step = 201 (2.887 sec)
INFO:tensorflow:global_step/sec: 34.5752
INFO:tensorflow:loss = 2.3018668, step = 301 (2.893 sec)
.
.
.
.
INFO:tensorflow:loss = 0.024871288, step = 54901 (2.920 sec)
INFO:tensorflow:Saving checkpoints for 55000 into /tmp/mnist_vgg13_model/model.ckpt.
INFO:tensorflow:Loss for final step: 0.007031409.
INFO:tensorflow:Starting evaluation at 2018-02-22-01:01:23
INFO:tensorflow:Restoring parameters from /tmp/mnist_vgg13_model/model.ckpt-55000
INFO:tensorflow:Finished evaluation at 2018-02-22-01:01:24
INFO:tensorflow:Saving dict for global step 55000: accuracy = 0.9842, global_step = 55000, loss = 0.055481646
{'accuracy': 0.9842, 'loss': 0.055481646, 'global_step': 55000}

You can see that we have successfully trained a VGG13 on the MNIST dataset and achieved an accuracy of 98.42%. You can refer to the Jupyter notebook in this repository for a more in-depth walkthrough of the code.
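Two quick sanity checks on the numbers in the log can be done by hand or with a few lines of Python. The first assumes the loss is a softmax cross-entropy over the 10 digit classes (an assumption, since the loss function is not named in the text above); in that case a model that predicts all classes with equal probability has a loss of ln(10), which is very close to the first logged value. The second converts the reported accuracy into a count of misclassified evaluation images.

```python
import math

# Assumption: softmax cross-entropy loss over 10 classes (digits 0-9).
num_classes = 10
first_logged_loss = 2.3024457  # loss at step 1 in the training log

# Cross-entropy of a uniform prediction: -ln(1 / num_classes).
uniform_loss = -math.log(1.0 / num_classes)
print(round(uniform_loss, 4))  # 2.3026

# Turn the reported accuracy into counts over the evaluation set.
accuracy = 0.9842
num_eval = 10000
correct = round(accuracy * num_eval)
misclassified = num_eval - correct
print(correct, misclassified)  # 9842 158
```

Under that assumption, a starting loss near 2.30 suggests the network begins at roughly chance level, which is what you would expect before any training.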

If you found this tutorial helpful, please star the repository and leave some claps here :) . This would mean a lot to me and would encourage me to post more tutorials.
