7. Introduction to Deep Learning with Computer Vision — MNIST Handwritten digit recognition hands-on

Inside AI
Deep-Learning-For-Computer-Vision
Oct 30, 2019

Written by Nilesh Singh & Praveen Kumar.

Prerequisites: Filters & Kernels, Channels & Features

In this blog, we will learn to code and create our very first neural network. The primary task of our network is to demonstrate that it works; as a bonus, it will also identify handwritten digits on its own.

So let’s cut the chatter and talk code.

The first thing to bear in mind is that we will be using Python 3.x on the Google Colab platform. We will use Keras as our primary library for writing the network. It is a simple, elegant, yet extremely powerful API that runs on top of TensorFlow or Theano.

The first logical step is to fire up Colab, if you have not done so already.

You should see a ‘Welcome to the Colaboratory’ page. In the top left corner, look for the ‘File’ menu and go to File -> New Python 3 Notebook.

Now that the notebook is up and running, let us look at shortcuts for some commonly used tasks that will come in handy in this environment.

  • To run the current cell: SHIFT+ENTER
  • To run all cells: CTRL+F9
  • To delete a cell: CTRL+M, then D
  • To insert a cell below: CTRL+M, then B
  • To insert a cell above: CTRL+M, then A

The good news is that if you find them hard to remember, you can customize each shortcut in the preferences settings, or press CTRL+M, then H, to open the settings for all the shortcuts.

Let's build a neural network with 7 hidden layers

Click here to find the notebook with all the code used in this article.

In the first cell, write this import statement and press SHIFT+ENTER.

import keras

We will now import a few more modules. Don’t worry about understanding everything in one go; we will ease you into each one as we use it.

import numpy as np

from keras.models import Sequential
from keras.layers import Flatten
from keras.layers import Convolution2D
from keras.utils import np_utils

from keras.datasets import mnist

The line worth noting here is the last import statement, which brings in Keras’ loader for the MNIST dataset. It gives us all the images and labels we will use to train our handwritten digit recognition model (the data itself is downloaded the first time we load it). MNIST consists of 60k training and 10k test images. All images are grayscale and 28X28 pixels in size.

Now that we have all the required libraries and the dataset in place, we will work with the dataset in two parts: a training set and a test set. We will build a model that is trained on the training set. It’s like teaching a baby what an apple looks like: our model learns by seeing many examples of handwritten digits and, hopefully, learns to classify them as 0 to 9.

Even if we are confident that our model has learned all the digits, we need to gauge its learning by showing it images it has never seen. If the model correctly classifies these unseen images, we can confidently say we nailed it!!!

Since we are using the built-in dataset from Keras, it can be loaded, already split into training and test sets, with just one line of code.

## Loading MNIST into memory; load_data() returns it already split into training and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Here X_train and y_train contain the training images and their labels, while X_test and y_test contain the test images and their labels. All the variables created in this step are NumPy arrays; X_train, for example, has shape (60000, 28, 28).

In the next step, we add a depth (channel) dimension of size 1, which signifies that our data has only one channel. This is done using the reshape method of NumPy arrays.

## Adding a depth (channel) dimension: (N, 28, 28) -> (N, 28, 28, 1), since MNIST images are single-channel
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)
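To see what the reshape does without downloading MNIST, here is a small sketch that uses a hypothetical stand-in array of zeros with the same per-image shape as MNIST (only 5 images instead of 60k). It prints the shape before and after adding the channel dimension; np.expand_dims is shown as an equivalent alternative that avoids hard-coding 28.

```python
import numpy as np

# Hypothetical stand-in for X_train: 5 grayscale 28x28 images (MNIST has 60,000)
fake_images = np.zeros((5, 28, 28), dtype=np.uint8)

# Add a trailing channel (depth) dimension of size 1, as in the article
with_channel = fake_images.reshape(fake_images.shape[0], 28, 28, 1)

# np.expand_dims gives the same result without hard-coding the image size
also_with_channel = np.expand_dims(fake_images, axis=-1)

print(fake_images.shape)        # (5, 28, 28)
print(with_channel.shape)       # (5, 28, 28, 1)
print(also_with_channel.shape)  # (5, 28, 28, 1)
```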

Next, we scale our pixel values from the range 0–255 to the range 0–1. Why? Scaling puts every input on the same small, consistent range, which typically makes training faster and more stable and often improves accuracy. We divide by 255 because each pixel is stored as an 8-bit unsigned integer, whose maximum value is 255 (2⁸-1); the same limit applies to each channel of an 8-bit RGB image.

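## Convert to float first: dividing the original uint8 arrays in place would fail
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
## Scale pixel values from 0–255 to 0–1
X_train /= 255
X_test /= 255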
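A tiny worked example of the scaling step, using a few hypothetical pixel values rather than real MNIST data. It also shows why the cast to float32 comes first: NumPy refuses to store the float results of an in-place division back into a uint8 array.

```python
import numpy as np

# Hypothetical 8-bit grayscale pixel values: 0 is black, 255 is full white
pixels = np.array([0, 51, 128, 255], dtype=np.uint8)

# Dividing the uint8 array in place fails: float results cannot be stored back as uint8
try:
    raw = pixels.copy()
    raw /= 255
    inplace_failed = False
except TypeError:
    inplace_failed = True
print("in-place uint8 division failed:", inplace_failed)  # True

# The article's order of operations: cast to float32, then scale into [0, 1]
scaled = pixels.astype('float32')
scaled /= 255
print(scaled.min(), scaled.max())  # 0.0 1.0
```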

Now that we have scaled our images, we convert our integer labels (the digits 0 to 9) into categorical, or one-hot, form. These labels are the ground truth for our model. Let’s just do this for now and not get into the why and what; we will explain as we progress deeper into the sessions.

# Convert 1-dimensional class arrays to 10-dimensional class matrices (one-hot encoding, making the labels categorical)
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
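What to_categorical produces can be sketched in plain NumPy. The helper one_hot below is hypothetical (it is not part of Keras); for integer labels 0 to 9 it should give the same 0/1 values as to_categorical(labels, 10), but treat it as an illustration of the idea rather than the library's implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: row i has a 1.0 in column labels[i] and 0.0 elsewhere."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

y_example = np.array([5, 0, 4])       # three hypothetical digit labels
Y_example = one_hot(y_example, 10)

print(Y_example.shape)                # (3, 10)
print(Y_example[0])                   # 1.0 at index 5, zeros elsewhere
print(Y_example.argmax(axis=1))       # [5 0 4], recovering the original labels
```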

In the next code cell, we will define and create a very simple neural network. This may seem complex and weird, but believe us, this is one of the simplest architectures you will ever write. Let’s write the code and then try to understand it briefly.

from keras.layers import Activation, MaxPooling2D

model = Sequential()
model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))  # receptive field=3, input channels=1
model.add(Convolution2D(64, (3, 3), activation='relu'))  # receptive field=5, input channels=32
model.add(Convolution2D(128, (3, 3), activation='relu'))  # receptive field=7, input channels=64

model.add(MaxPooling2D(pool_size=(2, 2)))  # receptive field=8; from here on each 3x3 conv adds 4 (jump=2)

model.add(Convolution2D(256, (3, 3), activation='relu'))  # receptive field=12, input channels=128
model.add(Convolution2D(512, (3, 3), activation='relu'))  # receptive field=16, input channels=256
model.add(Convolution2D(1024, (3, 3), activation='relu'))  # receptive field=20, input channels=512
model.add(Convolution2D(2048, (3, 3), activation='relu'))  # receptive field=24, input channels=1024
model.add(Convolution2D(10, (3, 3)))  # receptive field=28 (the whole image), input channels=2048; no ReLU before softmax

model.add(Flatten())  # 1x1x10 output -> 10 numbers
model.add(Activation('softmax'))  # 10 numbers -> 10 class probabilities

model.summary()

With model = Sequential(), we declare our neural network as a Sequential model. Sequential simply means that the layers form a linear stack: each layer takes its input from the previous layer, and its output goes to the next layer. There are no connections that skip layers; it follows a cascaded layer design.

Then, in the next line, we create a convolutional layer with the Convolution2D() function and add it to the model using model.add().

In Convolution2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)):

  • 32 is the number of kernels we need; each kernel produces one output channel
  • (3, 3) means that the 32 kernels are of size 3X3, so we are performing a 3X3 convolution here
  • activation='relu' applies an activation function to the layer’s output. Activations are non-linear functions; without them, any stack of layers collapses into a single linear function, no more powerful than a simple linear regression model. The ReLU activation used here is quite simple: it passes all positive values through unchanged and sets all negative values to zero.
  • input_shape=(28, 28, 1) tells the first layer the shape of each input image (28X28 pixels, 1 channel); Keras works out the input shapes of the later layers by itself
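To make "3X3 convolution" and "ReLU" concrete, here is a minimal single-channel, single-kernel sketch in NumPy. Like Keras’ default (padding='valid', stride 1), it slides the kernel only where it fits entirely inside the image, so a 28X28 input gives a 26X26 output; like deep-learning libraries, it computes a cross-correlation (the kernel is not flipped). The random image and the vertical-edge kernel are made-up values for illustration, and a real Convolution2D layer additionally sums over all input channels and adds a bias per kernel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide kernel over image (stride 1, no padding) and sum element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Keep positive values, replace negative values with 0."""
    return np.maximum(x, 0)

rng = np.random.default_rng(0)
image = rng.random((28, 28)).astype(np.float32)    # made-up 28x28 'image' in [0, 1)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=np.float32)  # made-up vertical-edge kernel

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map.shape)        # (26, 26)
print(feature_map.min() >= 0)   # True: ReLU removed every negative value
```

Stacking many such kernels per layer, and many layers, is exactly what the Sequential model above does, only with kernel values that are learned instead of hand-picked.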

We repeat these layers, linking one to the next.

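Towards the end, we add a Flatten() layer, which converts the output of the final convolution (a 1X1X10 block) into a 1D array of 10 numbers, one for each of the 10 categories in our data (the digits 0 to 9). This is why the final convolution has exactly 10 kernels. The softmax activation then turns these 10 numbers into probabilities that sum to 1.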
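The claim that the network ends with exactly 10 numbers can be checked by hand. The pure-Python sketch below walks through the layer list of the model, assuming Keras’ default ‘valid’ padding with stride 1 for every convolution and stride 2 for the 2X2 max pooling. It tracks the spatial size, the number of channels and the receptive field using the standard recurrence rf_out = rf_in + (k - 1) * jump_in and jump_out = jump_in * stride. Note that under this recurrence max pooling does not double the receptive field; it doubles the jump, so each later 3X3 convolution adds 4 instead of 2.

```python
# Each entry: (name, kernel size, stride, output channels or None for pooling)
layers = [
    ("conv 32", 3, 1, 32),
    ("conv 64", 3, 1, 64),
    ("conv 128", 3, 1, 128),
    ("maxpool 2x2", 2, 2, None),
    ("conv 256", 3, 1, 256),
    ("conv 512", 3, 1, 512),
    ("conv 1024", 3, 1, 1024),
    ("conv 2048", 3, 1, 2048),
    ("conv 10", 3, 1, 10),
]

size, channels = 28, 1   # MNIST input: 28x28 with 1 channel
rf, jump = 1, 1          # a single input pixel sees only itself

trace = []
for name, k, stride, out_channels in layers:
    size = (size - k) // stride + 1   # output size with 'valid' padding
    rf = rf + (k - 1) * jump          # receptive field grows by (k - 1) * jump
    jump = jump * stride              # stride multiplies the step between outputs
    if out_channels is not None:      # pooling keeps the channel count
        channels = out_channels
    trace.append((name, size, channels, rf))
    print(f"{name:<12} output {size}x{size}x{channels}  receptive field {rf}")

print("values after Flatten:", size * size * channels)  # 10
```

The final 1X1X10 output and a receptive field of 28 mean that each of the 10 scores is computed from the whole 28X28 image.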

When the last statement is executed, we get a nice and structured summary of the model with details of each layer, something very similar to this.

Summary of the model created

We have just defined the model’s architecture. Now let us complete the process by compiling it.

## Compiling our model with Adam as the optimizer
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

The loss function, categorical cross-entropy, is what our model minimizes during ‘backpropagation’ (watch this wonderful video): every time the model predicts the wrong label (compared with the ground truth), it receives a penalty, and the more confident it was in the wrong answer, the larger the penalty. Accuracy is our metric: the performance measure we use to assess the model, which is not itself used for training. During training, Keras reports the accuracy of the model’s predictions after every epoch so that we can monitor its performance. Let’s ignore optimizers for the time being, since we will cover them in depth in upcoming sessions. If you are keen, please do read about them here.
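Here is a small NumPy sketch of what the final softmax and the categorical_crossentropy loss compute for a single image. The three raw scores are made up, and we use only 3 classes to keep the numbers readable (the model uses 10); Keras’ own implementation adds numerical safeguards, such as clipping probabilities away from 0, that are left out here.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def categorical_crossentropy(y_true, y_prob):
    """Minus the log of the probability given to the true class (y_true is one-hot)."""
    return -np.sum(y_true * np.log(y_prob))

scores = np.array([2.0, 1.0, 0.1])      # made-up raw scores for 3 classes
probs = softmax(scores)

right = np.array([1.0, 0.0, 0.0])       # true class is 0, the model's favourite
wrong = np.array([0.0, 0.0, 1.0])       # true class is 2, the model's least likely

print(probs.round(3))                            # roughly [0.659 0.242 0.099]
print(categorical_crossentropy(right, probs))    # small penalty
print(categorical_crossentropy(wrong, probs))    # much larger penalty
```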

With our model properly compiled, we say our prayers to the heavens and, with enormous gratitude to all the old gods and new, we finally train our model.

## Training the model: 10 epochs, mini-batches of 32 images (epochs replaces the old Keras 1 argument nb_epoch)
model.fit(X_train, Y_train, batch_size=32, epochs=10, verbose=1)

Finally, we are doing some training!

We train the model for 10 epochs, i.e. we go through all 60k training images 10 times over, in batches of 32 images with a weight update after each batch, in the hope that our model will learn all the digits from 0 to 9.
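Some quick arithmetic on what batch_size=32 and 10 epochs mean for the number of weight updates, since the optimizer updates the weights once per batch:

```python
import math

train_images = 60_000   # MNIST training set size
batch_size = 32         # as passed to model.fit
epochs = 10

steps_per_epoch = math.ceil(train_images / batch_size)  # a smaller last batch would still count
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 18750
```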

The training process should take around 15–20 mins, and it should look something like this.

The accuracy printed at the far right of each line, after every epoch, is our training accuracy. These values play a big role in diagnosing the network, but they should not be taken as the final accuracy, because the model has already seen these training images.

Let's move on and put our model to the test.

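## Evaluating the trained model on test data; evaluate() returns [loss, accuracy]
score = model.evaluate(X_test, Y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

## Predicting the test labels: one row of 10 class probabilities per test image
y_pred = model.predict(X_test)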

This prints the model’s loss and accuracy on the test set. The test accuracy should be somewhere around 99.2%, not bad at all for a first network.
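Since model.predict returns one row of 10 probabilities per test image, turning y_pred into digit labels and checking accuracy takes an argmax. The sketch below uses a made-up 4-image prediction array in place of the real y_pred so that it runs without a trained model; with the real arrays you would compare against y_test (or, equivalently, Y_test.argmax(axis=1)).

```python
import numpy as np

# Made-up softmax outputs for 4 images (each row sums to 1 over the 10 digits)
fake_pred = np.full((4, 10), 0.01)
for row, digit in enumerate([7, 2, 1, 5]):
    fake_pred[row, digit] = 0.91
true_labels = np.array([7, 2, 1, 0])         # the last image is really a 0

predicted_labels = fake_pred.argmax(axis=1)  # most probable digit per image
accuracy = np.mean(predicted_labels == true_labels)

print(predicted_labels)  # [7 2 1 5]
print(accuracy)          # 0.75
```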

Please note that we have deliberately left out explanations for some functions and statements, as we have not yet built the foundation needed to understand them; we hope to cover them in the near future.

Congratulations!!!

You made it!!!!!

You are a tough cookie and you nailed it. You deserve a pat on the back.

Get some rest!!!

Hope you enjoyed it.

Something to try: check out your model’s summary, try different kernel settings, and let us know how many parameters you are training today.
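For the parameter count, each Convolution2D layer with a k X k kernel learns k * k * (input channels) * (number of kernels) weights plus one bias per kernel, while max pooling, Flatten and the softmax Activation have no trainable parameters. The pure-Python sketch below applies that rule to the architecture in this article; its total should agree with the "Total params" line of model.summary(), and you can edit the list to try other kernel counts.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights (k * k * in * out) plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

# (input channels, number of kernels) for each 3x3 convolution in the model
conv_layers = [(1, 32), (32, 64), (64, 128), (128, 256),
               (256, 512), (512, 1024), (1024, 2048), (2048, 10)]

per_layer = [conv_params(3, c_in, c_out) for c_in, c_out in conv_layers]
for (c_in, c_out), n in zip(conv_layers, per_layer):
    print(f"3x3 conv {c_in:>4} -> {c_out:<4}: {n:,} parameters")

total = sum(per_layer)
print(f"total trainable parameters: {total:,}")  # 25,348,362
```

Most of the roughly 25 million parameters sit in the 1024 -> 2048 layer, which is a good place to start experimenting with smaller kernel counts.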

NOTE: We are starting a new Telegram group to tackle all your questions and queries. You can openly discuss concepts with other participants and gain more insights, which will be increasingly helpful as we move further along in the publication.
