Understanding Digit Recognition Using PyTorch

Ashish Ohri
5 min read · May 25, 2019



Prerequisites

Before diving into the code, you should have a basic knowledge of neural networks and matrices. Also, you should have PyTorch, numpy, and pandas installed on your machine; all of them can be installed with pip.

With that said, let's get started!

Understanding the data

The MNIST dataset was created by Yann LeCun (director of AI research at Facebook). It is widely used for training machines to recognize handwritten digits. It contains 28×28-pixel images of handwritten digits: 60,000 training images and 10,000 test images.

Note that the model is trained on images with a black background and white digits, as below.

So the model won't work well on images that don't meet these background and text color criteria. To handle this, either the input images can be preprocessed to match, or the dataset can be augmented to include such variations.

Also, since the digits are handwritten, the dataset contains many variations of each digit. For example, for the digit 0, some of the variations are:

Training on these variations helps produce a more robust model.

Code

The first step is to import all the libraries used in the code. Note that torch here is the PyTorch library.
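The original import cell isn't embedded here; a minimal sketch of what it likely contains (assuming torchvision is used to fetch MNIST, as in the loading step below):

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
```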

The device variable stores the physical component (CPU, or GPU if available) on which training is performed.
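A common one-liner for this, assuming CUDA for the GPU case:

```python
# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```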

The second step is to define the hyperparameters. This is a crucial step when creating a neural network, and one might need to tweak the values (most often the learning rate, guided by the error) to get a better-performing network.

The input size is taken to be 784, as each image is 28 by 28 pixels (28 × 28 = 784). Each pixel acts as a node in the input layer.

Feedforward mechanism in a neural network

We have taken two hidden layers in our model, the first of size 500 and the second of size 100.

The final output layer contains 10 nodes, one for each digit from 0 to 9. The node with the maximum output value gives the predicted digit.
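Putting these numbers together, the hyperparameter cell plausibly looks like this; the epoch count, batch size, and learning rate are typical values and assumptions here:

```python
input_size = 784        # 28 * 28 pixels, one node per pixel
hidden_size1 = 500      # first hidden layer
hidden_size2 = 100      # second hidden layer
num_classes = 10        # digits 0 to 9
num_epochs = 5          # assumed; tune as needed
batch_size = 100        # assumed
learning_rate = 0.001   # assumed; often the first value to tweak
```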

The next step is to load the data and make it ready to be fed into the neural network. For that, we convert the images to tensors.

The training dataset contains 60,000 handwritten-digit images, whereas the test dataset contains 10,000 such images.
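A sketch of this step, assuming torchvision's built-in MNIST dataset and DataLoader:

```python
# Download MNIST and convert each image to a tensor
train_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)
test_dataset = torchvision.datasets.MNIST(root='./data',
                                          train=False,
                                          transform=transforms.ToTensor())

# Wrap the datasets in iterators that yield mini-batches
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
```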

The NeuralNet class is the model created for the forward propagation of our neural network. In this case, we have created a neural network with an input layer, two hidden layers of sizes 500 and 100 respectively, and an output layer of size 10 (digits 0 to 9). Note that each layer applies a linear (affine) transformation of the form y = Ax + b.

The following steps define the working of the model (forward propagation); a code sketch follows the list:

1. The flattened image batch is fed into the input layer.

2. The first weight matrix (and bias) maps the input to the 1st hidden layer.

3. The ReLU activation function is applied
(y = x for x >= 0;
y = 0 for x < 0).

ReLU activation function

4. The next weight matrix maps the result on to the 2nd hidden layer, and ReLU is applied again.

5. A final weight matrix maps the 2nd hidden layer to the output layer of size 10.
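A minimal sketch of the NeuralNet class under the standard nn.Module pattern:

```python
class NeuralNet(nn.Module):
    """Fully connected network: 784 -> 500 -> 100 -> 10."""
    def __init__(self, input_size, hidden_size1, hidden_size2, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)   # y = Ax + b
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.fc3 = nn.Linear(hidden_size2, num_classes)

    def forward(self, x):
        out = self.fc1(x)     # input layer -> 1st hidden layer
        out = self.relu(out)
        out = self.fc2(out)   # 1st hidden layer -> 2nd hidden layer
        out = self.relu(out)
        out = self.fc3(out)   # 2nd hidden layer -> output layer (10 values)
        return out

model = NeuralNet(input_size, hidden_size1, hidden_size2, num_classes).to(device)
```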

The calculation of loss is done using cross-entropy loss. Cross-entropy loss is a logarithmic loss: the network's raw outputs are converted into probabilities between 0 and 1 (via softmax), and the loss is the negative log of the probability assigned to the true class.

Backpropagation mechanism in a neural network

Adam is a solid general-purpose optimization algorithm and is the optimizer used here. This is where the training takes place: the optimizer updates the weights to give a model with less error than before on the task at hand.
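In PyTorch these two choices take a couple of lines (using the hyperparameters assumed earlier):

```python
criterion = nn.CrossEntropyLoss()  # combines softmax and negative log-likelihood
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```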

An epoch is one complete pass of the training data through the network. Several epochs are run to get the best fit: the model may underfit in the initial passes, but as the number of epochs increases, it approaches the best fit. Note that if the number of epochs is too high, the model might overfit.

Within each epoch runs the training loop over the mini-batches; a sketch of the loop follows this list. The following steps take place:

1. Images are reshaped from 28×28 matrices into flat vectors of 784 values and moved to the configured device.
2. Labels are moved to the configured device (labels are the true values of the handwritten digits).
3. The images are fed into the model.
4. Loss is computed according to the criterion set above (cross-entropy loss).
5. Backpropagation is performed to optimize the model's weights. (Notice that we call optimizer.zero_grad() first: in PyTorch, gradients accumulate by default, so they must be reset to zero before each backward pass so that every batch's gradients are computed fresh.)
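A minimal sketch of the training loop under the assumptions above:

```python
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # 1-2. Flatten to (batch_size, 784) and move to the device
        images = images.reshape(-1, 28 * 28).to(device)
        labels = labels.to(device)

        # 3-4. Forward pass and loss
        outputs = model(images)
        loss = criterion(outputs, labels)

        # 5. Reset accumulated gradients, backpropagate, update weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], '
                  f'Step [{i+1}/{total_step}], Loss: {loss.item():.4f}')
```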

The final part is getting the test score. For that, we compare the model's predictions on the test images against the true labels.

The correct predictions are counted and the percentage of correct predictions is printed.
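A sketch of the evaluation loop; gradients are disabled since no training happens here:

```python
with torch.no_grad():  # no gradients needed for evaluation
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28 * 28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        # The node with the maximum value is the predicted digit
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'Accuracy on the 10000 test images: {100 * correct / total} %')
```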

The trained model is saved as model.ckpt so it can be reused in the future.
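Saving the weights is one line:

```python
torch.save(model.state_dict(), 'model.ckpt')
```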

Testing on an image outside the dataset

After preprocessing the image to meet the black-background, white-text criteria, we check the model on an image from outside the dataset:
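A possible sketch of this step; the filename digit.png and the exact preprocessing are assumptions (the image is presumed to already be a light digit on a dark background):

```python
from PIL import Image

img = Image.open('digit.png').convert('L')  # hypothetical file; grayscale
img = img.resize((28, 28))                  # match the MNIST dimensions
x = transforms.ToTensor()(img)              # tensor of shape (1, 28, 28)
x = x.reshape(-1, 28 * 28).to(device)       # flatten to (1, 784)

with torch.no_grad():
    output = model(x)
    _, prediction = torch.max(output.data, 1)
print(prediction)
```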

Image:

Output:

tensor([4])

Hence it can be concluded that the model is working. To get the full working code of the above model, follow the GitHub link.
