Recognising Written Digits with Machine Learning

Tom Clarke · Published in CodeX · Aug 30, 2021

If you have any experience with coding, you'll be familiar with ‘Hello World’ as the first thing you're ever taught to write. Machine learning is taking over the modern world of technology and has so much further potential, so to start with, let's introduce the ‘Hello World’ of artificial intelligence!

One of the fundamental artificial intelligence problems is teaching a neural network to see the world and its contents the way you or I do. This is called computer vision. Rather than starting by getting the computer to identify every different object or person, we begin by teaching it to read handwritten numbers. It doesn't sound too interesting, but this is the gateway to machine learning and neural networks!

What we will go through here is the simplest Python code to get a neural network to identify the number in a picture of a handwritten digit. It's amazing how simple this is to achieve now: with just the small amount of code I will go through here, we can teach a neural network to classify around 99% of the test images correctly!

Before jumping into the code, I will briefly introduce the underlying concepts needed for this project. If you have never been exposed to the mathematics behind forward and backward propagation, I highly recommend researching it, along with the different architectures that are available. Here we will be using a convolutional neural network (CNN), an architecture commonly used for computer vision and natural language processing (NLP) problems.

CNNs work by taking an array of input data and passing a number of filters over it, each applying a mathematical operation (an elementwise multiply-and-sum) at every position. There are often several different filters, and they can vary in size. After a number of convolutions, the resulting data is condensed and fed into ‘fully connected’ layers of the network, before a classification is made.
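To make the filter idea concrete, here is a minimal sketch (using NumPy, with a made-up 5x5 ‘image’ and a single 3x3 filter) of the multiply-and-sum operation a convolutional layer applies at every position:

import numpy as np

# A tiny made-up 5x5 'image' and a single 3x3 filter
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)  # a simple vertical-edge filter

# Slide the filter over every 3x3 patch and take the elementwise
# product-and-sum -- the core operation of a convolutional layer
output = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        output[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(output)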

Again, I won’t be diving into the mechanics and mathematics of these complex structures, but a small amount of background is useful.

So now to why you're probably here: the code! There are basically three parts to building any simple neural network model:

  • Manipulating the input data
  • Building and training the model
  • Evaluating the model

Luckily, for this particular problem there is a free dataset called MNIST, which contains 60,000 training images of handwritten digits (plus 10,000 test images), along with their labels. The model is built using TensorFlow, one of the most beginner-friendly libraries for neural network building.

The packages we will need for this are:

import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from tensorflow.keras.optimizers import SGD

So, to tackle the first part of making any successful model: importing and manipulating the data. First we need to import the MNIST dataset and split it into training and testing sets, with inputs and outputs.

(x_train,y_train),(x_test,y_test) = mnist.load_data()

‘x_train’ and ‘x_test’ are our input data, while ‘y_train’ and ‘y_test’ are their corresponding labels, i.e. the number written in each image. We need to make sure the input array is the right shape for the CNN to handle. Each image is 28x28 pixels and grayscale, i.e. it has one colour channel, so the array shape of each image is (28,28,1). We also need to convert the labels to ‘categorical’ (one-hot) data, meaning a label no longer holds a single value like ‘4’ and instead becomes a binary list, where the 1 sits at the position given by the label's value. So, for example, a label of ‘4’ becomes [0,0,0,0,1,0,0,0,0,0], and ‘2’ becomes [0,0,1,0,0,0,0,0,0,0]. This seems quite arbitrary, but it is the standard approach when there are more than two possible classes.

Anyway, this is done like so:

x_train = x_train.reshape((x_train.shape[0],28,28,1))
x_test = x_test.reshape((x_test.shape[0],28,28,1))
y_train,y_test = to_categorical(y_train),to_categorical(y_test)
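To double-check that the manipulation worked, you can print the resulting shapes (the numbers below are what MNIST should give you: 60,000 training images, each 28x28x1, with 10-element one-hot labels):

print(x_train.shape)  # (60000, 28, 28, 1)
print(y_train.shape)  # (60000, 10)
print(y_train[0])     # the one-hot label for the first image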

The last bit of data manipulation we need to do before building our model is to normalize the input data. This helps prevent exploding gradients during training (this will make more sense if you understand the maths of backpropagation), and also makes predictions more consistent across the training and test data. Since the inputs are grayscale images, their pixel values lie between 0 and 255, so to normalize the data easily all we need to do is divide every value by 255.

x_train = x_train/255.0
x_test = x_test/255.0
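As a quick sanity check that the data still looks like a digit after all the reshaping and scaling, you can display an image with matplotlib (imported earlier). Note that imshow expects a 2D array, so we squeeze out the colour channel, and argmax recovers the digit from its one-hot label:

plt.imshow(x_train[0].squeeze(), cmap='gray')
plt.title('Label: ' + str(y_train[0].argmax()))
plt.show()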

Time to build our model! This is where things get interesting, and where you can really experiment. We will use two convolutional layers, two pooling layers, and two fully connected hidden layers (plus a softmax output layer). You can play around with this to try and find a better model than the one I'm about to show you. As the backpropagation optimizer we are using stochastic gradient descent (SGD). So here is the code to build our model; you're about to see just how little work it takes to create a neural network like this.

model = Sequential()
# Two rounds of convolution + max pooling to extract and condense features
model.add(Conv2D(32,(5,5),activation='relu',input_shape=(28,28,1)))
model.add(MaxPooling2D((3,3)))
model.add(Conv2D(64,(2,2),activation='relu'))
model.add(MaxPooling2D((2,2)))
# Flatten the feature maps and classify with fully connected layers
model.add(Flatten())
model.add(Dense(500,activation='relu'))
model.add(Dense(100,activation='relu'))
model.add(Dense(10,activation='softmax'))  # one output per digit, 0-9
# Stochastic gradient descent with momentum as the optimizer
opt = SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy',
              metrics=['accuracy'])
# Train for 20 epochs, tracking performance on the test set as we go
history = model.fit(x_train, y_train, epochs=20,
                    validation_data=(x_test, y_test), verbose=1)
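If you want to inspect the architecture before experimenting with it, calling model.summary() prints every layer's output shape and parameter count:

model.summary()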

Training machine learning models is inherently computationally intense. If your PC is a bit lacking, I strongly recommend using Google Colab with the GPU runtime to speed things up a bit.
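If you do use Colab (or any machine with a GPU), recent versions of TensorFlow 2 let you confirm the GPU is actually visible like so; an empty list means training will fall back to the CPU:

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))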

Now we just need to evaluate how effective our model is. With ‘verbose’ set to 1 in the ‘fit’ method above, you will see all the results as it trains, but plotting them makes understanding your model a lot easier. We will plot the accuracy of the training set and testing set on one plot, and the loss for the training and testing sets on another.

# Training vs. testing accuracy across the epochs
plt.plot(history.history['accuracy'],label='Training')
plt.plot(history.history['val_accuracy'],label='Testing')
plt.title('Accuracy')
plt.legend()
plt.show()
# Training vs. testing loss across the epochs
plt.plot(history.history['loss'],label='Training')
plt.plot(history.history['val_loss'],label='Testing')
plt.title('Loss')
plt.legend()
plt.show()
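Besides the plots, you can get a single headline number for the test set with model.evaluate, which returns the loss and the accuracy metric we asked for in compile:

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print('Test accuracy:', test_acc)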

If you followed along with all this code and ran it for yourself, you should end up with a neural network that can correctly classify handwritten digits around 99% of the time! It's so easy to tackle a problem like this in Python now that anyone can do it.

I hope you found this introduction to neural networks interesting, and that you take it further and really experiment. If you've caught the machine learning bug and are proficient with calculus and coding, I strongly recommend having a go at coding a feed-forward neural network from scratch, with no packages! It will give you a much better understanding of how these incredible structures work.

Tom Clarke · Writer for CodeX

I enjoy coming up with fun and interesting Python concepts, as well as useful projects. You can also view my articles on Blogger: https://azcoding.blogspot.com/