NUMBRE — A NUMBer REcognizer Neural Network

My machine learning journey began back in 2015 when I took Andrew Ng’s machine learning course through Coursera. I only got a couple weeks into the course before I stopped. The following year, I redid the course and again, fell off the tracks. I resolved in 2017 that come hell or high water, I would finish the course. Which I did!

At the end of Andrew’s course, I had a variety of different machine learning concepts down and a bunch of MATLAB code under my belt as part of my assignments. And then I saw this…

Besides making me laugh, that video got me thinking about how to start applying the machine learning skills I had learnt so far. I was also looking for an excuse to start using TensorFlow, Google’s open source machine learning framework, so this seemed like a great way to gain that experience.

Having no prior experience with TensorFlow, I decided to redo my first assignment in Andrew Ng’s course — a neural network trained on the MNIST dataset.

In this article, I’ll discuss how I implemented NUMBRE — a neural network that recognizes handwritten numbers. NUMBRE stands for NUMBer REcognizer and is named as such because it sounds cool :)


For this project, my goal was to implement a deep neural network. The idea is that as each handwritten number is passed through the network, it makes a prediction as to what that number is. If that prediction is wrong, the weights in the network’s layers are adjusted to reduce that error. Given enough examples and time, the network should be able to “learn” what different numbers look like. More importantly, it should also be able to correctly identify examples that it has never seen before!

Although TensorFlow is a very powerful tool, for novices it can be a bit overwhelming to start using it. As such, I used Keras, a high level API that runs on top of TensorFlow and other popular machine learning libraries. Keras made it really easy to implement the different layers of my neural network using a few lines of code.


My first step was to import the Sequential model and the MNIST dataset.

from keras.models import Sequential
from keras.datasets import mnist

Keras offers a number of models to implement different types of neural networks. In this case all the network needed was 2–4 layers so I chose the Sequential model. This made it easy to create and define each layer of the neural network and stack them together.

The Sequential model makes it easy to implement each layer of the neural network

There’s no point having a deep neural network if there’s no data to train it on, and that’s where MNIST comes in. MNIST is a database of 70,000 handwritten digits (0–9). Since it has a lot of samples, the imported dataset comes split into two sets: 60,000 images used for training the neural network and 10,000 used to test it.

import matplotlib.pyplot as plt
#get the MNIST dataset
#split the dataset into training and testing datasets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
#plot the first 4 images
for i in range(4):
    plt.imshow(x_train[i])
    plt.show()

x_train/x_test contained each handwritten digit and y_train/y_test stored the number that was written down as an integer. I also wanted to plot the first four images just to make sure nothing was odd with the dataset I had imported. Looks good so far!

MNIST images

As shown in the image below, each handwritten digit can be thought of as a matrix that stores the value of each pixel. The areas that are white have a value of 0, with anything darker having a progressively higher value.

The number 8 represented as a matrix. The pixel intensity values in the matrix range from 0 to 255, with 0 being white and 255 representing black.

Each number matrix needs to be “unrolled” into a linear vector before being passed to the network’s input layer. Since all the matrices are 28x28 pixels, each “unrolled” vector contains 784 values. The network uses these 784 values as its input.

#each matrix in the training/test set needs to be “unrolled”
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)
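
To make the “unrolling” concrete, here is a minimal NumPy sketch. The 3x3 array and the blank batch are made-up stand-ins for real MNIST digits, not data from this project.

```python
import numpy as np

# A made-up 3x3 "image" stands in for a 28x28 MNIST digit.
tiny_image = np.array([[0, 1, 2],
                       [3, 4, 5],
                       [6, 7, 8]])

# reshape(-1) reads the matrix row by row into one flat vector.
unrolled = tiny_image.reshape(-1)
print(unrolled.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]

# The same idea on a batch: 5 blank 28x28 images become 5 rows of 784 values.
batch = np.zeros((5, 28, 28))
unrolled_batch = batch.reshape(-1, 784)
print(unrolled_batch.shape)  # (5, 784)
```

The -1 tells NumPy to work out the number of rows itself, which is why the same line works for both the training and the test set.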

The last step prior to defining the neural network was to convert the outputs to one hot encoding.

from keras.utils import to_categorical
#convert train and test outputs to one hot encoding
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

I previously mentioned that y_train and y_test both contained the integers that corresponded with the handwritten numbers in x_train and x_test. One hot encoding converts each of those integers into a vector of length 10 that has a 1 at the index of the digit and a 0 everywhere else. This would be important when determining the prediction error, since the network’s output is also a vector of 10 values.

The number four as a vector
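
For anyone curious what one hot encoding does under the hood, here is a rough NumPy sketch. The one_hot helper is a hypothetical stand-in written for illustration; it is not how Keras implements to_categorical.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return one row per label, with a 1 at the label's index and 0 elsewhere."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes))
    # Fancy indexing: row i gets a 1 in column labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

four = one_hot([4])
print(four.tolist())  # [[0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
```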

At this point I was ready to start implementing the neural network!


The first step was to define a new Sequential object.

model = Sequential()

As I mentioned previously, only a single line of code is needed per layer of the neural network. The first layer was added with 25 nodes, its input dimension set to 784 and its activation function set to “relu”.

from keras.layers import Dense
model.add(Dense(25, input_dim = 28*28, activation = 'relu'))

For the dimensions of the first layer, recall that each handwritten digit was stored as a 28x28 matrix that was unrolled into a vector of 784 values. It is this vector that is passed as the input to the network.

Lastly, “relu” was set as the activation function. When the nodes of the neural network receive a signal, an activation function is what determines whether that signal is passed on or not. With “relu”, negative signals are set to 0 and positive signals pass through unchanged. In essence, this is what lets the network “learn” how to recognize the different numbers. This process is analogous to how the neurons in our brains pass signals to each other!
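
As a small illustration, “relu” (rectified linear unit) can be written in one line of NumPy. The input values below are made up.

```python
import numpy as np

def relu(z):
    """Block negative signals (set them to 0) and pass positive ones through."""
    return np.maximum(0, z)

signals = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(signals)
print(activated.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```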

With the input layer set, the other two layers were defined with the second layer having 256 nodes and the third having 300 nodes. I tried a couple different sizes and these were the two that seemed to work the best.

model.add(Dense(256, activation = 'relu'))
model.add(Dense(300, activation = 'relu'))

I’m sure you’ve been sitting on the edge of your seat so far but the last layer is where it gets really exciting!

You may have noticed that the last hidden layer had 300 nodes. But I needed the network to make a prediction between 0–9. Hopefully, that prediction would be the same as the handwritten number that was used as an input.

As such, I set the output layer to 10 nodes and used Softmax for the activation function instead of “relu”.

model.add(Dense(10, activation = 'softmax'))
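
With all four layers defined, a quick tally shows how many weights the network has to learn. This is a back-of-the-envelope sketch that assumes each fully connected layer has one weight per input/node pair plus one bias per node. The helper name dense_param_count is made up for illustration.

```python
# 784 inputs, then layers of 25, 256, 300 and 10 nodes, as defined above.
layer_sizes = [784, 25, 256, 300, 10]

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

per_layer = [dense_param_count(a, b)
             for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [19625, 6656, 77100, 3010]
print(sum(per_layer))  # 106391
```

If the assumption holds, the totals should match the numbers printed by model.summary().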

Softmax allows for multiclass classification by converting the 10 outputs into probabilities that add up to 1. The digit with the highest probability is the network’s prediction.

Based on the inputs, Softmax assigns a probability to each possible output, and the most likely one is the prediction
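
Here is a minimal NumPy sketch of the Softmax calculation. The ten scores are made-up numbers standing in for the raw outputs of the last layer.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Made-up raw scores for the digits 0-9; the score for "4" is the largest.
scores = np.array([1.0, 2.0, 0.5, 0.1, 5.0, 0.3, 0.2, 1.1, 0.7, 0.9])
probs = softmax(scores)
print(probs.round(3))
print(int(np.argmax(probs)))  # 4
```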

With the layers all set, there are two more things that need to be done before the network can be applied to the training data. The first is to compile the model.

#model.compile configures the learning process
model.compile(loss = 'categorical_crossentropy', optimizer= 'adam', metrics= ['accuracy'])

Compiling the model sets the parameters for the training process, and three things need to be defined: a loss function, an optimizer and the metrics.

The loss function is an important component of training the network. After each batch, the neural network makes a prediction and the loss function measures the inaccuracy of that prediction.
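
To give a feel for what “categorical_crossentropy” measures, here is a rough NumPy sketch. The predictions are made up, and the function is a simplified version written for illustration rather than Keras’ own implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Average of -log(probability assigned to the correct class)."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred), axis=1).mean())

# One example whose true label is 4, one hot encoded.
y_true = np.array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype=float)

# A confident, correct prediction: 91% on "4", 1% on every other digit.
confident = np.full((1, 10), 0.01)
confident[0, 4] = 0.91

# An unsure prediction: 10% on every digit.
unsure = np.full((1, 10), 0.1)

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The confident prediction gets a loss close to 0, while the unsure one is penalized much more heavily.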

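Using the results of the loss function, the optimizer determines how to tweak the weights in the previous layers such that the error is minimized. For those of you that have some experience with machine learning, this is the gradient descent step: backpropagation works out how much each weight contributed to the error, and the optimizer uses that information to update the weights.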
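
The “adam” optimizer is more elaborate than this, but the core idea can be sketched with plain gradient descent on a toy problem. Everything here is made up for illustration: a single weight w and a loss that is smallest at w = 3.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0              # starting guess for the weight
learning_rate = 0.1  # how big each tweak is

for step in range(50):
    # Move the weight a little in the direction that lowers the loss.
    w -= learning_rate * gradient(w)

print(round(w, 4), loss(w))
```

After 50 small steps the weight ends up very close to 3, where the loss is smallest. A real network does the same thing for every one of its weights at once.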

The metrics argument doesn’t change how the network is trained. It tells Keras which measurements to report during training and evaluation, so setting “accuracy” means the fraction of correctly classified digits is reported alongside the loss. Feel free to chip in the comments if you have anything to add.
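
For reference, accuracy here is simply the fraction of examples where the most likely predicted class matches the label. A minimal NumPy sketch, using made-up predictions for four examples and three classes:

```python
import numpy as np

def accuracy(y_true_one_hot, y_pred_probs):
    """Fraction of rows where the predicted class equals the true class."""
    true_classes = np.argmax(y_true_one_hot, axis=1)
    predicted_classes = np.argmax(y_pred_probs, axis=1)
    return float(np.mean(true_classes == predicted_classes))

# True labels 0, 1, 2, 1 as one hot rows.
y_true = np.eye(3)[[0, 1, 2, 1]]

# Made-up predictions; the last one wrongly favours class 0.
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])

acc = accuracy(y_true, y_pred)
print(acc)  # 0.75
```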

With everything set up we can finally train our neural network!

I’m that excited

At this point, all that needs to happen is to pass the training data to the model along with the batch size and the number of epochs.

#epochs is the number of passes through the training data
#batch size is the number of samples that are sent through the network before each weight update
model.fit(x_train, y_train, epochs = 20, shuffle= True, verbose = 2, batch_size= 128)

Batch size simply means that the neural network is trained on 128 examples at a time. Without getting into the details, the idea is that the network updates itself i.e. “learns” as each batch of samples passes through it.

The number of epochs refers to the number of times the neural network passes through the data. In this case it passes through the entire training dataset 20 times.
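
A bit of arithmetic shows how batch size and epochs combine. This sketch assumes the standard MNIST training set of 60,000 images and one weight update per batch.

```python
import math

num_samples = 60000  # size of the MNIST training set
batch_size = 128
epochs = 20

# The last batch of each epoch is smaller, hence the ceil.
updates_per_epoch = math.ceil(num_samples / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch)  # 469
print(total_updates)      # 9380
```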

And with that, I had a trained neural network!

Now that the network had been trained, I had to evaluate it on the examples in the test data. Using examples that the network had never seen before was important for a couple of reasons.

1) A neural network can work really well on the training data but be awful on examples it’s never seen before.
2) It determines if the neural network is able to generalize. A network that is able to generalize should be effective at recognizing numbers in the “real world”.

#run neural network on test data
#evaluate returns the loss followed by each metric, in the order given by model.metrics_names
test_results = model.evaluate(x_test, y_test, verbose = 0)
print(model.metrics_names)
print(test_results)

Again, Keras made evaluating the network really easy as it took one line of code. It looked like all that training had paid off since NUMBRE had 98% accuracy!

I’m just a fan of Super Troopers. Also, who doesn’t appreciate a great moustache?

The last thing to do was to save the trained model so that it could be reused.

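import numpy as np
#save the whole model (layers and weights) to a single file
model.save("trainedMNISTModel.h5")
#x_validate/y_validate: new handwritten digits and their labels, assumed to be loaded elsewhere as 28x28 arrays
for i in range(0, 10):
    prediction = model.predict(x_validate[i].reshape(-1, 28 * 28))
    plt.imshow(x_validate[i])
    plt.ylabel("Predicted Value: " + str(np.argmax(prediction)))
    plt.xlabel("Actual Value: " + str(y_validate[i]))
    plt.show()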

You can see the results below!

I don’t have the best phone camera so it’s a bit fuzzy

In the middle we have the handwritten digit that NUMBRE has never seen before. On the y-axis is the predicted value and on the x-axis is the actual value that was written down. Looks like that 98% accuracy rate was right after all! 😂 😌😂

And that’s how I built NUMBRE — a neural network that can identify handwritten numbers!


At the start of this project I thought working with Keras and by extension TensorFlow would be a bit of a struggle. However, this project was really straightforward and I’d liken it to the machine learning equivalent of “Hello world”. A key part of why this was the case was Andrew Ng’s Coursera course, as it provided me with a solid foundation. Keras also had great documentation and was very user friendly.

That being said, my main challenge was translating what I had learnt from Andrew’s course to Keras/TensorFlow. In Andrew’s course, I wrote code that covered a variety of techniques such as forward/backpropagation, linear regression, gradient descent and much more. More importantly, that code ran through each component of these different techniques.

That approach was great for me since I could conceptualize and implement everything step by step.

Implementing backpropagation step by step in MATLAB

With Keras, however, all this effort isn’t needed. That entire bit of code above can be abstracted to just the one line of code shown below.

model.compile(loss = 'categorical_crossentropy', optimizer= 'adam', metrics= ['accuracy'])

As such, learning how to translate that step by step code into one or two lines using Keras was a challenging but ultimately very rewarding part of this project!


To check out all the code seen in this article head over to my GitHub.

And to read more about my different projects just click the link below….

…or drop by my website to check out my research!

Thanks for reading :) If you have any comments or suggestions about this article feel free to leave a comment!