Get started with your first Deep Learning project

Heeral Dedhia · Published in Analytics Vidhya
7 min read · Jul 15, 2020

A simple approach to develop a handwritten digit classification model using MNIST dataset with a simple Neural Network

Deep learning can be considered a subset of Machine Learning in which the model learns and improves on its own by examining large amounts of data, rather than relying on hand-crafted rules.

Deep learning models can be visualized as a network of nodes, each of which makes a decision based on the inputs it receives, very similar to the neurons in the human brain. These nodes are therefore called artificial neurons, and a network of such nodes is called a Neural Network. The layers in these networks can perform complex operations such as representation and abstraction that make sense of images, sound, and text.

Deep learning models are widely used to solve Computer Vision tasks and allow a computer to see and visualize as a human would. In this tutorial, we will learn the basic flow of a deep learning project and we will classify handwritten digits using the Keras Python API with TensorFlow as the backend.

MNIST Dataset

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. This dataset is considered to be the “hello world” dataset for Computer Vision.

It has a training set of 60,000 examples and a test set of 10,000 examples of handwritten digits, each with a fixed dimension of 28 x 28 pixels. The goal is to correctly identify the digits and find ways to improve the performance of the model. I have used the Google Colab environment as it has a simple interface and does not require any environment setup. So let’s dive into it -

First, we load the required libraries as follows

import numpy as np
import matplotlib.pyplot as plt
import random
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.utils import to_categorical

NumPy is Python’s core library for numerical computing, and Matplotlib will be used to plot graphs and visualize data. We import the MNIST dataset, which comes pre-loaded in Keras, along with the Sequential model, the basic layers, and the utility functions we need.

Load the dataset

(X_train, y_train), (X_test, y_test) = mnist.load_data()

We load the dataset, which comes pre-split into a training set and a testing set. We can plot a few of the images from the dataset to verify the classes and the input, as shown below.

Printing a few input images along with their respective class
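A minimal sketch of such a plotting cell, using the matplotlib and random imports from above (run it before the reshaping step in the next section, while the images are still 28 x 28):

# Plot 5 random training images with their class labels
plt.figure(figsize=(10, 2))
for i in range(5):
    idx = random.randint(0, len(X_train) - 1)
    plt.subplot(1, 5, i + 1)
    plt.imshow(X_train[idx], cmap='gray')
    plt.title("Class {}".format(y_train[idx]))
    plt.axis('off')
plt.show()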

Data Pre-Processing

Data preprocessing refers to the technique of preparing (cleaning and organizing) the raw data to make it suitable for building and training Deep Learning models. It is a crucial step that helps enhance the quality of the data so that meaningful insights can be extracted from it.

X_train = X_train.reshape(60000, 784)  # flatten each 28 x 28 image into a 784-length vector
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')    # convert from uint8 so the division below keeps decimals
X_test = X_test.astype('float32')
X_train /= 255                         # scale pixel values into the range [0, 1]
X_test /= 255

Instead of a 28 x 28 matrix, we build our network to accept a 784-length vector. Pixel values range from 0 to 255, where 0 is black and 255 is pure white. We normalize these values by dividing them by 255 so that every pixel lies in the range [0, 1] and all inputs are on the same scale.

Note that we are working with grayscale images of dimension 28 x 28 pixels. For color images we would have 3 channels (RGB), i.e. 28 x 28 x 3, each with pixel values in the range 0 to 255.

print("Training matrix shape", X_train.shape)
print("Testing matrix shape", X_test.shape)

We check the dimensions of the Training and Testing sets after pre-processing to make sure there are no inconsistencies.

Dimensions of Training and Testing matrices after pre-processing

no_classes = 10
Y_train = to_categorical(y_train, no_classes)
Y_test = to_categorical(y_test, no_classes)

Since the output will be classified as one of the 10 classes, we use the one-hot encoding technique to form the output (Y variable). Read more about one-hot encoding here.
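For instance, the label 3 becomes a 10-length vector with a 1 at index 3 and 0s everywhere else:

print(to_categorical(3, 10))
# [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]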

Building the 3-Layer Neural Network

We will build a simple neural network with 3 Fully Connected Layers. Fully Connected layers in a neural network are those layers where all the inputs from one layer are connected to every activation unit of the next layer.

model = Sequential()

The sequential API allows you to create models layer-by-layer.

First Hidden Layer

model.add(Dense(512, input_shape=(784,)))  # 512 neurons, each connected to all 784 inputs
model.add(Activation('relu'))
model.add(Dropout(0.2))                    # randomly drop 20% of activations during training

The first hidden layer has 512 nodes (neurons), each of which receives the full 784-length input vector, applies a weight to every input, and adds a bias term. In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs. ReLU stands for Rectified Linear Unit and is a type of activation function; read more about activation functions here. The Dropout layer randomly sets 20% of the layer’s outputs to zero during training, which helps prevent overfitting.

Commonly used Activation Functions
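As a quick illustration, ReLU simply zeroes out negative inputs. A NumPy sketch of the function (an illustration, not the Keras internals):

def relu(x):
    return np.maximum(0, x)  # element-wise max(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]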

Second Hidden Layer

model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))

The second hidden layer also has 512 nodes; it takes its input from the 512 nodes in the previous layer and passes its output on to the subsequent layer.

Final Output Layer

The final layer of 10 neurons is fully connected to the previous 512-node layer. The number of neurons in the final layer should equal the number of desired output classes.

model.add(Dense(10))
model.add(Activation('softmax'))

The Softmax activation represents a probability distribution over the 10 different possible outcomes: its values are all non-negative and sum to 1.

For example, if the final output is:

[0, 0.94, 0, 0, 0, 0, 0, 0.06, 0, 0] 

then it is most probable that the image is that of the digit 1
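A minimal NumPy sketch of what softmax computes (again an illustration, not Keras’s implementation):

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # approx. [0.09 0.24 0.67]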

Model Summary

model.summary()

This is a very simple way to view the structure of the entire neural network at a single glance. It helps rectify any inconsistencies between the layers, and we can also see the number of trainable and non-trainable parameters.

Summary of our 3-Layer Neural Network
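As a sanity check, the parameter counts in the summary follow directly from (inputs x units) + biases for each Dense layer:

print(784 * 512 + 512)  # 401,920 parameters in the first hidden layer
print(512 * 512 + 512)  # 262,656 parameters in the second hidden layer
print(512 * 10 + 10)    # 5,130 parameters in the output layer
# 669,706 trainable parameters in total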

Compiling the model

When compiling a model, Keras asks you to specify your loss function and your optimizer. Read more about loss functions here.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

The loss function we’ll use here is called categorical cross-entropy and is a loss function well-suited to comparing two probability distributions. The cross-entropy is a measure of how different your predicted distribution is from the target distribution.
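For intuition, here is a minimal NumPy sketch of categorical cross-entropy for a single prediction (the small epsilon is an assumption of this sketch, added just to avoid log(0)):

def categorical_crossentropy(y_true, y_pred):
    return -np.sum(y_true * np.log(y_pred + 1e-9))

y_true = np.array([0.0, 1.0, 0.0])  # one-hot target
y_pred = np.array([0.1, 0.8, 0.1])  # predicted distribution
print(categorical_crossentropy(y_true, y_pred))  # approx. 0.223; a perfect prediction gives approx. 0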

Optimizers are algorithms or methods used to change the attributes of the neural network such as weights and learning rate to reduce the losses. Optimizers are used to solve optimization problems by minimizing the loss function. In our case, we use the Adam Optimizer.

Training the model

This is where the fun begins!

model.fit(X_train, Y_train, batch_size=128, epochs=5, verbose=1)

The batch size determines how much data is used per step to compute the loss, the gradients, and the backpropagation update. Large batch sizes allow the network to complete its training faster; however, there are other factors beyond training speed to consider. Too large a batch size smooths out the noise in the gradient estimate, and the optimizer may settle into a local minimum as if it had found the global one. Too small a batch size makes the loss estimate very noisy, and the optimizer may never converge to a good minimum. So a good batch size may take some trial and error to find!

Here’s the output —

Model training output

Note that the accuracy increases after every epoch. We need a balanced number of epochs, as too many epochs risk overfitting the model to the training set and may result in lower accuracy on the test set.

Evaluating the model

We will now evaluate our model against the Testing dataset of 10,000 images.

score = model.evaluate(X_test, Y_test)  # returns [test loss, test accuracy]
print('Test score:', score[0])
print('Test accuracy:', score[1])

We get a test accuracy of 98.4% which is fairly good for the first attempt.
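We can also inspect an individual prediction; a minimal sketch, where argmax picks the class with the highest softmax probability:

pred = model.predict(X_test[:1])  # shape (1, 10): one probability per digit class
print("Predicted digit:", np.argmax(pred))
print("True digit:", y_test[0])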

Find the Python notebook with the entire code along with illustrations here.

Conclusion

Congratulations on completing your first Deep Learning model. I hope you understood the basic concepts behind data pre-processing, model framing, training, and testing.

There are many ways to improve the performance of the model: tuning the hyperparameters, validating and augmenting the data, trying different optimizers, avoiding biased training, and many more!

In the next blog, we will delve into Convolutional Neural Networks and how we can use ConvNets to give better results for classifying the handwritten digits from the MNIST dataset, so stay tuned for that. Do reach out if there are any doubts/queries/suggestions. HAPPY LEARNING :)

Further Reading

  1. How to Configure the Number of Layers and Nodes in a Neural Network
  2. Introduction to Machine Learning Model Evaluation
  3. Overview of various Optimizers in Neural Networks
  4. Improve Deep Learning Models performance & deep network tuning


Written by Heeral Dedhia

Computer Science Undergrad, KJSCE | Data Analytics, Machine Learning & Deep Learning enthusiast