Using CNN to Classify Handwritten Digits with TensorFlow and Keras

Jaimin-k
4 min read · Sep 12, 2021


The goal is to build a simple convolutional neural network (CNN) and train it to recognize handwritten digits using the MNIST dataset.

The steps involved are:

  1. Loading the dataset
  2. Processing the data
  3. Building and compiling the model
  4. Training and evaluating the model

Loading the data

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits (0 to 9). The database contains 60,000 training images and 10,000 testing images, each of size 28x28.

import keras
from keras.datasets import mnist
#load mnist dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

Here X_train contains the data for 60,000 training images, each of size 28x28, and y_train contains their corresponding labels. Similarly, X_test contains the data for 10,000 test images, each of size 28x28, and y_test contains their corresponding labels. Let’s visualize a few samples from the training set to get a better idea of what the deep learning model has to do.

[Figure: dataset samples]

Processing the data

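Resizing images: To train a fully connected network on images, we would first unroll the height x width pixel grid into a single input vector of length 28 x 28 = 784. The convolutional model used here instead declares input_shape=(28, 28, 1), so each image keeps its 28x28 layout and is reshaped to add a single grayscale channel.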
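As a rough sketch of the two layouts, the snippet below reshapes a small stand-in array both ways. The array of blank images is an assumption so the example runs without downloading MNIST, and the names X_flat and X_cnn are illustrative, not taken from the original code.

```python
import numpy as np

# Stand-in for the arrays returned by mnist.load_data(): 5 blank 28x28 images
# (an assumption so the sketch runs without downloading the dataset).
X_train = np.zeros((5, 28, 28), dtype=np.uint8)

# Unrolled form for a fully connected network: one 784-long vector per image.
X_flat = X_train.reshape(X_train.shape[0], 28 * 28)

# Form expected by a Conv2D layer with input_shape=(28, 28, 1):
# keep the 2D layout and add a single grayscale channel.
X_cnn = X_train.reshape(X_train.shape[0], 28, 28, 1)

print(X_flat.shape)  # (5, 784)
print(X_cnn.shape)   # (5, 28, 28, 1)
```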

Normalization of input: The pixel values range from 0 to 255. Most background pixels are close to 0, while pixels close to 255 make up the digit. A common choice is to divide each value by 255, which rescales the input to the range 0 to 1.

Normalizing the input data helps to speed up training. It also reduces the chance of getting stuck in local optima, since the weights of the network are found with a stochastic gradient-based optimizer (Adam, in this case).
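A minimal sketch of one common way to normalize 8-bit pixel data, assuming the usual divide-by-255 scaling. The three pixel values are made up for illustration.

```python
import numpy as np

# Tiny stand-in for image data: raw 8-bit pixel intensities (hypothetical values).
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Convert to float first, then scale into the range [0, 1].
normalized = pixels.astype("float32") / 255.0

print(normalized.min(), normalized.max())  # 0.0 1.0
```

The same two steps would apply unchanged to the full X_train and X_test arrays.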

One-hot encoding: The categories (the digits from 0 to 9) are encoded using one-hot encoding. One-hot encoding converts a categorical variable into a numeric form that ML algorithms can work with directly. The result is a vector with a length equal to the number of categories. The vector is all zeroes except in the position for the respective category. Thus, ‘7’ will be represented by [0,0,0,0,0,0,0,1,0,0].
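To make the encoding concrete, here is a small NumPy sketch. The helper name one_hot is hypothetical and written for illustration only.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=int)
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1
    return encoded

print(one_hot([7])[0].tolist())  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```

In a Keras workflow, keras.utils.to_categorical is a common way to do the same thing.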

Building & compiling the network

Once the data is ready to be fed to the model, we define the architecture of the model and compile it with the Adam optimizer, the categorical cross-entropy loss function, and accuracy as the performance metric.
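To give a feel for what the cross-entropy loss measures, the sketch below computes it by hand for a single sample. It is a simplified illustration (Keras also averages over the batch and guards against log(0)), and the two predicted probability vectors are hypothetical.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Loss for one sample: -sum(y_true[i] * log(y_pred[i]))."""
    # Only the true class contributes, since every other y_true entry is 0.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

# One-hot label for the digit 7 and two hypothetical softmax outputs.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
confident = [0.01] * 7 + [0.91] + [0.01] * 2   # 91% on the correct class
unsure = [0.1] * 10                            # uniform guess

print(categorical_crossentropy(y_true, confident))  # about 0.094
print(categorical_crossentropy(y_true, unsure))     # about 2.303
```

The loss shrinks as the probability assigned to the correct digit grows, which is what the optimizer pushes toward during training.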

Model Architecture

The last layer is a dense softmax layer with 10 units, one for each of our 10 classes.

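from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import Adam

num_classes = 10  # digits 0 to 9

def leNet_model():
    model = Sequential()
    # Two convolution + pooling stages extract image features
    model.add(Conv2D(30, (5, 5), input_shape=(28, 28, 1), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Flatten the feature maps and classify with dense layers
    model.add(Flatten())
    model.add(Dense(500, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
    return model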
[Figure: model summary]
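The layer shapes and parameter counts that a model summary reports can be worked out by hand. The sketch below does the arithmetic for the architecture above, assuming the Keras defaults of stride 1 with 'valid' padding for convolutions and a pooling stride equal to the pool size. The helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output width/height of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size = conv_out(28, 5)                 # Conv2D(30, (5, 5)) -> 24x24x30
total = conv_params(5, 1, 30)          # 780 parameters
size //= 2                             # MaxPooling2D(2, 2)  -> 12x12x30
total += conv_params(3, 30, 15)        # Conv2D(15, (3, 3)): 4,065 parameters
size = conv_out(size, 3)               # -> 10x10x15
size //= 2                             # MaxPooling2D(2, 2)  -> 5x5x15
flat = size * size * 15                # Flatten             -> 375
total += (flat + 1) * 500              # Dense(500): 188,000 parameters
total += (500 + 1) * 10                # Dense(10): 5,010 parameters

print(flat, total)  # 375 197855
```

Most of the parameters sit in the first dense layer, not in the convolutional layers.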

Training the model

A dropout layer was added to reduce overfitting and to help the training and validation curves converge, but the model then performed quite poorly and was not able to predict the correct class accurately.
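For readers unfamiliar with dropout, the following NumPy sketch shows what such a layer does to its inputs during training. It is an illustration of the general technique (inverted dropout), not the code used in this experiment. The rate of 0.5 and the helper name are assumptions, since the rate and placement of the layer are not stated here.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)          # fixed seed, for repeatability only
activations = np.ones((1, 1000))
dropped = dropout(activations, rate=0.5, rng=rng)

# Roughly half the units are zeroed; survivors are scaled from 1.0 to 2.0,
# so the mean activation stays close to 1.0.
print((dropped == 0).mean(), dropped.mean())
```

Because a random subset of units is silenced on every training step, the network cannot rely too heavily on any single one, which is why dropout is used against overfitting.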

Evaluating the Model’s Performance

The model was tested on images of handwritten digits, and it was able to predict the correct class of the digit.

The model also gave a few wrong predictions. In some of these cases the digits were heavily blurred and difficult to interpret even for a human reader.
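A sketch of how correct and wrong predictions can be counted from the softmax outputs of a model. The probabilities and labels below are hypothetical stand-ins for real predictions on the test set.

```python
import numpy as np

# Hypothetical softmax outputs for 3 test images over 10 classes
# (in practice these would come from the trained model's predictions).
probs = np.full((3, 10), 0.02)
probs[0, 7] = 0.82   # predicts 7
probs[1, 2] = 0.82   # predicts 2
probs[2, 4] = 0.82   # predicts 4
y_true = np.array([7, 2, 9])  # the last image is actually a 9

predicted = probs.argmax(axis=1)          # most probable class per image
accuracy = (predicted == y_true).mean()   # fraction classified correctly
wrong = np.flatnonzero(predicted != y_true)

print(predicted.tolist(), wrong.tolist())  # [7, 2, 4] [2]
```

The indices in wrong point at the misclassified images, which makes it easy to pull them up and check whether they are hard to read for a human as well.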

