MNIST dataset using Deep Learning algorithm (ANN)

Prateek Goyal
May 9, 2020


Introduction

Humans are considered the most intelligent species on Earth. We can perceive things, learn from them, and take actions based on previous learning. To build machines as powerful as humans, many researchers tried to mimic the human brain and came up with a concept called the Neural Network. Although the idea has existed for a few decades, it only recently gained popularity thanks to the availability of data storage and computational power, as neural networks need large amounts of data and compute to train.

Motivation behind Deep Learning

One question comes to mind: why do we need to mimic the human brain when we already have one? The scenario below answers this:

Suppose we give a human the task of classifying images as either a dog or a cat. Based on previous learning, any human brain can do this easily. But if we asked that person to classify millions of images, it would take forever. Hence, researchers came up with a solution to perform the same task in a short period of time.

In this tutorial, we will be talking about Deep Learning intuition and how to classify MNIST dataset images using Deep Learning algorithms such as the Artificial Neural Network and the Convolutional Neural Network, with various areas of improvement such as cross-validation, regularization, and so on. So, let’s get it done!

Understanding Deep Learning

Deep learning is a subset of the Machine Learning domain that identifies features or patterns within data. For example, in image classification, the earlier layers extract generic features (such as edges) and the deeper layers uncover features specific to the image.

Other applications, such as recommendation systems, stock price trends, etc., can be built using CNNs, RNNs with LSTM, Boltzmann machines, and so on. Let’s understand the neural network in detail. I will try not to include any mathematical formulas, to keep the tutorial as simple as I can.

Figure 1: Deep Neural Network with 1 Hidden Layer

The first layer in the image above is the input layer, where one row from the dataset enters the network at a time to update the weights (more on this soon). Once every row has passed through the network, one ‘epoch’ is complete. Multiple epochs are required to train any model; the number of epochs depends on the data, the model, and the loss function used.

The second layer acts as a dense/hidden layer. For simplicity, I added only one hidden layer here; there could be several. These hidden layers find patterns within the data by updating the weights after every batch passes through the model.

The last layer is the output layer, which predicts values for unseen data (data not passed in during the training of the model).
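To make the flow concrete, here is a minimal NumPy sketch (not the Keras model we build later, and with randomly initialized weights) of how one hidden layer transforms a single input row:

import numpy as np
#a single input row with 784 pixel features (a flattened 28x28 image)
x = np.random.rand(784)
#weights and biases for a hidden layer with 256 units (random initialization)
W = np.random.randn(784, 256) * 0.01
b = np.zeros(256)
#forward pass: weighted sum followed by the ReLU activation
hidden = np.maximum(0, x @ W + b)
print(hidden.shape)

(256,)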

This is a very intuitive level of information, just to give you a flavor of how a deep neural network flows. Please read the Keras documentation here for more information, as the details are out of scope for this tutorial.

Image classification using Deep Learning

For this tutorial, we are going to use the Fashion-MNIST dataset. Fashion-MNIST is a dataset of Zalando’s article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from 10 classes.

The class labels are:

  • 0: T-shirt/top
  • 1: Trouser
  • 2: Pullover
  • 3: Dress
  • 4: Coat
  • 5: Sandal
  • 6: Shirt
  • 7: Sneaker
  • 8: Bag
  • 9: Ankle boot

Figure 2: Unique classes in MNIST Fashion dataset

Import the required Python libraries for image classification

#import required libraries
import pandas as pd
import numpy as np
#data visualization packages
import matplotlib.pyplot as plt
#keras packages
import keras
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.layers import Dropout
#model evaluation packages
from sklearn.metrics import f1_score, roc_auc_score, log_loss
from sklearn.model_selection import cross_val_score, cross_validate

Load the Fashion-MNIST dataset from the keras.datasets library

#read mnist fashion dataset
mnist = keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

(60000, 28, 28) (60000,) (10000, 28, 28) (10000,)

You can import the dataset in many ways:

  • Download the dataset from here and import it manually (see the sketch after this list).
  • Import the dataset directly from the keras.datasets library (as demonstrated above).
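If you go the manual route, here is a minimal sketch, assuming you downloaded the standard gzipped IDX files published in the Fashion-MNIST repository (the file names below are the ones used there):

#a minimal sketch for loading the raw gzipped IDX files manually
import gzip
import numpy as np

def load_idx_images(path):
    #image files start with a 16-byte header (magic number, count, rows, cols)
    with gzip.open(path, 'rb') as f:
        data = np.frombuffer(f.read(), dtype=np.uint8, offset=16)
    return data.reshape(-1, 28, 28)

def load_idx_labels(path):
    #label files start with an 8-byte header (magic number, count)
    with gzip.open(path, 'rb') as f:
        return np.frombuffer(f.read(), dtype=np.uint8, offset=8)

X_train = load_idx_images('train-images-idx3-ubyte.gz')
y_train = load_idx_labels('train-labels-idx1-ubyte.gz')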

Data Preparation

Data preparation is an important step in the model-building process. Before training the model, various preparation steps need to be performed, such as:

  • Handling missing values (if any)
  • Feature scaling (a mandatory step for any deep learning model)
  • Reshaping data (model specific)
  • Label encoding and one-hot encoding
  • Splitting the dataset into training and testing sets

In our case, not all of these steps are applicable. The steps below prepare our data according to the model’s needs.

#reshape data from 3-D to 2-D array
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
#feature scaling
from sklearn.preprocessing import MinMaxScaler
minmax = MinMaxScaler()
#fit and transform training dataset
X_train = minmax.fit_transform(X_train)
#transform testing dataset
X_test = minmax.transform(X_test)
print('Number of unique classes: ', len(np.unique(y_train)))
print('Classes: ', np.unique(y_train))

Number of unique classes: 10
Classes: [0 1 2 3 4 5 6 7 8 9]
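Note that we skip label/one-hot encoding here: the labels are already integers 0–9, and we will train with the sparse_categorical_crossentropy loss, which accepts integer labels directly. If you preferred plain categorical_crossentropy instead, a minimal sketch of the one-hot step would be:

#one-hot encode integer labels (only needed with categorical_crossentropy)
from keras.utils import to_categorical
y_train_onehot = to_categorical(y_train, num_classes=10)
y_test_onehot = to_categorical(y_test, num_classes=10)
print(y_train_onehot.shape)

(60000, 10)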

Data Visualization

Generally, it is important to understand the data before building a model. Hence, visualizing the data is one of the best ways to uncover patterns within the features, using scatter plots, box plots, and so on. In our case, we can visualize the images to see how they can be distinguished from one another.

fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(15,5))
ax = axes.ravel()
for i in range(10):
    ax[i].imshow(X_train[i].reshape(28,28))
    ax[i].title.set_text('Class: ' + str(y_train[i]))
plt.subplots_adjust(hspace=0.5)
plt.show()
Figure 3: Displaying the first 10 images with the corresponding class information

Building an ANN model with 1 Dense Layer

#initialize ANN model
classifier_e25 = Sequential()
#add 1st hidden layer
classifier_e25.add(Dense(input_dim = X_train.shape[1], units = 256, kernel_initializer='uniform', activation='relu'))
#add output layer
classifier_e25.add(Dense(units = 10, kernel_initializer='uniform', activation='softmax'))
#compile the neural network
classifier_e25.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
#model summary
classifier_e25.summary()

There is a lot happening above. Let’s discuss it one piece at a time (bring a cup of coffee if you want):

The Sequential class: Sequential groups a linear stack of layers into a tf.keras.Model.

The add() method: Adds a layer instance on top of the layer stack.

The Dense class: Adds a fully connected (dense) layer to our network.

The compile() method: Configures the model for training.

The summary() method: Prints a string summary of the network.
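As a sanity check on the summary output: each Dense layer holds one weight per input–unit pair plus one bias per unit, so the hidden layer has 784 × 256 + 256 = 200,960 parameters and the output layer has 256 × 10 + 10 = 2,570, for 203,530 trainable parameters in total.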

Note: Click on each title to learn more.

Training an ANN model

The fit method is used to train the model.

#fit training dataset into the model
classifier_e25_fit = classifier_e25.fit(X_train, y_train, epochs=25, verbose=0)
Figure 4: Training accuracy and loss graph

Note: some of the code is not shown here. I used the matplotlib library to build the graph above.
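For reference, here is a minimal sketch of that plotting code, reconstructed from the History object that fit() returns (on older Keras versions the accuracy key is 'acc' instead of 'accuracy'):

#plot training accuracy and loss per epoch from the History object
history = classifier_e25_fit.history
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(12,4))
ax1.plot(history['accuracy'])
ax1.set_title('Training accuracy')
ax1.set_xlabel('Epoch')
ax2.plot(history['loss'])
ax2.set_title('Training loss')
ax2.set_xlabel('Epoch')
plt.show()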

As we can infer from the graph above, the training accuracy is more than 94%. Since the curve is still trending upward at the 25th epoch, we could push the training accuracy higher by training for more epochs.

Evaluation of ANN Model

#evaluate the model for testing dataset
test_loss_e25 = classifier_e25.evaluate(X_test, y_test, verbose=0)
#calculate evaluation parameters
f1_e25 = f1_score(y_test, classifier_e25.predict_classes(X_test), average='micro')
roc_e25 = roc_auc_score(y_test, classifier_e25.predict_proba(X_test), multi_class='ovo')
#create evaluation dataframe
stats_e25 = pd.DataFrame({'Test accuracy' : round(test_loss_e25[1]*100,3),
                          'F1 score' : round(f1_e25,3),
                          'ROC AUC score' : round(roc_e25,3),
                          'Total Loss' : round(test_loss_e25[0],3)}, index=[0])
#print evaluation dataframe
display(stats_e25)
Figure 5: Testing accuracy, f1-score, ROC-AUC, loss results

We got 88% accuracy on the testing dataset (not bad). But the gap between training and testing accuracy is a sign of overfitting. Overfitting causes the model to learn patterns/correlations specific to the training data, so it does not generalize well to unseen data (the testing dataset). We can reduce overfitting in various ways:

  • Cross-Validation
  • Image Augmentation
  • Regularization (Dropout layer; see the sketch after this list)
  • Early stopping (tuning the number of epochs)
  • Adding more Dense layers
  • Adding CNN layers
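As a preview of the Dropout option, here is a minimal sketch that adds one Dropout layer to the same architecture (the rate of 0.3 is an arbitrary choice for illustration):

#same ANN as before, with a Dropout layer between the hidden and output layers
classifier_drop = Sequential()
classifier_drop.add(Dense(input_dim=X_train.shape[1], units=256, kernel_initializer='uniform', activation='relu'))
#randomly zero out 30% of the hidden activations during training to reduce overfitting
classifier_drop.add(Dropout(0.3))
classifier_drop.add(Dense(units=10, kernel_initializer='uniform', activation='softmax'))
classifier_drop.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])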

I promise I will discuss how to handle overfitting using the above methods in my next blog. Follow me to catch the upcoming posts, and raise your hands by clicking the clap button if you liked this one.

Please find the complete code here.

Thank you!

