Mastering MNIST with ANN: The Secret to Handwritten Digit Recognition

The MNIST dataset is a widely recognized benchmark in machine learning and computer vision: it is small and accessible, yet still challenging enough to be useful for training and assessing machine learning models.

Comprising a total of 70,000 images, the MNIST dataset encompasses 60,000 images for training and an additional 10,000 images for testing. These images depict handwritten digits ranging from 0 to 9, and each of them is represented as a grayscale image with dimensions measuring 28x28 pixels.

Artificial Neural Networks (ANNs) represent a category of machine learning algorithms designed to discern intricate patterns within data. Drawing inspiration from the human brain, ANNs are constructed from interconnected nodes referred to as neurons.

Within an ANN, each neuron carries out straightforward mathematical operations on the incoming signals it receives. The output generated by each neuron is then transmitted to the subsequent layer of neurons in the network.
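
As a minimal sketch of that computation (the input values, weights, bias, and ReLU activation below are made up purely for illustration, not taken from any trained network):

import numpy as np

#A single neuron: a weighted sum of its inputs plus a bias, passed through an activation
inputs = np.array([0.5, 0.1, 0.9])     #signals arriving from the previous layer
weights = np.array([0.4, -0.2, 0.7])   #connection weights (illustrative values)
bias = 0.1

z = np.dot(weights, inputs) + bias     #weighted sum plus bias
output = max(0.0, z)                   #ReLU activation; this value is passed on to the next layer
print(output)                          #0.91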

The training of ANNs involves exposing them to substantial volumes of data while fine-tuning the weights of the connections between neurons. This iterative process continues until the ANN becomes proficient at making precise predictions for new, unseen data.
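
The core of that fine-tuning is gradient descent: each weight is nudged in the direction that reduces the prediction error (the loss). A minimal sketch of a single update step for one weight, with made-up numbers:

learning_rate = 0.01
weight = 0.5
gradient = 0.2                               #slope of the loss with respect to this weight (illustrative)
weight = weight - learning_rate * gradient   #nudge the weight downhill; repeated over many examples and epochs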

Below, we start implementing a classifier for the MNIST dataset with an ANN:

#Import required libraries

import tensorflow
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
from sklearn.metrics import accuracy_score

Loading the MNIST dataset:

(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

#Output

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step

Checking the shape of training dataset:

#Shape of train data

X_train.shape

#Output
(60000, 28, 28)

The output shows that there are 60,000 training images, each of size 28x28 pixels.

#Shape of test data

X_test.shape

#Output

(10000, 28, 28)

Similarly, the test set contains 10,000 images, each of size 28x28 pixels.

#Printing labels

y_train

#Output

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

The labels are the digits from 0 to 9, one per training image.

# Plotting the train data

plt.imshow(X_train[0])

#Output

<matplotlib.image.AxesImage at 0x7e20f39e6650>
#Transforming values between 0-1 for faster convergence

X_train = X_train/255

X_test = X_test/255

Scaling the pixel values from the 0-255 range down to 0-1 completes the pre-processing of the data.

Now we move on to the implementation of the ANN (Artificial Neural Network). To learn more about ANNs, see my previous blog post on neural networks.


#ANN Implementation

model = Sequential()
model.add(Flatten(input_shape=(28,28)))
model.add(Dense(128,activation = 'relu'))
model.add(Dense(10,activation = 'softmax'))

First, we create a Sequential model (imported above from tensorflow.keras). The Flatten layer converts the 28x28-pixel input image into a 784-dimensional vector. Each neuron in a Dense layer takes the outputs of the previous layer as input and produces its own output. The Dense layer with ReLU (Rectified Linear Unit) activation is a fully connected (FC) layer; ReLU is a non-linear function that helps the model learn complex patterns from the data. For more, see my blog post on activation functions. The final Dense layer with the softmax activation function is the output layer of the model: softmax produces a probability distribution across the ten digits, so the output is a vector of ten numbers (for digits 0 to 9), each indicating the likelihood that the input image corresponds to that digit.
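
As a quick standalone illustration of what the softmax output layer does (this NumPy snippet is separate from the Keras model, and the scores are made up), softmax exponentiates the raw scores and normalizes them so they sum to 1:

import numpy as np

#Illustrative raw scores (logits) for the ten digit classes
logits = np.array([2.0, 0.5, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

#Softmax: exponentiate and normalize so the outputs form a probability distribution
probabilities = np.exp(logits) / np.exp(logits).sum()
print(probabilities.sum())     #1.0
print(probabilities.argmax())  #0, the most likely digit for these made-up scores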


#Summarizing the model

model.summary()

#Output

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten (Flatten)            (None, 784)               0

dense (Dense)                (None, 128)               100480

dense_1 (Dense)              (None, 10)                1290

=================================================================
Total params: 101770 (397.54 KB)
Trainable params: 101770 (397.54 KB)
Non-trainable params: 0 (0.00 Byte)

The output shows that the parameters sit in the Dense layers, because that is where the weights and biases are. Each 28x28 image has 784 pixels, and the first Dense layer has 128 nodes, giving 784 x 128 = 100,352 weights. Each node also has a bias, adding another 128 parameters, for a total of 100,480. In the dense_1 layer, 128 x 10 weights + 10 biases = 1,290 parameters.
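
The same arithmetic can be verified in a couple of lines (the layer sizes match the model defined above):

#First Dense layer: one weight per input pixel per node, plus one bias per node
first_dense_params = 784 * 128 + 128    #100,480

#Output Dense layer: 128 weights per output node, plus 10 biases
output_dense_params = 128 * 10 + 10     #1,290

print(first_dense_params + output_dense_params)   #101,770 in total, matching model.summary()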

#Optimizing the loss

model.compile(loss = 'sparse_categorical_crossentropy', optimizer = 'Adam', metrics = ['accuracy'])

The compile step configures the Keras Sequential model we defined earlier. It takes three arguments. The loss function, here sparse categorical cross-entropy because the labels are integers, measures how well the model classifies the images and is what training minimizes when adjusting the model's weights. For more, check my blog post on how loss functions work. The second argument is the optimizer, which is responsible for updating the weights to minimize the loss. There are many optimizers, each with its own strengths and weaknesses; Adam is a well-regarded choice, known for being simple and highly effective across a diverse range of machine learning models. More about optimizers in my blog. The third argument, metrics, is used to evaluate the performance of the model.
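
For reference, the string shortcuts used above are equivalent to passing the loss and optimizer objects explicitly; the learning rate shown is simply Adam's default, and you would use one form or the other:

model.compile(
    loss = keras.losses.SparseCategoricalCrossentropy(),
    optimizer = keras.optimizers.Adam(learning_rate = 0.001),
    metrics = ['accuracy'],
)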

#The fit function trains the model on the training images and labels

history = model.fit(X_train, y_train, epochs = 10, validation_split = 0.2)

#Predicting class probabilities with the trained model

probability = model.predict(X_test)
predicted = probability.argmax(axis = 1)

#Checking the accuracy of the predictions

accuracy_score(y_test, predicted)

#Output

0.981

The model predicts the correct labels for about 98% of the images in the MNIST test set.
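
If you want more detail than a single accuracy figure, scikit-learn can also produce a per-digit breakdown from the same predictions (optional, and not part of the pipeline above):

from sklearn.metrics import classification_report

#Per-digit precision, recall and F1 score for the predictions computed above
print(classification_report(y_test, predicted))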

# Plotting the training and validation loss of the model

plt.plot(history.history['loss'], label = 'loss')
plt.plot(history.history['val_loss'], label = 'val_loss')
plt.legend()
plt.show()

#Plotting the training and validation accuracy of the model

plt.plot(history.history['accuracy'], label = 'accuracy')
plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
plt.legend()
plt.show()
# Testing the model on a single image

plt.imshow(X_test[2])

#Predicting the digit for that test image

model.predict(X_test[2].reshape(1,28,28)).argmax(axis=1)

#Output
1/1 [==============================] - 0s 81ms/step

In summary, this is a straightforward approach to classifying the MNIST digits, with test accuracy serving as the metric for assessing the model's performance.

In this blog post, we’ve provided a step-by-step guide on implementing the MNIST dataset using an Artificial Neural Network (ANN). We covered loading the dataset, data preprocessing, defining the ANN architecture, compiling the ANN, training it, evaluating its performance, and making predictions. ANNs are versatile and can be applied to a wide range of problem domains, including image classification, object detection, and natural language processing. In essence, ANNs serve as a valuable tool for solving various machine learning and computer vision challenges.

If you enjoy reading stories like these and want to support my writing, please consider following and liking. I'll cover most deep learning topics in this series. These posts are also shared on X. For code reference, check my GitHub profile: @NehaDLML. Feel free to connect with me on LinkedIn.

Thank you!
