Deep Learning: The Classic MNIST Dataset Using Neural Networks

Artin Sinani
Aug 30, 2019 · 3 min read


As I delved deeper into artificial intelligence, past machine learning and into something called deep learning, I realized I was entering a world that could change ours with the capabilities it holds.

To give you an idea of where deep learning fits in the world of artificial intelligence: deep learning is a subfield of machine learning, which is itself a subfield of artificial intelligence.

With deep learning, you can teach a machine what images are, and once the machine sees new data it can classify each image based on what you've taught it. A classic dataset that almost every data scientist has attempted is the MNIST dataset.

The MNIST dataset consists of grayscale images of handwritten digits from 0 to 9 (60,000 training images and 10,000 test images, each 28×28 pixels), and with fewer than 20 lines of code we will train a deep learning neural network model using Keras. If you'd like, you can follow along in Google Colab. You won't have to worry about obtaining a CSV file for the dataset, as MNIST is already built into the Keras library.

1) Loading the dataset
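Keras makes this step a single call. A minimal sketch (the variable names here are my own choice):

```python
# Keras ships with MNIST; load_data() returns the train/test split directly.
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
```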

2) Building the network architecture

The core building block of a neural network is the layer, a data-processing module that you can think of as a filter for data.

In the code below, our network consists of a sequence of two Dense layers, which are densely connected (also called fully connected) neural layers.
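Here is a sketch of such a network. The 512-unit relu hidden layer is an assumption on my part (a common choice for this example); the 10-unit softmax output follows from the description below:

```python
from keras import models, layers

network = models.Sequential()
# Hidden layer: 512 units is an assumed (but typical) size for this example.
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
# Output layer: 10-way softmax, one probability score per digit class.
network.add(layers.Dense(10, activation='softmax'))
```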

The second (and last) layer is a 10-way softmax layer: it has 10 units and a softmax activation, so it returns an array of 10 probability scores that sum to 1. Each score is the probability that the current digit image belongs to one of our 10 digit classes.

3) The compilation step

We need three more things before we can begin training; they come together in the compile call shown after this list:

  • A loss function: how the network measures its performance on the training data, and thus how it steers itself in the right direction.
  • An optimizer: the mechanism through which the network updates itself based on the data it sees and its loss function.
  • Metrics to monitor during training and testing: here, accuracy (the fraction of images correctly classified).
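A sketch of the compile call; the rmsprop optimizer and categorical crossentropy loss are assumptions here, though they are the usual pairing for this example:

```python
# Loss, optimizer, and metrics are all specified at compile time.
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
```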

4) Preparing the data

Before we fit our model, we need to prepare the image data and categorically encode the labels.
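One way that preparation might look, reusing the variable names from the loading step:

```python
from keras.utils import to_categorical

# Flatten each 28x28 image into a 784-long vector and scale pixel
# values from the 0-255 range down to the 0-1 range.
train_images = train_images.reshape((60000, 28 * 28)).astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28)).astype('float32') / 255

# One-hot encode the labels: digit 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
```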

5) Time to fit the model!
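The fit call might look like the following; 5 epochs and a batch size of 128 are assumed values, typical for this example:

```python
# Train for 5 passes over the data, 128 images per gradient update.
network.fit(train_images, train_labels, epochs=5, batch_size=128)
```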

Our model scored 98.87% accuracy on the training data. That's a pretty impressive training score.

But a training score alone tells us little; what matters is how the model performs on the test data, which it has never seen.
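Evaluating on the held-out test set is one more call; a sketch:

```python
# evaluate() returns the loss plus any metrics passed at compile time.
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)
```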

The model reached 98.2% accuracy on the test data.

Notice the slight gap between the training and test accuracy scores; this gap indicates a little bit of overfitting in our model.

My code can be found here:

https://github.com/ASinani/MNIST-Dataset/blob/master/A_Deep_Learning_Dive_with_the_classic_MNIST_Dataset.ipynb

Just like that, we've successfully trained a deep learning model. It wasn't as intimidating as it sounded, was it?

I'd love to hear your reactions, thoughts, or questions about deep learning in the comments section.
