Build your first neural network for image classification in Keras

Umme Munira · Published in The Startup · 6 min read · Feb 24, 2020
Photo by Franck V. on Unsplash

In AI conversations, the terms deep learning and neural network are used almost interchangeably. For this article, I assume all my readers have a basic understanding of neural networks and how they work. One of the most popular datasets for deep learning is the MNIST dataset: images of handwritten digits, with the task of identifying which digit each image shows. It is a good first neural network for novice deep learners to build.

In this article, I use tf.keras, a high-level API for building and training models in TensorFlow. Without further ado, let's get started.

Import Libraries

Let's start by importing the necessary libraries. Here, I use Keras, TensorFlow's high-level API for defining a neural network as a stack of layers. So first, I import TensorFlow, calling it tf, and then import keras.

As helper libraries, I import NumPy to work with our data as arrays, and matplotlib for data visualization.

try:
  # The %tensorflow_version magic only exists in Google Colab
  %tensorflow_version 2.x
except Exception:
  pass
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

Import and Split the Dataset

Since the MNIST dataset is already available in the tf.keras datasets API, we can load it directly from there.

mnist = keras.datasets.mnist

The next step is to split the data into training and test sets. Conveniently, load_data returns the data already split: training and test images along with their corresponding labels.

(x_train, y_train), (x_test, y_test) = mnist.load_data()

This gives us four NumPy arrays:

  • x_train and y_train: the dataset used to train the model
  • x_test and y_test: the dataset used to test the model

Here, x_train and x_test are grayscale images (pixel values from 0 to 255) stored as 28x28 NumPy arrays, while y_train and y_test are integer labels from 0 to 9.

Explore the Data

To explore the data, let's first check its format. x_train.shape shows that there are 60,000 images in the training set, each represented as 28 x 28 pixels, and len(y_train) shows that there are 60,000 corresponding labels.

x_train.shape
len(y_train)

Let's try the same with the test set. There are 10,000 images in the test set; like the training data, each image is represented as 28 x 28 pixels, and there are 10,000 corresponding labels.

x_test.shape
len(y_test)

Let's display a training image and print its label and raw pixel values to see what they look like. Since it's 2020, I want to see which digit is at index=2020.

plt.imshow(x_train[2020])
print(y_train[2020])
print(x_train[2020])

The output shows the rendered image, its label, and the raw pixel array: it's a 6. We can also see that the pixel values are in the range 0 to 255. The network trains more easily on values between 0 and 1, and we can normalize the data simply by dividing the training and test sets by 255.

x_train = x_train / 255.0
x_test = x_test / 255.0
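
As a quick optional sanity check (my own addition, not part of the original walkthrough), the pixel values should now lie between 0 and 1:

print(x_train.min(), x_train.max())  # 0.0 1.0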

Build the Model

The next step is to design the model. A neural network is built by chaining layers together. I have designed a simple neural network below with three layers. Sequential defines the sequence of layers in the network. The first layer, tf.keras.layers.Flatten, converts each two-dimensional array (28 by 28) into a one-dimensional array (28 * 28 = 784 values). Next come two tf.keras.layers.Dense layers: the first has 128 neurons, and the last has 10 neurons, which return an array of 10 probability scores since we have 10 different labels (the digits 0 through 9) to identify.

I have also specified the activation functions here. An activation function determines each neuron's output from its inputs. There are other options too; for now, I have used relu and softmax. You can play with the others and see how the results differ.

Relu effectively means “If X>0 return X, else return 0”; it only passes values 0 or greater to the next layer in the network.

Softmax is often used as the activation for the last layer of a classification network because its output can be interpreted as a probability distribution. It exponentiates and normalizes a set of values so they sum to 1, effectively highlighting the biggest one. For example, if the output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05, 0.1], softmax turns it into something very close to [0, 0, 0, 0, 1, 0, 0, 0, 0, 0].
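
To make this concrete, here is a quick NumPy sketch of both functions (my own illustration, not code we need for the model):

def relu(x):
    # "If X > 0 return X, else return 0", applied element-wise
    return np.maximum(x, 0)

def softmax(x):
    # Exponentiate and normalize so the outputs sum to 1
    e = np.exp(x - np.max(x))  # subtracting the max improves numerical stability
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0.  0.  3.5]
logits = np.array([0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05, 0.1])
print(softmax(logits).round(3))  # ~[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]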

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
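
If you want to inspect the architecture, model.summary() prints each layer with its output shape and parameter count (this call is optional and not part of the original walkthrough):

model.summary()
# Flatten: 0 parameters (it only reshapes the input)
# Dense (128): 784 * 128 weights + 128 biases = 100,480 parameters
# Dense (10): 128 * 10 weights + 10 biases = 1,290 parameters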

Compile the Model

We have built our model; now we have to compile it by specifying a loss function and an optimizer.

So, what does a loss function do? When the neural network tries to learn, it makes a guess: it takes an input and predicts an output. The loss function measures how far the network's predictions are from the given correct labels. The optimizer then adjusts the network to reduce that loss, and the network makes another guess. The loss function again measures how good or bad the new output is, the optimizer adjusts again, and so on. This loop usually continues until it reaches a set number of iterations (the number of epochs, which we specify in the model.fit call) or a target level of accuracy (e.g. 85% or 95%), which we can enforce with a callback.
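
For example, here is a minimal sketch of such a callback (the class name and the 95% threshold are just illustrative):

class StopAtAccuracy(tf.keras.callbacks.Callback):
    # Stop training once the 'accuracy' metric reaches the target
    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get('accuracy', 0) >= 0.95:
            print('\nReached 95% accuracy, stopping training.')
            self.model.stop_training = True

# It would be passed to training like this:
# model.fit(x_train, y_train, epochs=10, callbacks=[StopAtAccuracy()])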

Now, here we will use sparse_categorical_crossentropy (which computes the cross-entropy loss between the labels and predictions) as the loss function and 'adam' as the optimizer. We will also specify 'accuracy' as a metric to monitor during the training and testing steps.

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

Train the Model

Now we use the model.fit function to train our model. This function runs the training loop described above: making a guess, measuring how good or bad it is (the loss), and using the optimizer to make a better guess, repeated for the number of epochs we specify. Each epoch is printed with its loss and accuracy. Here, we feed in the training images with their labels to train the model; later, the trained model will make predictions on our test data.

model.fit(x_train, y_train, epochs=10)
Output of model.fit showing training accuracy around 99.5%.

After the final epoch, we see an accuracy of about 0.995 (99.5%), which means our neural network classifies the training images to their correct labels about 99.5% of the time. This is great considering we used only 10 epochs and training was quick.

Evaluate Accuracy

Now, we need to know how our model performs on unseen data, which is where the test images come in. To measure the model's performance on the test dataset, we use model.evaluate:

model.evaluate(x_test, y_test)
Output of model.evaluate function showing accuracy of 97.6%

It is a little lower than on the training set, at 97.6%; the small gap between training and test accuracy suggests mild overfitting, but it is still a good result.
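
If you want these numbers programmatically, model.evaluate returns the loss followed by the metrics that were specified at compile time:

test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)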

Make Predictions

Now that we have trained our model, let's run predictions on the test images and look at the first one:

predictions = model.predict(x_test)
predictions[0]
Prediction result for 0th image corresponds to 10 different y labels

The prediction for each image is an array of 10 numbers, representing the model's “confidence” that the image corresponds to each of the 10 different digits. In the array above, the eighth value (index 7) is the highest, at about 0.99, so that should be our digit: seven. Let's confirm which label has the highest confidence value:

np.argmax(predictions[0])

And the output shows it’s seven.

So, the model is most confident that this image is the number 7. Let's check what this image actually is by looking at the test label (and the image itself):
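
print(y_test[0])       # 7, the true label of the first test image
plt.imshow(x_test[0])  # displays the handwritten 7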

Great! Our neural net has successfully recognized the digit.

Thanks for reading. I hope you enjoyed this article.
