A more complex Neural Network
In the previous article, I showed how to create the simplest Neural Network possible, with just one input node, one hidden node and one output node. This time the Neural Network will be slightly more complex and will be used for image classification.
Image recognition is the ability to detect objects in the image and recognize or classify them into one of several classes.
From the moment we are born, we are trained on thousands of images every day without even realizing it. For this reason, it’s easy for us to distinguish a dog from a cat or any other animal; however, it is not so easy for a computer to imitate this process.
A computer views an image just as an array of numbers that represent how dark each pixel is, and it tries to look for patterns to recognize and distinguish key features in the image.
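To make this concrete, here is a tiny made-up example (not MNIST data) of what “an image is just an array of numbers” means:

```python
import numpy as np

# A tiny, made-up 3x3 grayscale "image": 0 is black, 255 is white.
image = np.array([
    [  0, 128, 255],
    [ 64, 192,  32],
    [255,   0, 128],
])

# To the computer, the picture is nothing but this grid of numbers.
print(image.shape)  # (3, 3)
print(image.max())  # 255
```

A real MNIST image is exactly the same idea, just with a 28x28 grid instead of 3x3.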
Implementation of a Neural network model for image classification
Here we are going to train a neural network to recognize handwritten digits from a common dataset called MNIST. The TensorFlow library makes some datasets directly available through the tf.keras datasets API, including the MNIST handwritten digits dataset, often used as the “hello world” of machine learning programs.
The handwritten digit dataset contains 70,000 grayscale images (60,000 for training, 10,000 for testing) classified into 10 different categories, each image having its own label. They represent the numbers from 0 to 9 at low resolution (28x28 pixels).
First we need to import the TensorFlow library:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
Now we can load the dataset. As said before, it is divided into train and test images, each with their own labels, so we need to save these data in four variables:
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
Visualizing our data and understanding what it contains is always an important step for every machine learning application.
First we can print the dimensions of the array by using the shape attribute.
train_images.shape
(60000, 28, 28)
So we confirm there are 60,000 images with a dimension of 28x28 pixels. Let’s now try to plot the first training image and print its label and values:
plt.imshow(train_images[0])
print(train_labels[0])
print(train_images[0])
As we can see, the values in the image, which express the pixel intensity, are between 0 and 255, but when working with neural networks we need to normalize the data so that the values range between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0
Let’s now design the model. Compared to the model built in the previous article, this one is going to be just slightly more complex, but many things stay the same:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
Here we first define the model as Sequential, meaning that it will be a neural network with a sequence of layers. The first layer is called Flatten; it allows us to “flatten” a matrix into a 1-dimensional array.
Thanks to the flatten layer, we can transform every 28x28 image into an array of 784 values and use this array as input for our dense layer, which represents a layer of neurons in the network. Every dense layer needs an activation function that tells its neurons how to transform their inputs.
Now you may ask: “Why didn’t you talk about the activation function in the previous article?”. Well, when we use the Dense layer, if the activation function is not specified, its default value is the “linear” activation:
a(x) = x.
In this case, because the first layer of the model needs to learn what a digit is during training, we use ReLU as the activation function. Mathematically, the ReLU function returns x for all values of x > 0, and returns 0 for all values of x ≤ 0.
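The definition above translates directly into a few lines of numpy (a sketch of the mathematical function, not TensorFlow’s internal implementation):

```python
import numpy as np

def relu(x):
    # ReLU: return x where x > 0, and 0 everywhere else
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(values))  # negatives and zero become 0, positives pass through
```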
The output of this layer will be the input to the second dense layer, which uses the softmax activation function to turn its 10 raw outputs into probabilities, so the predicted class is the one with the highest probability.
After defining the model, we need to compile it with model.compile(). This function is used to configure the loss, optimizer and metrics of the model.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
With the command model.summary() we can review a summary of the model: the layers, the shape of the inputs/outputs and the number of parameters it handles:
As we can see from the output of the model summary, this simple model has more than 101k trainable parameters!
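As a sanity check, we can reproduce that number by hand: each Dense layer has one weight per input-neuron pair plus one bias per neuron.

```python
inputs = 28 * 28        # 784 values coming out of the Flatten layer
hidden = 128            # neurons in the first Dense layer
classes = 10            # neurons in the output layer

dense1 = inputs * hidden + hidden    # 784*128 weights + 128 biases
dense2 = hidden * classes + classes  # 128*10 weights + 10 biases
print(dense1 + dense2)  # 101770 trainable parameters
```

So even this tiny two-layer network already has 101,770 weights and biases to learn.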
Now it’s time to train the model. During this phase the model will try to fit the training data to the training labels or, in other words, figure out the relationship between the training data and its labels, so that when we give new images to the model it can predict which class they belong to.
model.fit(train_images, train_labels, epochs=5)
As we can see, this simple neural network reaches an accuracy of 98.6% in classifying the training data! This high accuracy was reached in just 5 epochs, so we didn’t overfit the model, and the training was pretty fast despite working on 60,000 images.
So now let’s see how it handles unseen data using the call model.evaluate. Passing the test set and its labels as parameters, it will report back the accuracy on this “new” data:
model.evaluate(test_images, test_labels)
As expected, the accuracy on unseen data is a little bit worse but 97.6% is still a good result.
Reflections on the model
So, what we did was feed raw pixels into the neural network, which learned to recognize the images. With an accuracy of 97.6%, the digit recognizer works really well on simple images where the digit is right in the middle of the image, but it fails when the number isn’t perfectly centered. Just the slightest position change ruins everything.
In other words, the model learned to recognize a centered number, not the features that make up what a number is.
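To see why a small shift matters so much, here is a toy numpy sketch (a made-up 6x6 image, not MNIST data). Once flattened, a shifted object lights up mostly different input positions, so the dense layer’s learned weights no longer line up with it:

```python
import numpy as np

# Toy 6x6 "image" with a bright 2x2 blob in the centre.
img = np.zeros((6, 6))
img[2:4, 2:4] = 1.0

# The same blob shifted one pixel to the right.
shifted = np.roll(img, 1, axis=1)

# Flattened into vectors, the two images overlap in only half
# of their bright pixels, even though they show the "same" object.
overlap = np.sum(img.flatten() * shifted.flatten())
print(overlap)  # 2.0 -> only 2 of the 4 bright pixels still line up
```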
That’s where convolutions come in. A convolution is a filter that passes over an image, processing it and extracting features that show a commonality across the image.
In the next article I’ll explain the challenges that we face with image recognition and why fully-connected Neural Networks are not the best model for this task.
Here you can see the full code or try it through Google Colab: