Artificial Intelligence 101: Introduction to Keras (Pt 2)

Carson Bentley
The School of AI (official)
6 min read · Jul 4, 2019

Welcome to part two of this series. Be sure to check out part one if you haven’t done so already. Today we will be developing a simple classifier for the MNIST data set, which contains images of handwritten digits (zero through nine). At the end of training, we should be able to feed in our own handwriting and get a prediction for which number we wrote.

We’ll be training a feedforward neural network with densely connected layers, similar to our binary classifier. In this case, however, ten output classes will be used, one for each digit. We’ll make use of a softmax activation for our final layer, as well as one-hot encoding for our labels. Don’t worry if you aren’t familiar with these terms. We’ll be reviewing them shortly.

Alright, let’s begin. Be sure to check out the accompanying Colab Notebook.

First, we’ll need to import the necessary Python libraries as usual.

from keras.models import Sequential
from keras.layers import Dense
from keras.datasets import mnist
from keras.utils import to_categorical
import matplotlib.pyplot as plt
import numpy as np

Next we’ll load the data, which is conveniently pre-shuffled and separated into training and test sets.

(X_train, y_train), (X_test, y_test) = mnist.load_data()

Let’s take a look at the shape of each of these elements.

It should be clear from the numbers below that we have 60,000 images in our training set and 10,000 images in our test set, each of size 28x28 pixels.

print((X_train.shape, y_train.shape), (X_test.shape, y_test.shape))
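For reference, the shapes should print out as shown below (the Keras MNIST loader always returns these dimensions):

((60000, 28, 28), (60000,)) ((10000, 28, 28), (10000,))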

Let’s take a peek at a few samples from our data.

for i in range(0, 6): # i takes on values 0 through 5
    plt.subplot(2, 3, i+1) # 2 by 3 grid of subplots
    plt.grid(False)
    plt.axis('off')
    plt.imshow(X_train[i], cmap='gray')
plt.show()

As you can see, the images match their labels (below).

print('sample training labels (before one-hot-encoding): \n')
print([y_train[i] for i in range(0, 6)])

We need to do some preprocessing to get our features in the shape our network is expecting.

We’ll reshape the training data to 60,000x784 and the test data to 10,000x784.

We divide by 255 because pixel values range from 0 to 255, and neural networks train more reliably on inputs scaled between zero and one.

Note we flattened the image data. This is not the best approach, but we’ll use it for now just to demonstrate the power of a basic model. In a future article, we’ll look at convolutional layers — which preserve the spatial data for images.

X_train = X_train.reshape((60000, 28 * 28)).astype('float32') / 255
X_test = X_test.reshape((10000, 28 * 28)).astype('float32') / 255
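As a quick sanity check (this snippet isn’t in the original notebook), the data should now be float32 with values between 0 and 1, and flattened to 784 dimensions per image:

print(X_train.dtype, X_train.min(), X_train.max()) # float32 0.0 1.0
print(X_train.shape, X_test.shape) # (60000, 784) (10000, 784)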

Our labels need to be put in a one-hot encoded format. Keras includes a convenient to_categorical() function for just this purpose. The idea is that we replace each numeric label with a vector that is all zeros except for a 1 at the index corresponding to the original label, as shown here:

before: {apples: 0, grapes: 1, oranges: 2}

after: {apples: [1, 0, 0], grapes: [0, 1, 0], oranges: [0, 0, 1]}

Why do we do this?

Whenever we are working with more than two classes, we need to include a softmax function at the end of our network. This function converts the network’s raw outputs into a set of probabilities (with a length equal to the number of classes) that add up to one. During training, imagine that we feed the network a picture of some grapes, and it returns the probabilities [0.1, 0.8, 0.1]. The network compares this estimate to the ideal [0, 1, 0] and gets a small amount of error (which is then used to update the network).

In other words, we put the labels in this special format so that we can easily compare them to the output of the network.
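To make the softmax step concrete, here is a minimal sketch in plain NumPy (my own illustration, not part of the original notebook) showing how a set of raw scores becomes probabilities that sum to one:

def softmax(z):
    e = np.exp(z - z.max()) # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([1.0, 3.0, 1.0]) # hypothetical raw outputs for three classes
print(softmax(logits)) # roughly [0.106, 0.787, 0.106]
print(softmax(logits).sum()) # 1.0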

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
print('sample training labels (after one-hot-encoding): ')
for i in range(0, 6):
    print(y_train[i])

Now it’s time to define our model architecture. We’ll use a series of densely connected layers. As you can see, our overall network is very similar to the binary classification task, with the major difference being we use a softmax (rather than a sigmoid) activation function.

At the bottom we print out a summary of our model.

HIDDEN_UNITS = [64, 128, 256]
OUTPUT_SIZE = 10
model = Sequential()
model.add(Dense(HIDDEN_UNITS[0], activation='relu', input_shape=(28*28,)))
model.add(Dense(HIDDEN_UNITS[1], activation='relu'))
model.add(Dense(HIDDEN_UNITS[2], activation='relu'))
model.add(Dense(OUTPUT_SIZE, activation='softmax'))
model.summary()
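As a quick check on the summary, a Dense layer has (inputs × units) + units parameters, so the four layers contribute 784·64 + 64 = 50,240, then 64·128 + 128 = 8,320, then 128·256 + 256 = 33,024, and finally 256·10 + 10 = 2,570 parameters, for a total of 94,154 trainable parameters.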

We’ll compile the model as before, setting the loss to categorical cross entropy. The optimizer is set to rmsprop, but I’d encourage the reader to try out other options.

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5, batch_size=64)
print('the model has finished training')
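If you’d like to experiment with a different optimizer, it’s a one-line change. For example, here’s the same compile call using Adam (any of the built-in Keras optimizers will work the same way):

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])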

Let’s check how our model does on the held out data.

loss, accuracy = model.evaluate(X_test, y_test)
print("test loss: ", loss)
print("test acc: ", accuracy)

We’ll save the predictions into a variable for now. Each row is a set of 10 numbers between 0 and 1, representing the probabilities our network has assigned to each class for a given sample.

predictions = model.predict(X_test)
print(predictions)
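As a small sanity check (my own addition), each row of predictions should sum to approximately one, since the final layer is a softmax, and there should be one row per test image:

print(predictions.shape) # (10000, 10)
print(predictions[0].sum()) # approximately 1.0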

As you can see, the test data is in shape 10000x784 (10,000 vectors, each with 784 dimensions).

X_test.shape

To visually inspect the test data and compare our predicted labels, we’ll need to convert the images back to their original dimensions of 28x28.

X_test = X_test.reshape(10000,28,28)
print(X_test.shape)
plt.imshow(X_test[0], cmap=plt.get_cmap('gray'))
plt.show()

The function np.argmax() returns the index corresponding to the maximum value of an array.

This is useful, since we want the index of the value which has been assigned the highest probability.

print(np.argmax(predictions[0]))

7
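If you want the predicted label for every test image at once, you can pass axis=1 to np.argmax instead of looping over the rows one at a time. This is just a convenience; the result is the same:

all_predicted = np.argmax(predictions, axis=1) # shape (10000,)
print(all_predicted[:10])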

Let’s also take a look at a bigger sample to see if we can spot any mistaken predictions.

predicted_labels = [np.argmax(predictions[i]) for i in range(0, 100)]
for i in range(0, 100):
    plt.subplots_adjust(bottom=-1) # give extra room between plots
    plt.subplot(10, 10, i+1) # 10 rows by 10 columns (of subplots)
    plt.title(str(predicted_labels[i])) # show the predicted label above each image
    plt.grid(False)
    plt.axis('off')
    plt.imshow(X_test[i], cmap='gray')
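If you’d rather find the mistakes directly instead of eyeballing them, you can compare the predicted labels against the true ones. (Remember that y_test was one-hot encoded above, so we use argmax to recover the original digit labels.) This snippet is my own addition rather than part of the original notebook:

true_labels = np.argmax(y_test, axis=1) # undo the one-hot encoding
all_predicted = np.argmax(predictions, axis=1)
wrong = np.where(all_predicted != true_labels)[0] # indices of misclassified test images
print('number of mistakes:', len(wrong))
print('first few mistaken indices:', wrong[:10])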

Finally, as an exercise, see if you can load in your own hand-drawn digits and predict their values. A possible starting point is sketched below.
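Here is one possible sketch to get you started, assuming you’ve saved a hand-drawn digit as my_digit.png (the filename, and whether you need the inversion step, are assumptions about your particular image):

from PIL import Image

img = Image.open('my_digit.png').convert('L') # hypothetical file; convert to grayscale
img = img.resize((28, 28)) # MNIST images are 28x28
arr = np.array(img).astype('float32') / 255 # scale to [0, 1], matching our preprocessing
arr = 1.0 - arr # invert if your digit is dark-on-light (MNIST digits are light-on-dark)
pred = model.predict(arr.reshape(1, 28 * 28)) # flatten to match the network's input shape
print('predicted digit:', np.argmax(pred))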
