Is Your Dog an Elephant? Building an Animal Classifier with TensorFlow

Overview

Have you ever wanted to try machine learning? If so, this is the tutorial for you. You’ll learn to build an image classifier and train it using Cloud GPUs. What is an image classifier? It is a machine learning model which can classify an image. You’ll show it an image, and it’ll tell you what it thinks it is. Although our model won’t be as accurate as state-of-the-art models, it is a good starting point.

We will use Cloud GPUs in this tutorial so all you need is a web browser. Our friends at Tensorpad have graciously provided 5 free hours of GPU compute time with the promo code “ClassifierGuide250”. That’s more than enough to complete this tutorial.

Objective

Our objective is to build an image classifier then use it to make predictions. This process includes the following phases:

  1. Building an image dataset (we’ve done this for you)
  2. Loading the image dataset in a way that is efficient for training
  3. Building the image classifier model
  4. Training the model using Cloud GPUs
  5. Validating the trained model

It is important to remember that the process above is not linear. Generally, each phase will need to be revisited multiple times to optimize the model’s accuracy.

We will also make predictions using the trained model and export it for use elsewhere.

Getting started

Follow these steps to try this code:

  1. Create a Tensorpad account. Use promo code ClassifierGuide250 for 5 free hours of GPU compute time.
  2. Create a JupyterLab Job with TensorFlow 1.11. Wait for the instance to be created, then select Open Lab.
  3. Open a Terminal in the opened lab.
  4. Clone this project’s GitHub repository with: $ git clone https://github.com/PeterChauYEG/animal_classifier.git
  5. Open Animal Classifier.ipynb inside the animal_classifier directory. This will open the Jupyter notebook.
  6. Run the whole notebook by using the Run menu and selecting Run All Cells.
  7. Scroll down to see training in action. The first part of training occurs at cell 20.

Loading Dependencies

We need to import a number of Python dependencies.

# Allows division to return a float
from __future__ import division
# Allows access to the file system
import os
# Provides an API for scientific computing
import numpy as np
# Allows us to timestamp the training run
from datetime import datetime
# Allows us to load and preprocess images
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
# Provides math helpers such as floor
import math
# Allows us to render images and plot data
import matplotlib.pyplot as plt
# Machine learning framework that provides an abstract API on top of Tensorflow
import keras
from keras.callbacks import TensorBoard
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
from keras import optimizers

Configurations

Because we are plotting in a Jupyter Lab, we need to configure matplotlib to render plots inline.

# configure the matplotlib for Jupyter Lab used for rendering the images
%matplotlib inline

Dataset Directories

The dataset holds all the images. A separate directory should be created for each of the following:

  1. Training Dataset: Images which are used during model training
  2. Validation Dataset: Images which are used during model validation

Each image dataset should be organized as directories of images, one directory per class. Each directory should be named after the class (e.g. cat) of the images it holds. It is important that no image in the training dataset also appears in the validation dataset.

# Paths to datasets to be used
train_dir = 'dataset/train'
validate_dir = 'dataset/validate'
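
As a quick sanity check, the directory layout described above can be inspected with a few lines of standard-library Python. This is only a sketch and is not part of the original notebook: the helper name count_images_per_class and the list of accepted file extensions are assumptions made for illustration.

```python
import os

# Assumed set of image file types; adjust to match your dataset.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')

def count_images_per_class(dataset_dir):
    """Hypothetical helper: count image files in each class subdirectory."""
    counts = {}
    for class_name in sorted(os.listdir(dataset_dir)):
        class_dir = os.path.join(dataset_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files in the dataset root
        counts[class_name] = sum(
            1 for file_name in os.listdir(class_dir)
            if file_name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

Calling count_images_per_class(train_dir) and count_images_per_class(validate_dir) should give one entry per class in each case, which makes missing or badly named directories easy to spot.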

Hyperparameters

Hyperparameters are used to tune the model and model training. They greatly influence the resulting metrics. Let’s examine them:

  1. Images will be resized to 200x200x3. This means each image will be 200 pixels by 200 pixels with 3 color channels (red, green, blue). More pixels tend to help the model because the image retains more detail.
  2. The learning rate controls how large a step the optimizer takes each time it updates the model’s weights using the gradients of the loss.
  3. The batch size is the number of images that will be fed into the model in one iteration.
  4. The number of epochs is the number of times the model should iterate over the entire dataset and update the weights of the model. At some number of epochs, the gains of training approach 0. It is possible to overtrain a model.
  5. It is often recommended to split the dataset between training and validation in an 80:20 ratio. This is a general rule that works reasonably well.

# number of images in the training dataset
n_train = 8000
# number of images in the validation dataset
n_validation = 2000
# the number of pixels for the width and height of the image
image_dim = 200
# the size of the image (h,w,c)
input_shape = (image_dim, image_dim, 3)
# the rate which the model learns
learning_rate = 0.001
# size of each mini-batch
batch_size = 32
# number of training epochs
epochs = 10
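
To see how these numbers relate to each other, here is a small worked calculation. It is a sketch that assumes Python 3 division and repeats the values above so that it runs on its own; the names train_fraction, validation_steps, and total_updates are introduced only for this illustration.

```python
import math

n_train = 8000
n_validation = 2000
batch_size = 32
epochs = 10

# fraction of all images that go to training (the 80:20 split)
train_fraction = n_train / (n_train + n_validation)

# batches needed to see every training image once
steps_per_epoch = math.ceil(n_train / batch_size)

# batches needed to see every validation image once
validation_steps = math.ceil(n_validation / batch_size)

# total number of weight updates over the whole training run
total_updates = steps_per_epoch * epochs

print(train_fraction, steps_per_epoch, validation_steps, total_updates)
# prints: 0.8 250 63 2500
```

Note that the last validation batch is only partly full, because 2000 is not a multiple of 32.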

Outputs

We will output 2 items:

  1. Training logs: These can be fed into Tensorboard for analysis
  2. Trained model: So it can be used elsewhere

We want to save the training logs to a directory with a timestamp of when training started, and some data about the hyperparameters used. We also want to give the trained model a name when we save it.

# directory which we will save training outputs to
# add a timestamp so that tensorboard shows each training session as a different run
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
output_logs_dir = 'logs/' + timestamp + '-' + str(batch_size) + '-' + str(epochs)
# file name to save the model under
model_name = 'trained_model'

Loading the image dataset in a way that is efficient for training

Image Data Generators

A naive approach to data loading is to load all the images and transform them up front. This would result in a huge amount of used RAM before training starts. Your machine might not be able to handle this, which would result in crashing kernels. It can also take a very long time depending on the dataset.

Instead, we can load and transform images exactly when we need them: when feeding a batch of images to the model during training.

Keras provides an optimized way of doing this with the ImageDataGenerator class. It allows us to load images from a directory efficiently. These generators can also transform the dataset in many other ways to augment it. Explore these optional transformations to help make your model more general and improve accuracy.

# define data generators
train_data_generator = ImageDataGenerator(rescale=1./255,
fill_mode='nearest')
validation_data_generator = ImageDataGenerator(rescale=1./255,
fill_mode='nearest')
# tell the data generators to use data from the train and validation directories
train_generator = train_data_generator.flow_from_directory(train_dir,
target_size=(image_dim, image_dim),
batch_size=batch_size,
class_mode='categorical')
validation_generator = validation_data_generator.flow_from_directory(validate_dir,
target_size=(image_dim, image_dim),
batch_size=batch_size,
class_mode='categorical')
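
The idea of loading images only when a batch is needed can be illustrated without Keras. The sketch below is not how Keras implements its generators (those also resize, rescale, shuffle, and encode labels); the names batch_generator and load_fn are hypothetical and exist only to show the lazy-loading pattern.

```python
def batch_generator(image_paths, batch_size, load_fn):
    """Yield lists of loaded images, one batch at a time, forever."""
    while True:
        for start in range(0, len(image_paths), batch_size):
            batch_paths = image_paths[start:start + batch_size]
            # images are only loaded here, when the batch is requested
            yield [load_fn(path) for path in batch_paths]
```

Because the generator does nothing until next() is called on it, memory use stays at roughly one batch of images rather than the whole dataset.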

Get Class Names

It is useful to have a dictionary of image classes. We can use this dictionary to make our predictions more human-readable.

# get a dictionary of class names
classes_dictionary = train_generator.class_indices
# turn classes dictionary into a list
class_keys = list(classes_dictionary.keys())
# get the number of classes
n_classes = len(class_keys)
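
The class_indices dictionary maps each class name to an integer index. The sketch below uses a made-up three-class dictionary to show how a predicted index can be turned back into a name. Sorting the names by index is an extra precaution suggested here, not something the original notebook does: it keeps position i lined up with class i even when dictionary key order is not guaranteed.

```python
# Hypothetical example of what train_generator.class_indices might contain;
# the real dictionary depends on the directory names in the dataset.
classes_dictionary = {'cat': 0, 'dog': 1, 'elephant': 2}

# order class names by their index so that position i holds class i
class_keys = sorted(classes_dictionary, key=classes_dictionary.get)

# invert the dictionary to look up a name from a predicted index
index_to_class = {index: name for name, index in classes_dictionary.items()}

predicted_index = 2  # e.g. the position of the largest model output
print(class_keys[predicted_index], index_to_class[predicted_index])
# prints: elephant elephant
```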

Load Image Paths of the Validation Dataset

Load the paths for all of the images in the validation dataset. These will be used later when we make predictions.

# Get the name of each directory in the root directory and store them as an array.
classes = get_class_labels(validate_dir)
# Get the paths of all the images in each class directory and store them as a 2d array.
image_paths = get_class_images(classes, validate_dir)
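
The helpers get_class_labels and get_class_images come from the project’s notebook and are not shown in this post. The following is only a guess at what such helpers could look like, written from the comments above; the real implementations may differ, for example in how they order or filter files.

```python
import os

def get_class_labels(root_dir):
    """Assumed behavior: return the sorted names of the class subdirectories."""
    return sorted(
        name for name in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, name))
    )

def get_class_images(classes, root_dir):
    """Assumed behavior: return one list of image paths per class."""
    image_paths = []
    for class_name in classes:
        class_dir = os.path.join(root_dir, class_name)
        image_paths.append([
            os.path.join(class_dir, file_name)
            for file_name in sorted(os.listdir(class_dir))
        ])
    return image_paths
```

With this shape, image_paths[0] holds every image path of the first class, which matches how the list is indexed later on.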

Building the image classifier model

Our model consists of many layers. Images are passed through the model, which outputs a set of numbers. These numbers describe the probability that the image belongs to each class. We take the class with the largest number as the most likely class.

We will use several types of layers and activations:

  1. Conv2D is a 2-dimensional convolutional layer. It applies filters over the inputted image. This helps the model learn about spatial relationships in the image.
  2. ReLU is a type of non-linear activation function. It outputs zero for negative inputs and passes positive inputs through unchanged, which lets the model learn non-linear patterns.
  3. MaxPooling2D downsamples its input. We use it to reduce the dimensionality of the input. This creates a more abstract form of the input.
  4. Flatten turns a matrix into a single row, like flattening a muffin into a pancake. We use it so that we can feed the output into dense layers.
  5. Dense is a densely-connected neural network layer.
  6. Softmax is an activation function. We use it to squash each output number into the range 0 to 1 and to make all the outputs add up to 1. Each output can then be interpreted as the probability of a class.

Note that the last layer has the same number of neurons as classes. This means that this layer will output 10 numbers, one per class.
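
To make the two activation functions concrete, here is a plain-Python sketch of ReLU and softmax applied to made-up numbers. Keras computes these on whole tensors at once; the function names and the raw_outputs values below are illustrative assumptions only.

```python
import math

def relu(x):
    # negative inputs become zero, positive inputs pass through
    return max(0.0, x)

def softmax(values):
    # subtract the maximum first for numerical stability
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

raw_outputs = [2.0, 1.0, 0.1]  # made-up numbers from a final layer
probabilities = softmax(raw_outputs)

# the most likely class is the position of the largest probability
predicted_index = probabilities.index(max(probabilities))
```

The probabilities keep the same ordering as the raw outputs, so the first class wins here, but they now sum to 1 and can be read as a confidence for each class.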

# define the model 
# takes in images, convolves them, flattens them, classifies them
model = Sequential([
Conv2D(16, (3, 3), activation='relu', padding='same', input_shape=input_shape),
Conv2D(16, (3, 3), activation='relu', padding='same'),
MaxPooling2D(pool_size=(2,2), strides=None, padding='valid'),
Conv2D(32, (3, 3), activation='relu', padding='same'),
Conv2D(32, (3, 3), activation='relu', padding='same'),
MaxPooling2D(pool_size=(2,2), strides=None, padding='valid'),
Conv2D(64, (3, 3), activation='relu', padding='same'),
Conv2D(64, (3, 3), activation='relu', padding='same'),
MaxPooling2D(pool_size=(2,2), strides=None, padding='valid'),
Conv2D(128, (3, 3), activation='relu', padding='same'),
Conv2D(128, (3, 3), activation='relu', padding='same'),
MaxPooling2D(pool_size=(2,2), strides=None, padding='valid'),
Flatten(),
Dense(256, activation='relu'),
Dense(n_classes, activation='softmax')
])
# define the optimizer and loss to use
model.compile(optimizer=optimizers.SGD(lr=learning_rate, momentum=0.9),
loss='categorical_crossentropy',
metrics=['accuracy'])

Examine the Model

We can generate a high-level overview of the model structure. Each row is a layer of the model.

# look at the defined model
model.summary()

Model Structure

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 200, 200, 16)      448
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 200, 200, 16)      2320
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 100, 100, 16)      0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 100, 100, 32)      4640
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 100, 100, 32)      9248
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 50, 50, 32)        0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 50, 50, 64)        18496
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 50, 50, 64)        36928
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 25, 25, 64)        0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 25, 25, 128)       73856
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 25, 25, 128)       147584
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 12, 12, 128)       0
_________________________________________________________________
flatten_1 (Flatten)          (None, 18432)             0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               4718848
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2570
=================================================================
Total params: 5,014,938
Trainable params: 5,014,938
Non-trainable params: 0
_________________________________________________________________
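
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the architecture defined earlier (pairs of 3x3 convolutions with 'same' padding, each pair followed by 2x2 max pooling) and 10 classes; the helper names conv2d_params and dense_params are made up for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # one weight per kernel cell per input channel, plus one bias, per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # one weight per input plus one bias, per unit
    return (inputs + 1) * units

size, channels, total = 200, 3, 0
for filters in [16, 32, 64, 128]:
    # two 3x3 'same' convolutions keep width and height unchanged
    total += conv2d_params(3, 3, channels, filters)
    total += conv2d_params(3, 3, filters, filters)
    channels = filters
    # 2x2 max pooling with 'valid' padding halves the size, rounding down
    size = size // 2

flattened = size * size * channels
total += dense_params(flattened, 256)
total += dense_params(256, 10)

print(size, flattened, total)
# prints: 12 18432 5014938
```

Most of the parameters sit in the first dense layer, which connects all 18,432 flattened values to 256 neurons.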

Examine Model Accuracy Before Training

Let’s examine how well the model performs before we train it. We will determine the model’s accuracy on 1 class. This will be done by making predictions with all the images of 1 class. Remember that this isn’t representative of the whole model as it is only 1 class of 10.

# label of the class we are making predictions on
single_class = class_keys[0]
# first class image paths 
single_class_image_paths = image_paths[0]
# make predictions on the first class
single_class_predictions = predict(int(n_validation / n_classes), single_class_image_paths, model)
# get the accuracy of predictions on the first class
single_class_accuracy = predictions_accuracy(class_keys, single_class, single_class_predictions)
print("Current accuracy of model for class " + single_class + ": " + str(single_class_accuracy))
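
The predict and predictions_accuracy helpers live in the project’s notebook and are not shown in this post. As a rough illustration of the accuracy calculation only, the sketch below assumes that each prediction is a list of class probabilities; the function name class_accuracy_sketch and that data shape are assumptions, not the notebook’s actual code.

```python
def class_accuracy_sketch(class_names, true_class, predictions):
    """Fraction of predictions whose most likely class is true_class."""
    true_index = class_names.index(true_class)
    correct = 0
    for probabilities in predictions:
        # position of the largest probability is the predicted class
        predicted_index = probabilities.index(max(probabilities))
        if predicted_index == true_index:
            correct += 1
    return correct / len(predictions)

# made-up predictions for four images that all show a cat
example_predictions = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.8, 0.2]]
example_accuracy = class_accuracy_sketch(['cat', 'dog'], 'cat', example_predictions)
```

Three of the four made-up predictions favor cat, so the accuracy for that class is 0.75.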

Training the model using Cloud GPUs

This model has over 5,000,000 trainable parameters, far too many to set manually. We need to train the model on the training dataset so that it can learn the optimal weights. These weights are the parameter values of the model.

# log information for use with tensorboard
tensorboard = TensorBoard(log_dir=output_logs_dir)
# train the model using the training data generator
model.fit_generator(train_generator,
steps_per_epoch=math.floor(n_train/batch_size),
validation_data=validation_generator,
# validation_steps counts batches, not images
validation_steps=math.ceil(n_validation/batch_size),
epochs=epochs,
callbacks=[tensorboard])
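
To get a feel for what the optimizer does with the learning rate and momentum configured earlier, here is a toy sketch of gradient descent with momentum on a single weight. It uses the commonly described momentum update and a made-up loss, (w - 3)^2; it is not Keras’s internal code, and the function name sgd_momentum_step is hypothetical.

```python
def sgd_momentum_step(weight, velocity, gradient, learning_rate, momentum):
    # keep part of the previous step, then move against the gradient
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity

# toy loss (w - 3)^2 with gradient 2 * (w - 3); the minimum is at w = 3
weight, velocity = 0.0, 0.0
for step in range(2000):
    gradient = 2.0 * (weight - 3.0)
    weight, velocity = sgd_momentum_step(weight, velocity, gradient,
                                         learning_rate=0.001, momentum=0.9)
```

After enough steps the weight settles very close to 3. Training the classifier follows the same idea, except that the gradients come from the loss on each batch of images and there are millions of weights instead of one.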

Examine Model Accuracy After Some Training

Let’s examine how well the model performs now that we’ve trained it a bit. Again, we will determine the model’s accuracy on 1 class.

# make predictions on the first class
single_class_predictions = predict(int(n_validation / n_classes), single_class_image_paths, model)
# get the accuracy of predictions on the first class
single_class_accuracy = predictions_accuracy(class_keys, single_class, single_class_predictions)
print("Current accuracy of model for class " + single_class + ": " + str(single_class_accuracy))

Continue Training the Model

Let’s continue training the model.

# train the model using the training data generator
model.fit_generator(train_generator,
steps_per_epoch=math.floor(n_train/batch_size),
validation_data=validation_generator,
# validation_steps counts batches, not images
validation_steps=math.ceil(n_validation/batch_size),
epochs=epochs,
callbacks=[tensorboard])

Examine Model Accuracy After Training

Now that we’ve completed training the model, let’s examine its accuracy on 1 class.

# make predictions on the first class
single_class_predictions = predict(int(n_validation / n_classes), single_class_image_paths, model)
# get the accuracy of predictions on the first class
single_class_accuracy = predictions_accuracy(class_keys, single_class, single_class_predictions)
print("Current accuracy of model for class " + single_class + ": " + str(single_class_accuracy))

Understanding training metrics

Our goal is to maximize validation accuracy while minimizing validation loss. The validation dataset is never used for training. This allows us to measure how well the model performs on images it’s never seen before.

The training and validation accuracies should be similar at the end of training. If the training accuracy is much higher than the validation accuracy, this could be a sign of overfitting.

During training, you should see the training loss (loss) decrease and the training accuracy (acc) increase.

You should also see the validation loss (val_loss) decrease and the validation accuracy (val_acc) increase.
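
One simple way to watch for overfitting is to compare the final training and validation accuracies. The sketch below works on a plain dictionary of per-epoch metrics using the acc and val_acc names mentioned above. The numbers are made up, and the 0.1 threshold is an arbitrary assumption rather than a rule.

```python
def accuracy_gap(history):
    """Return the final training accuracy minus the final validation accuracy."""
    return history['acc'][-1] - history['val_acc'][-1]

# made-up per-epoch metrics, one value per epoch
example_history = {
    'acc':     [0.35, 0.55, 0.72, 0.88],
    'val_acc': [0.33, 0.50, 0.58, 0.60],
}

gap = accuracy_gap(example_history)
possibly_overfitting = gap > 0.1  # threshold is an arbitrary assumption
```

In this made-up run the training accuracy keeps climbing while the validation accuracy levels off, which is the pattern to look out for.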

Tensorboard

TensorBoard is “a suite of visualization tools” that you can use “to visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data like images that pass through it”. This is useful for comparing models and hyperparameters.

On Tensorpad, you can use the commands tab to create a new Tensorboard.

  1. Open the commands panel with CTRL + SHIFT + C
  2. Search for Create a new tensorboard
  3. Select this option and point it to animal_classifier/logs.

You’ll be able to visualize the accuracy of your model over epochs. Each training run creates a new set of logs. This appears in Tensorboard as a separate plotted line.

The following are screenshots of my training results plotted on Tensorboard. Your results should look similar.

[Screenshots: Training Loss, Training Accuracy, Validation Loss, Validation Accuracy]

Predict

It is useful to know which image predictions were correct and which were wrong. Let’s examine 10 predictions, 1 prediction per class.

# get 1 image path per class
predict_image_paths = [image_path[0] for image_path in image_paths]
# Make 1 prediction per class
predictions = predict(10, predict_image_paths, model)
# plot the image that was predicted
plot_prediction(class_keys, predict_image_paths, predictions)

Exporting the Trained Model

What do we do with a trained model? Export it for your applications!

We can reuse the trained model instead of training it every time we want to use it. There are many formats you can export it in. Here, we will export it so that it can be loaded by this notebook.

# export the model for later
model.save(model_name)

Next Steps

In my next blog post, we will explore hyperparameter sweeping to improve the model’s accuracy. We will also explore how we can improve the model’s architecture.

References

Keras
Animals 10 dataset