End to End Image Classification project using TensorFlow

Tejan Irla · Published in Analytics Vidhya · 6 min read · Sep 20, 2019

Dogs vs Cats. Photo by Snapwire on pexels.com

The objective of this project is to develop a machine learning model capable of correctly classifying images of Dogs and Cats.

We are going to work on a subset of the original Dogs vs. Cats dataset (3,000 images sampled from the original 25,000) to demonstrate techniques such as Image Augmentation and Dropout, and how they contribute to improving performance. The smaller dataset spares us a lot of computing power (and training time!). I plan on demonstrating my work on the original dataset, which demands a GPU/high computing power, soon. This post is based on my experience with the TensorFlow in Practice Specialization.

I have also worked on building a model using Transfer Learning (think hundreds of hours of training on a million-plus images, reused for our custom application), which builds on this project. You can find it here.

The original dataset, with 25,000 training images of dogs and cats, can be found here.

Get the data (Smaller Dataset)

!wget --no-check-certificate \
https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip \
-O /tmp/cats_and_dogs_filtered.zip

Import libraries

The os library gives us access to the file system, and the zipfile library allows us to unzip the data.
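A minimal set of imports covering the steps below (matplotlib is used later for the accuracy/loss plots; the original notebook may include more):

import os
import zipfile
import tensorflow as tf
import matplotlib.pyplot as plt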

Create the respective directories

The contents of the .zip are extracted to /tmp, giving us the base directory /tmp/cats_and_dogs_filtered.
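Here is a sketch of the extraction and directory setup, assuming the standard train/validation and cats/dogs sub-folders of the cats_and_dogs_filtered archive:

# Extract the downloaded archive into /tmp
local_zip = '/tmp/cats_and_dogs_filtered.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp')
zip_ref.close()

# Paths to the training and validation splits
base_dir = '/tmp/cats_and_dogs_filtered'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

train_cats_dir = os.path.join(train_dir, 'cats')            # training cat pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')            # training dog pictures
validation_cats_dir = os.path.join(validation_dir, 'cats')  # validation cat pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')  # validation dog pictures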

Example images from the dataset

Model with 4 convolutional layers, with 32, 64, 128, and 128 filters respectively

Convolutions narrow down the content of the image to focus on specific details. The model takes longer to train, but this has a large impact on accuracy.

We add the convolutional layers and flatten the final result to feed into the densely connected layers. As this is a two-class classification problem, we use a sigmoid activation on the single output neuron.
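A sketch of the architecture just described; it is consistent with the summary in the next section:

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # single sigmoid output for the two-class decision
])
model.summary()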

Model Summary

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 148, 148, 32)      896
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 74, 74, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 72, 72, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 36, 36, 64)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 34, 34, 128)       73856
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 17, 17, 128)       0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 15, 15, 128)       147584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 128)         0
_________________________________________________________________
flatten (Flatten)            (None, 6272)              0
_________________________________________________________________
dense (Dense)                (None, 512)               3211776
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 513
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________

The “output shape” column shows how the size of your feature map evolves in each successive layer. Each convolution layer trims the feature map slightly because no padding is used (a 3×3 kernel removes one pixel from each edge), and each pooling layer halves the dimensions.

We use binary_crossentropy as the loss function since we have two target classes (it's a binary classification problem).

Our optimizer is RMSprop with a learning rate of 0.0001. (We can experiment with this; the Adam and Adagrad optimizers would also work well.)
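One way to compile the model as described above (older TensorFlow versions use lr= instead of learning_rate=):

from tensorflow.keras.optimizers import RMSprop

model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(learning_rate=1e-4),
              metrics=['accuracy'])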

Rescale the Images

The data generators read pictures in our source folders, convert them to float32 tensors, and feed them (with their labels) to our network. We’ll have one generator for the training images and one for the validation images.

Input data to the neural networks should be normalized to aid in processing. We will preprocess our images by normalizing the pixel values to be in the [0, 1] range (originally all values are in the [0, 255] range).

In Keras this can be done via the keras.preprocessing.image.ImageDataGenerator class using the rescale parameter.

This ImageDataGenerator class allows you to instantiate generators of augmented image batches (and their labels) via .flow(data, labels) or .flow_from_directory(directory). These generators can then be used with the Keras model methods that accept data generators as inputs: fit_generator, evaluate_generator, and predict_generator.

Flow images in batches of 20 using train_datagen and test_datagen
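A sketch of those generators, reusing the train_dir and validation_dir paths defined earlier; for this first run only rescaling is applied:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images are rescaled from [0, 255] to [0, 1]
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),   # resize all images to 150x150
    batch_size=20,
    class_mode='binary')      # binary labels to match binary_crossentropy

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')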

Training

We will train for 100 epochs and then plot the loss and accuracy.
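A sketch of the training and plotting code. The step counts assume 2,000 training and 1,000 validation images in batches of 20; newer TensorFlow versions accept the generators directly in model.fit, and older versions report the metrics as 'acc'/'val_acc' instead of 'accuracy'/'val_accuracy':

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,      # 2,000 training images / batch size of 20
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50,      # 1,000 validation images / batch size of 20
    verbose=2)

# Plot training vs. validation accuracy and loss
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))

plt.plot(epochs, acc, label='Training accuracy')
plt.plot(epochs, val_acc, label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, loss, label='Training loss')
plt.plot(epochs, val_loss, label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()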

Accuracy and Loss

The graphs show that we reach a training accuracy of almost 100%, while the validation accuracy is much lower, in the 70%-75% range. It's a classic case of overfitting.

Our model does exceptionally well on images it has already seen, and not nearly as well on unseen images.

We can apply a simple yet very effective technique here called Image Augmentation, where we alter the training images slightly by rotating them, squashing them, and so on.

We can implement this using the ImageDataGenerator class:

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

Some options that are available:

  • rotation_range is a value in degrees (0–180), a range within which to randomly rotate pictures.
  • width_shift and height_shift are ranges (as a fraction of total width or height) within which to randomly translate pictures horizontally or vertically.
  • shear_range is for randomly applying shearing transformations
  • zoom_range is for randomly zooming inside pictures
  • horizontal_flip is for randomly flipping half of the images horizontally. This is relevant when there are no assumptions of horizontal asymmetry (e.g. real-world pictures)
  • fill_mode is the strategy used for filling in newly created pixels, which can appear after a rotation or a width/height shift

Accuracy and Loss after implementing Image Augmentation

As we can see, the accuracy and loss curves have improved (although the training time increased), but we can do better. Let us try implementing Dropout, which helps with overfitting.

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.5),  # Implementing Dropout
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

After implementing Dropout, the training and validation accuracy are both about 82%, with a better loss curve.

This is a decent result considering that we trained on only 2,000 training images, versus the 22,500 training images of the original dataset. In deep learning, performance almost always improves when we train on more data. We can do significantly better by implementing Transfer Learning, where we use a pre-trained model, as demonstrated in this post, which is a continuation of this project.

I expect better results when working with the full dataset next.

We can also test the model on our own images with this piece of code, which lets us choose one or more images from the file system, runs them through the model, and prints the results.
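A minimal sketch of that step, assuming a Google Colab environment; the file-picker call and the 0.5 threshold are illustrative rather than taken from the original notebook:

import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image

uploaded = files.upload()   # opens a file-picker in Colab

for fn in uploaded.keys():
    # Load and preprocess the image the same way as the training data
    img = image.load_img(fn, target_size=(150, 150))
    x = image.img_to_array(img) / 255.0
    x = np.expand_dims(x, axis=0)

    # With flow_from_directory's alphabetical class ordering, cats = 0 and dogs = 1
    prediction = model.predict(x)[0][0]
    if prediction > 0.5:
        print(fn + " is a dog")
    else:
        print(fn + " is a cat")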

Connect with me on LinkedIn. You can find the full code here.

Cheers!
