CNN Transfer Learning for Predicting Dog’s Breed

Published in The Startup, Aug 15, 2020

Transfer learning reuses the knowledge gained while solving one problem and applies it to a different but related problem. In this project, pre-trained CNNs are reused to predict a dog's breed.

The project is divided into seven steps and I am going to walk you through each of these steps.

1. Importing the Datasets

Dog Data Set

The dog data set is loaded as separate training, validation and test sets. It contains 8351 images in total, each belonging to one of 133 dog breeds. The split between the training, validation and test sets is made in the ratio 80:10:10.
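
To make the 80:10:10 ratio concrete, here is a minimal sketch of how 8351 items could be shuffled and split. The helper name `split_indices` and the seed are my own illustration; the project itself loads folders that are already split, so this is not its loading code.

```python
import random

def split_indices(n_items, train_frac=0.8, valid_frac=0.1, seed=0):
    """Shuffle indices 0..n_items-1 and split them into train/valid/test."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * n_items)
    n_valid = int(valid_frac * n_items)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever is left goes to the test set
    return train, valid, test

train_idx, valid_idx, test_idx = split_indices(8351)
print(len(train_idx), len(valid_idx), len(test_idx))  # 6680 835 836
```

Rounding down the first two groups and giving the remainder to the test set guarantees that every image lands in exactly one group.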

Human Data Set

There are a total of 13233 human images loaded into an array.

2. Detecting Humans

In this step we use Haar feature-based cascade classifiers to detect human faces in images. The face detector is a pre-trained Haar cascade model that ships with OpenCV.

A sample output of the face detection model used in this project

We then assess this face detector by passing in images of humans and dogs. The detector finds a face in 100% of the human images. When we pass in images of dogs, it still reports a face in 11% of them.
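
The assessment boils down to counting how often the detector fires on a set of images. A minimal sketch follows, with a toy detector and toy "images" standing in for the real ones; all names and values here are hypothetical.

```python
def detection_rate(detector, images):
    """Percentage of images for which detector(image) returns True."""
    hits = sum(1 for image in images if detector(image))
    return 100.0 * hits / len(images)

# Toy stand-ins: each "image" is a label and the toy detector fires on "face".
toy_images = ["face", "no_face", "no_face", "no_face"]
toy_detector = lambda image: image == "face"

print(detection_rate(toy_detector, toy_images))  # 25.0
```

Running the same function once on the human images and once on the dog images gives the two percentages that are compared in this step.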

3. Detecting dogs

Now we move on to building a detector for dogs, using the pre-trained ResNet-50 model for prediction. The model performs well: it identifies 100% of the dog images as dogs and does not identify any human image as a dog.

As a result, this model on its own cannot be used to predict the dog breed that most resembles a human face.
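
One way to turn an ImageNet classifier into a dog detector is to check whether the top predicted class is one of the dog classes. The sketch below assumes the model outputs 1000 ImageNet probabilities and that the dog classes occupy indices 151 to 268 inclusive; the probability vectors are placeholders rather than real ResNet-50 predictions.

```python
import numpy as np

# Assumption: ImageNet class indices 151..268 (inclusive) are dog breeds.
DOG_CLASS_FIRST, DOG_CLASS_LAST = 151, 268

def is_dog(class_probs):
    """Return True if the most probable ImageNet class is a dog class."""
    top_class = int(np.argmax(class_probs))
    return DOG_CLASS_FIRST <= top_class <= DOG_CLASS_LAST

# Placeholder vectors; in practice they would come from the ResNet-50 prediction.
probs_inside = np.zeros(1000)
probs_inside[200] = 1.0   # an index inside the dog range
probs_outside = np.zeros(1000)
probs_outside[500] = 1.0  # an index outside the dog range

print(is_dog(probs_inside), is_dog(probs_outside))  # True False
```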

4. Creating a CNN for classifying the Dog’s Breed

Now we try to develop a CNN model for prediction.

It is really difficult to distinguish between breeds because they share similar characteristics. For this reason I decided to use convolutional layers, which identify increasingly complex patterns, with the hinted architecture as a base. The three convolutional layers should be able to capture most of the features:

The first convolutional layer with 16 filters identifies lower level features such as edges or lines. The input_shape is (224, 224, 3).
The second convolutional layer with 32 filters identifies more complex features such as shapes.
The third convolutional layer with 64 filters identifies high level features.
The MaxPooling layer after each convolutional layer reduces the height and width of the representation by 50%.
The GlobalAveragePooling layer reduces the height and width to one and feeds the result to the last dense layer.
The Dense layer with 133 nodes and the softmax function classifies the image into one of the 133 dog breeds.
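
The layer list above could be written in Keras roughly as follows. The kernel size of 2, the "same" padding and the relu activations are my assumptions, since the list only fixes the filter counts, the input shape and the output layer.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumptions: kernel_size=2, padding="same" and relu are illustrative choices.
model = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),
    layers.Conv2D(16, kernel_size=2, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),          # 224x224 -> 112x112
    layers.Conv2D(32, kernel_size=2, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),          # 112x112 -> 56x56
    layers.Conv2D(64, kernel_size=2, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2),          # 56x56 -> 28x28
    layers.GlobalAveragePooling2D(),           # 28x28x64 -> 64
    layers.Dense(133, activation="softmax"),   # one probability per breed
])
model.summary()
```

With "same" padding each pooling layer halves the height and width exactly, which matches the 50% reduction described in the list.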

Since the data set is relatively small, the performance of the model is mediocre: it achieves a test accuracy of only about 4.3%.
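
Test accuracy here means the percentage of test images whose most probable predicted breed matches the true breed. A minimal sketch with toy numbers (three classes and four samples, not the project's data) looks like this:

```python
import numpy as np

def accuracy_percent(predicted_probs, one_hot_targets):
    """Percentage of rows where the argmax prediction matches the target."""
    predicted = np.argmax(predicted_probs, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return 100.0 * np.mean(predicted == actual)

# Toy example: the last sample is misclassified.
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4],
                      [0.6, 0.3, 0.1]])
toy_targets = np.eye(3)[[0, 1, 2, 1]]

print(accuracy_percent(toy_probs, toy_targets))  # 75.0
```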

5. Using a pre-trained VGG16 for Dog Classification using Transfer Learning

First we obtain the bottleneck features for the model. Bottleneck features are the outputs of the pre-trained network's convolutional layers, and they are what makes the transfer learning possible. In this section we use a VGG16 model for the classification task.
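
Bottleneck features can be computed by running images through VGG16 without its top classification layers. The sketch below uses `weights=None` and a placeholder batch so that it runs offline; a real run would pass `weights="imagenet"` and real images. The variable names are illustrative.

```python
import numpy as np
from tensorflow import keras

# weights=None keeps this sketch offline; weights="imagenet" would load
# the pre-trained filters that transfer learning actually relies on.
base = keras.applications.VGG16(
    weights=None, include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False  # the convolutional base is used as a fixed extractor

images = np.zeros((2, 224, 224, 3), dtype="float32")  # placeholder batch
inputs = keras.applications.vgg16.preprocess_input(images)
bottleneck = base.predict(inputs, verbose=0)

print(bottleneck.shape)  # (2, 7, 7, 512)
```

Because the base is frozen, these features only need to be computed once and can then be saved and reused while training the small classifier on top.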

VGG16 Architecture
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
global_average_pooling2d_2 ( (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 133)               68229     
=================================================================
Total params: 68,229
Trainable params: 68,229
Non-trainable params: 0
_________________________________________________________________
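
The Param # column in the summary above can be checked by hand: global average pooling has no weights, and a dense layer has one weight per input-output pair plus one bias per output. The helper name `dense_params` is mine.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# 512 pooled features feeding 133 breed outputs.
print(dense_params(512, 133))  # 68229
```

Only these 68,229 parameters are trained; the VGG16 convolutional layers that produced the bottleneck features stay fixed.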

This model achieves a test accuracy of 46.29%. Though this is a decent improvement over the previous model trained from scratch, the accuracy is still not satisfactory for a real-world application.

6. Using a pretrained Xception model for classifying Dog’s Breed using Transfer Learning

In this section the Xception model is used for the classification task. Transfer learning is applied by importing the train, validation and test bottleneck features.

Xception Model Architecture
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
global_average_pooling2d_3 ( (None, 2048)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 256)               524544    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 133)               34181     
=================================================================
Total params: 558,725
Trainable params: 558,725
Non-trainable params: 0
_________________________________________________________________
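
A classifier head matching the summary above could be sketched in Keras as follows. The 7x7 spatial size of the bottleneck features, the relu activation and the dropout rate of 0.5 are my assumptions; the summary only fixes the layer types and sizes.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Assumptions: 7x7x2048 bottleneck features, relu, and a dropout rate of 0.5.
xception_head = keras.Sequential([
    keras.Input(shape=(7, 7, 2048)),
    layers.GlobalAveragePooling2D(),          # 7x7x2048 -> 2048
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),                      # regularization, no parameters
    layers.Dense(133, activation="softmax"),  # one probability per breed
])
xception_head.summary()
```

The extra dense layer and dropout give this head more capacity than a single dense layer while still keeping it small enough to train quickly on pre-computed features.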

This model achieves a test accuracy of 84.92%, which is impressive considering how difficult it is even for a person to distinguish some similar breeds.