Dog Breed Classifier

Published in The Startup, 12 min read, Sep 13, 2019

Different dog breeds — Dog Breeds — pic source: German Shepherd Rescue Trust New Zealand

Imagine you are out for your weekend jog or walk in the park and you see a really cute dog. Have you ever wondered which breed the dog belongs to? I have…

There are 266 individual breeds of dog pictured on the website dogtime. If you are like me, you would be able to identify not more than 10–15 of the breeds.

So, when I was given a choice of a few different projects for the Data Scientist Nanodegree by Udacity, I chose the ‘Dog Breed Classifier Project’. This is a very popular project across machine learning and artificial intelligence nanodegree programs offered by Udacity.

Overview

The aim of the project in the Data Scientist nanodegree was to create a web application that can identify the breed of a dog from a photo or image supplied as input. If the photo or image contains a human face (or alien face), the application returns the breed of dog that most resembles this person.

The project uses Convolutional Neural Networks (CNNs)! A pipeline is built to process real-world, user-supplied images. Given an image of a dog, the algorithm will identify an estimate of the canine’s breed. If supplied an image of a human, the code will identify the resembling dog breed.

The steps that were followed to work through the project were the following:

Step 0: Import Datasets
Step 1: Detect Humans
Step 2: Detect Dogs
Step 3: Create a CNN to classify Dog Breeds (from scratch)
Step 4: Use a CNN to classify Dog Breeds (using Transfer Learning)
Step 5: Create a CNN to classify Dog Breeds (using Transfer Learning)
Step 6: Write an algorithm
Step 7: Test algorithm

In this project I have experimented with both Keras and Fast.AI to build the Convolutional Neural Network (CNN) to make the dog predictions.

I have set myself a target test accuracy for the CNN of 90% i.e., the model identifies the dog breed 9 times out of 10 correctly. We will be using the accuracy metric on the testing dataset to measure the performance of our models.

To follow along with the steps, you can download or clone the notebook from my GitHub repository. The repository features the notebook ‘dog_breed_classifier.ipynb’, which runs on the GPU provided for free at Google Colab.

Step 0: Import Datasets

The datasets were provided by Udacity.

Dog Images: The dog images provided are available in the repository within the Images directory, further organized into train, valid and test subfolders.
Human Faces: An extensive dataset of celebrity faces has also been added to the repository in the lfw folder.
Haarcascades: An ML-based approach where a cascade function is trained from a lot of positive and negative images and then used to detect objects in other images. The algorithm uses the Haar frontal face cascade to detect humans, so it expects an image in which the frontal facial features are clearly defined.
Test Images: A folder with some test images has been added to check the effectiveness of the algorithm.
Pre-computed features for networks currently available in Keras (i.e. VGG19, InceptionV3 and Xception) are made available from S3.
Any other downloads needed to ensure smooth running of the notebook are available in the repository.

Load all the libraries and packages required through the notebook.

The libraries required can be categorized as follows:

Utility libraries: random (for random seeding), timeit (to calculate execution time), os, pathlib and glob (for folder and path operations), tqdm (for execution progress), sklearn (for loading datasets), requests and io (to load files from the web)
Image processing: OpenCV (cv2), PIL
Keras and Fastai for creating the CNNs
Matplotlib for viewing plots/images and Numpy for tensor processing

We use a dataset-loading function from sklearn to import the datasets for training our dog breed model, then create the lists of training, validation and test filenames together with their dog breed labels. We also create a few paths that will be used later.

The dog_names variable stores a list of the class names to use in our prediction model. Based on the path names, we see a total of 8351 images of dogs belonging to 133 different dog breeds, split into 6680, 835 and 836 images for training, validation and testing.
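To make the link between folder names and labels concrete, here is a minimal standard-library sketch. It is not the sklearn-based loader used in the notebook; the load_split helper and the "001.Affenpinscher"-style folder names are assumptions made purely for illustration.

```python
from pathlib import Path


def load_split(split_dir):
    """Return (file_paths, label_indices, class_names) for one split folder.

    Assumes one sub-folder per breed, e.g. train/001.Affenpinscher/*.jpg
    (a hypothetical layout used only for this illustration).
    """
    split_dir = Path(split_dir)
    # Sorted folder names give every breed a stable class index.
    class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
    # Drop the numeric prefix: "001.Affenpinscher" -> "Affenpinscher".
    class_names = [p.name.split(".", 1)[-1] for p in class_dirs]

    files, labels = [], []
    for index, class_dir in enumerate(class_dirs):
        for img_path in sorted(class_dir.glob("*.jpg")):
            files.append(str(img_path))
            labels.append(index)
    return files, labels, class_names
```

Calling a helper like this once per split (train, valid, test) would give the filename and label lists described above, and class_names plays the role of dog_names.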

Step 1: Detect Humans based on OpenCV Haar cascade classifiers

Object Detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their paper, “Rapid Object Detection using a Boosted Cascade of Simple Features” in 2001. It is a machine learning based approach where a cascade function is trained from a lot of positive and negative images, which is then used to detect objects in other images.

We use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub. Before using any of the face detectors, it is standard procedure to convert the images to grayscale. The detectMultiScale function executes the classifier stored in face_cascade and takes the grayscale image as a parameter. The face_detector function takes a string-valued file path to an image as input and returns True if a human face is detected in the image and False otherwise. When the face detector was tested on samples of 100 images each, all 100 human images were detected as containing a human face, while 11 of the 100 dog images were also (incorrectly) detected as containing one.
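The detector described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's exact code: passing cascade_path as an argument (rather than using a global face_cascade) and the detection_rate helper are assumptions added for illustration.

```python
def face_detector(img_path, cascade_path):
    """Return True if at least one frontal face is found in the image."""
    import cv2  # imported lazily so detection_rate works without OpenCV

    face_cascade = cv2.CascadeClassifier(cascade_path)
    img = cv2.imread(img_path)                    # OpenCV loads images as BGR
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # the detector expects grayscale
    faces = face_cascade.detectMultiScale(gray)   # one (x, y, w, h) box per face
    return len(faces) > 0


def detection_rate(detector, img_paths):
    """Percentage of images for which `detector` returns True (hypothetical helper)."""
    hits = sum(1 for path in img_paths if detector(path))
    return 100.0 * hits / len(img_paths)
```

With a helper like detection_rate, the figures quoted above correspond to a rate of 100% on the human sample and 11% on the dog sample.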

Step 2: Detect Dogs

Here, we use a pre-trained ResNet-50 model to detect dogs in images. Our first line of code downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.

When using TensorFlow as backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with shape

(nb_samples,rows,columns,channels)

where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.

Create tensor input from paths to images

The path_to_tensor function takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224×224 pixels. Next, the image is converted to an array, which is then expanded to a 4D tensor by adding a leading sample dimension. In this case, since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape (1, 224, 224, 3).

The paths_to_tensor function takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape (nb_samples, 224, 224, 3). Here, nb_samples is the number of samples, or number of images, in the supplied array of image paths. It is best to think of nb_samples as the number of 3D tensors (where each 3D tensor corresponds to a different image).
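The shape handling in these two functions can be illustrated with plain NumPy. The sketch below skips image loading and resizing and starts from arrays that are assumed to be 224×224 RGB already; the function names are hypothetical stand-ins, not the notebook's own.

```python
import numpy as np


def image_to_tensor(img_array):
    """Turn one (224, 224, 3) image array into a (1, 224, 224, 3) tensor."""
    return np.expand_dims(img_array, axis=0)


def images_to_tensor(img_arrays):
    """Stack several images into one (nb_samples, 224, 224, 3) tensor."""
    return np.vstack([image_to_tensor(img) for img in img_arrays])


# Stand-ins for three RGB images already loaded and resized to 224x224.
fake_images = [np.zeros((224, 224, 3), dtype="float32") for _ in range(3)]
single = image_to_tensor(fake_images[0])
batch = images_to_tensor(fake_images)
print(single.shape, batch.shape)
```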

In addition, ResNet-50 requires some extra preprocessing, such as reordering the channels from RGB to BGR and normalizing the pixel values, which is done using preprocess_input.
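For intuition, the two operations can be approximated in NumPy as below. This is only a sketch of the idea, assuming the "caffe"-style convention (BGR order, per-channel ImageNet mean subtraction, no scaling); in practice the Keras preprocess_input function should be used, and preprocess_sketch is a hypothetical name.

```python
import numpy as np

# Per-channel ImageNet means in BGR order ("caffe"-style preprocessing).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype="float32")


def preprocess_sketch(batch_rgb):
    """Approximate the preprocessing for a (n, rows, cols, 3) RGB batch."""
    x = batch_rgb.astype("float32")   # astype copies, so the input is untouched
    x = x[..., ::-1]                  # reorder channels: RGB -> BGR
    return x - IMAGENET_MEAN_BGR      # zero-center each channel
```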

The model is then used to extract the predictions. The predict method returns an array whose 𝑖-th entry is the model's predicted probability that the image belongs to the 𝑖-th ImageNet category. This is implemented in the ResNet50_predict_labels function, which returns the index of the most probable category.

The categories corresponding to dogs appear in an uninterrupted sequence corresponding to keys 151–268, inclusive, covering all categories from 'Chihuahua' to 'Mexican hairless'. So, if the function returns any number from 151 to 268, the supplied image is predicted to be that of a dog.

The dog_detector function returns True if a dog is detected in an image (and False if not). As expected, no dog was detected in any of the sample human images, and a dog was detected in all of the sample dog images.
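The range check at the heart of the dog detector can be sketched on a plain probability vector, without loading ResNet-50. The helper names below are hypothetical; in the notebook the probabilities would come from the model's predict method.

```python
import numpy as np

# ImageNet keys from 'Chihuahua' to 'Mexican hairless', inclusive.
DOG_FIRST, DOG_LAST = 151, 268


def predicted_label(probabilities):
    """Index of the most probable ImageNet category."""
    return int(np.argmax(probabilities))


def is_dog(probabilities):
    """True if the top prediction falls inside the dog range."""
    return DOG_FIRST <= predicted_label(probabilities) <= DOG_LAST
```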

Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

The model that I selected had a CNN architecture of 4 convolutional layers alternating with max-pooling layers, together with 10% dropout and batch normalization. The convolutional layers used 16, 32, 64 and 128 filters, respectively. Dropout was used to reduce the possibility of over-fitting.

This is then followed by a global average pooling layer which is then followed by a dense layer to identify 133 breeds.

The model takes a 4D tensor with shape (1, 224, 224, 3) and outputs an array of 133 probabilities, one per breed. The optimizer used was ‘RMSProp’ and the metric used was accuracy. The model was run for 10 epochs and reached an accuracy of 6.69%.
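An architecture along these lines could be written in Keras roughly as follows. This is a sketch under assumptions, not the notebook's exact model: the 3×3 kernels, "same" padding, ReLU activations, the order of batch normalization and dropout within each block, and the loss function are all guesses; only the filter counts, the 10% dropout, the global average pooling, the 133-way output and the optimizer come from the description above.

```python
FILTERS = (16, 32, 64, 128)   # filter counts from the description above
NUM_BREEDS = 133


def feature_map_size(input_size=224, num_pools=len(FILTERS)):
    """Spatial size after `num_pools` 2x2 max-pooling layers ('same' convs)."""
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size


def build_scratch_cnn():
    # Lazy import: TensorFlow is only needed when the model is actually built.
    from tensorflow.keras import layers, models

    model = models.Sequential()
    model.add(layers.Input(shape=(224, 224, 3)))
    for n_filters in FILTERS:
        # Assumed block layout: conv -> batch norm -> max-pool -> dropout.
        model.add(layers.Conv2D(n_filters, kernel_size=3, padding="same",
                                activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D(pool_size=2))
        model.add(layers.Dropout(0.1))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(NUM_BREEDS, activation="softmax"))
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Under these assumptions, four rounds of 2×2 pooling shrink the 224×224 input to 14×14 feature maps before global average pooling reduces them to a 128-value vector.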

Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)

I used VGG16 to demonstrate the use of Transfer Learning. The idea behind bottleneck features is to take a pre-trained model, chop off the top classifying layers, and use this “chopped” VGG16 as the first stage of our model.

The bottleneck features are the last activation maps in VGG16 (the fully-connected layers used for classification have been cut off), which makes it an effective feature extractor. The bottleneck features were obtained from a website where they are stored as a .npz file, using the BytesIO library along with requests to fetch them from the URL.

The pre-trained VGG-16 model was then used as a fixed feature extractor, where the last convolutional output of VGG-16 is fed as input to our model. The shape of the VGG16 bottleneck features for the training set was (6680, 7, 7, 512), i.e. a (7, 7, 512) output for each of the 6680 samples. Our model adds only a global average pooling layer and a fully connected layer, where the latter contains one node for each dog category and is equipped with a softmax. Running this model for 20 epochs increased the accuracy to 47%. This demonstrates the benefit of leveraging Transfer Learning from pre-trained models.
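Loading an .npz archive from bytes held in memory might look like the sketch below. To keep it runnable offline, the download is replaced by a tiny archive built in memory; the 'train' key and the load_npz_from_bytes helper are assumptions made for the example.

```python
from io import BytesIO

import numpy as np


def load_npz_from_bytes(raw_bytes):
    """Load an .npz archive held in memory, e.g. the body of an HTTP response."""
    with np.load(BytesIO(raw_bytes)) as archive:
        return {name: archive[name] for name in archive.files}


# Offline stand-in for the download: build a tiny archive in memory.
# With requests, raw_bytes would be requests.get(url).content instead.
buffer = BytesIO()
np.savez(buffer, train=np.zeros((4, 7, 7, 512), dtype="float32"))
features = load_npz_from_bytes(buffer.getvalue())
print(features["train"].shape)
```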
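What the small model on top of the bottleneck features computes can be traced in NumPy. This is an illustrative forward pass with random, untrained weights and fake features, not the trained Keras model; it only shows how a (7, 7, 512) feature block becomes 133 breed probabilities.

```python
import numpy as np


def global_average_pool(features):
    """(n, rows, cols, channels) -> (n, channels) by averaging each map."""
    return features.mean(axis=(1, 2))


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
bottleneck = rng.random((5, 7, 7, 512))             # 5 fake samples
weights = rng.normal(scale=0.01, size=(512, 133))   # untrained dense layer
bias = np.zeros(133)

pooled = global_average_pool(bottleneck)            # shape (5, 512)
probabilities = softmax(pooled @ weights + bias)    # shape (5, 133)
print(pooled.shape, probabilities.shape)
```

Because only the final dense layer has to be trained, fitting this kind of model is fast compared with training a full CNN from scratch.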

Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)

The model was built using Keras, leveraging Transfer Learning. I tried 4 different models: VGG19, ResNet50, InceptionV3 and Xception.

The bottleneck feature shapes for the training set are VGG19: (6680, 7, 7, 512), ResNet50: (6680, 1, 1, 2048), InceptionV3: (6680, 5, 5, 2048) and Xception: (6680, 7, 7, 2048). It took about 160 seconds to load all the Transfer Learning models.

Each of these models was then extended with a global average pooling layer and a dropout layer, followed by a fully connected layer (with softmax), and run for 20 epochs.

Training the models took less than a minute in each of these cases.

The test accuracy for Xception was ~85%, while for VGG19 it was ~46%.
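The accuracy figures quoted here are the share of test images whose most probable predicted breed matches the true breed. A minimal sketch of that calculation, assuming one-hot encoded targets and a hypothetical accuracy_percent helper:

```python
import numpy as np


def accuracy_percent(predicted_probs, one_hot_targets):
    """Percentage of samples whose top prediction matches the true class."""
    predicted = np.argmax(predicted_probs, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return 100.0 * np.mean(predicted == actual)


# Toy example with 4 samples and 3 classes (made-up numbers).
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
targets = np.eye(3)[[0, 1, 2, 1]]
print(accuracy_percent(probs, targets))
```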

I then explored options for increasing the accuracy. I used fastai to see if we could leverage transfer learning and obtain a higher accuracy.

The databunch was created and normalized.

A cnn_learner was created with the resnet34 model and was run for two cycles. The accuracy was up to 86%. The optimal learning rate appears to lie between 1e-6 and 1e-4.

After unfreezing the model and refitting it for 10 epochs, an accuracy of up to 89.8% was obtained, which means that close to 9 out of 10 images are classified correctly.

Based on the analysis of the various models that we have fit, learn_resnet34 provides the highest accuracy. This model is also saved and exported as a pickle file for classification.

Step 6: Write own algorithm to provide an output breed based on an image

We input an image path, and the bottleneck features of our pretrained model are computed for the image. These are then passed through our trained fully-connected model, which gives a predicted_breed, the category index and the probability tensor. The predict_breed function takes a file_path as input and outputs the breed of the dog.

Our algorithm accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,

if a dog is detected in the image, return the predicted breed.
if a human is detected in the image, return the resembling dog breed.
if neither is detected in the image, provide output that indicates an error.

The algorithm uses the CNN built in Step 5 together with the functions created earlier to come up with an output.

The algo function determines whether the image at the provided file_path contains a dog, a human or neither, and returns the species (dog, human or neither) along with the predicted breed.

The provide_output function outputs a greeting based on the predicted species and dog breed.
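The control flow of these two functions can be sketched as below. This is a simplified illustration: the detectors and the breed predictor are passed in as arguments so the logic can be shown on its own (the notebook's algo takes only a file_path), and the greeting texts are made up.

```python
def algo(file_path, is_dog, is_human, predict_breed):
    """Return (species, breed); breed is None when nothing is detected."""
    if is_dog(file_path):
        return "dog", predict_breed(file_path)
    if is_human(file_path):
        return "human", predict_breed(file_path)
    return "neither", None


def provide_output(species, breed):
    """Build a greeting for the detected species (hypothetical wording)."""
    if species == "dog":
        return f"Hello dog! You look like a {breed}."
    if species == "human":
        return f"Hello human! You resemble a {breed}."
    return "Error: neither a dog nor a human was detected."


# Stub detectors standing in for dog_detector, face_detector and predict_breed.
species, breed = algo("park_photo.jpg",
                      is_dog=lambda p: True,
                      is_human=lambda p: False,
                      predict_breed=lambda p: "Great Dane")
print(provide_output(species, breed))
```

Checking for a dog before checking for a human matters in this sketch, since the face detector occasionally fires on dog images.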

Step 7: Test Your Algorithm

The six dogs that were sampled to check the algorithm were correctly identified as dogs. The breeds of 5 of the 6 were accurate too. Only 1 dog (a Rajapalayam, a native breed) was identified as a Great Dane, possibly because Rajapalayam is not one of the 133 breeds in the training dataset.

The humans were also identified as human and a dog breed was predicted for each; incidentally, both were predicted as Dogue_de_bordeaux.

Reflection

At the start, my objective was to create a CNN with 90% testing accuracy. Our final model obtained 89.8% testing accuracy.

There are a few breeds that are virtually identical and are sub-breeds. There’s also a possibility of some images being either blurred or having too much noise. There’s also a possibility of enhancing the quality by additional image manipulation.

By addressing the areas above, I’m sure we could increase the testing accuracy of the model to above 90%.

A simple web application in Flask could be built to leverage the model to predict breeds through user-input images.