How to use Deep Learning to identify dog breeds

Using Convolutional Neural Networks and Transfer Learning to build a dog breed classifier

Paula Maldonado
7 min read · Apr 20, 2022
Photo by Jamie Street on Unsplash

In an age where people constantly share images, it can be very useful to know how to analyze them and create value from them.

Image classification, for example, can be very helpful. There are hundreds of breeds of dogs in the world, and for non-experts, it can be very difficult to identify them correctly.

In this project, we will see how, using Convolutional Neural Networks (CNNs) and transfer learning, we are able to create a model that can identify the breed of a dog and even tell you what breed of dog you look like!

Metrics

To measure and compare the performance of different models, we will use accuracy, i.e. the number of correct predictions divided by the total number of predictions.

We can use this metric even though the classes are slightly imbalanced (as we will see below). Since there is a large number of possible breeds (133 in our dataset), simply predicting the most popular breed for every dog, or guessing at random, would yield very low accuracy. Thus, accuracy gives us a good overall sense of how the model is doing.

To optimize the model, we use the categorical cross-entropy loss function, which is commonly used in multi-class classification tasks. Moreover, when training our CNNs, a validation set will be used to see how well the model generalizes, and at the end of each epoch the best-performing weights seen so far will be saved.
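To make the loss concrete, here is a small NumPy sketch of categorical cross-entropy (Keras computes this for us when we compile with `loss="categorical_crossentropy"`; the function name below is just for illustration):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# A perfect prediction gives a loss of 0; an uncertain one is penalized.
perfect = categorical_cross_entropy(np.array([[0., 1.]]), np.array([[0., 1.]]))
unsure = categorical_cross_entropy(np.array([[0., 1.]]), np.array([[0.5, 0.5]]))
```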

Overview

The project contains several steps:
· EDA
· Modeling
  a. Detecting Human Faces
  b. Detecting Dogs
  c. Building a CNN from Scratch
  d. Building a CNN using Transfer Learning
· Results
  Predicting dog breeds
· Conclusion
· Improvements

To access the full code, check the GitHub repository.

EDA

First of all, we will use two datasets, one containing labeled dog images, and another one containing human faces.

After loading these images, we can see that there are 13,234 total human images, and 8,351 dog images divided into 133 categories.

For each dog breed, we have on average 50 dog photos. However, as we can see in the image below, many categories contain far fewer images. Thus, our dataset is rather small.

Each bar represents the number of dog images for a given category

Modeling

a. Detecting Human Faces

Since we would like to tell a human which dog they look like, we must first be able to detect a human face. To do so, we can use OpenCV (Open Source Computer Vision Library) and a pre-trained classifier for frontal faces.

Then, to implement the face detector, we import the cv2 package and write the following code:

Testing this function on 100 human images and 100 dog images, we obtain the following results regarding the classifier:

  • It correctly identified a face in 98% of human images.
  • It mistakenly identified 12% of dogs as humans.

b. Detecting Dogs

Now for detecting dogs, as before, we can use a pre-trained model. In this case, we will use a pre-trained ResNet-50 model, trained on ImageNet.

Before using our model, however, we must preprocess our images. Indeed, when using TensorFlow as the backend, Keras CNNs require a 4D tensor as input, with shape: (nb_samples, rows, columns, channels).

We can accomplish this by applying the function below:
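A sketch of that function, here written with PIL and NumPy (the Keras `load_img`/`img_to_array` image utilities accomplish the same thing), might look like:

```python
import numpy as np
from PIL import Image

def path_to_tensor(img_path):
    """Load an image, resize it to 224x224, and return a (1, 224, 224, 3) tensor."""
    img = Image.open(img_path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32)  # (rows, columns, channels)
    return np.expand_dims(x, axis=0)       # add the nb_samples (batch) dimension

def paths_to_tensor(img_paths):
    """Stack individual tensors into a (nb_samples, 224, 224, 3) tensor."""
    return np.vstack([path_to_tensor(p) for p in img_paths])
```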

Now that we have a way to format our image for supplying to ResNet-50, we are ready to use our pre-trained model.

Note, however, that we must apply an additional preprocessing function from Keras: preprocess_input. This helps us normalize our images by subtracting the mean pixel from every pixel in the image.
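Putting this together, the dog detector can be sketched as follows. The helper name `is_dog_class` is introduced here for clarity, and the image loading mirrors the preprocessing described above:

```python
import numpy as np
from PIL import Image

def is_dog_class(class_idx):
    """In the ImageNet class dictionary, keys 151 to 268 correspond to dog breeds."""
    return 151 <= class_idx <= 268

def dog_detector(img_path):
    """Return True if ResNet-50's top ImageNet prediction is a dog breed."""
    # TensorFlow is imported lazily so the pure helper above stays dependency-free.
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    img = Image.open(img_path).convert("RGB").resize((224, 224))
    x = np.expand_dims(np.asarray(img, dtype=np.float32), axis=0)
    x = preprocess_input(x)  # subtract the ImageNet mean pixel per channel
    model = ResNet50(weights="imagenet")
    return is_dog_class(int(np.argmax(model.predict(x))))
```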

As we can see above, we take the argmax of the predicted probability vector. This gives us an integer corresponding to the object class predicted by the model. Using this dictionary, we can identify the object class, knowing that the dog classes correspond to keys 151 to 268.

Finally, as before, the model was tested on a subset of 100 images of dogs, all of which were correctly identified, and on a subset of 100 images of humans, one of which was mistakenly identified as a dog.

c. Building a CNN from Scratch

Before using transfer learning, it can be good to take a closer look at how CNNs work.

Convolutional Neural Networks use locally connected layers, usually mixing:

  • Convolutional Layers that detect regional patterns in an image
  • Pooling Layers that help decrease the dimensionality and thus decrease the risk of overfitting

Moreover, the first layer must always include the input shape (in our case: 224, 224, 3), and the last layer must be a fully connected layer with a node for each category and a softmax activation function (needed for computing the categorical cross-entropy loss function). The other convolutional and dense layers should use the ReLU activation function.

Knowing this, we can build a fairly simple CNN where we add an additional fully connected layer, a dropout layer to prevent overfitting, and a layer to flatten the input:
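A simple architecture along these lines, sketched in Keras (the exact filter counts and layer sizes here are illustrative assumptions, not the article's exact configuration), could be:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

model = Sequential([
    # Convolutional + pooling blocks detect regional patterns and
    # reduce dimensionality.
    Conv2D(16, 2, activation="relu", input_shape=(224, 224, 3)),
    MaxPooling2D(2),
    Conv2D(32, 2, activation="relu"),
    MaxPooling2D(2),
    Conv2D(64, 2, activation="relu"),
    MaxPooling2D(2),
    Flatten(),                         # flatten to a 1D vector
    Dense(500, activation="relu"),     # additional fully connected layer
    Dropout(0.3),                      # dropout to prevent overfitting
    Dense(133, activation="softmax"),  # one node per dog breed
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])
```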

This model, trained for 10 epochs with a batch size of 25, obtained 6.34% accuracy on the test set. Although this is a very weak performance, it is at least well above random guessing (about 0.75% for 133 classes).

Using transfer learning, we will be able to have much better performance.

d. Building a CNN using Transfer Learning

Now we will see how to use transfer learning to create a CNN that can identify dog breeds from images.

When using transfer learning, there are two important things to check before choosing a model architecture:

  • Is our data similar to the data used for training the pre-trained model?
  • How big is our dataset?

In this case, our data is similar to the data used to train the pre-trained CNNs available in Keras. In addition, we saw that our dataset is rather small. These two factors indicate that our architecture should largely reuse the pre-trained model as is, replacing only the end of the network with a new fully connected layer.

In addition, to reduce training time, a GAP (Global Average Pooling) layer could be included in between.

We can expect the training process to be rather fast as it will only include the GAP layer and the Dense layer.
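The trainable part of such a model can be sketched as the two layers below, assuming bottleneck features pre-extracted with Xception from 224x224 inputs (which have shape (7, 7, 2048) per image; the variable name `transfer_model` is illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

# Only these two layers are trained; the pre-trained Xception base
# is used as a frozen feature extractor.
transfer_model = Sequential([
    GlobalAveragePooling2D(input_shape=(7, 7, 2048)),  # GAP layer
    Dense(133, activation="softmax"),                  # one node per breed
])
transfer_model.compile(optimizer="rmsprop",
                       loss="categorical_crossentropy",
                       metrics=["accuracy"])
```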

Results

The model architecture discussed above was trained for 20 epochs with a batch size of 32, and three different pre-trained CNNs were tested as the frozen base. On the test set, the best results were obtained with Xception: 85% accuracy, much better than before!

Predicting dog breeds

Most of the work has already been done; now we only need to write a function that takes advantage of our previous work to estimate the breed of a dog.

The goal for our algorithm is to, given an image path file, be able to tell if the image provided contains a dog or a human and to estimate which breed the dog is, or what dog the human looks like. In addition, it should display the image provided as well as an image of the estimated breed.

To do so, we first write a function that can find a dog image for a certain breed among our training files (find_breed), then another function to display both images (show_images), and finally our final function, which builds upon our previous work and uses the model trained with Xception:
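The control flow of that final function can be sketched as below; `classify_image` is a hypothetical name, and the three callables stand in for the face detector, dog detector, and breed-prediction model built earlier:

```python
def classify_image(img_path, dog_detector, face_detector, predict_breed):
    """Combine the detectors and the breed model into the final algorithm.

    dog_detector and face_detector take an image path and return a boolean;
    predict_breed takes an image path and returns a breed name.
    """
    if dog_detector(img_path):
        # A dog was found: report its estimated breed.
        return f"This looks like a {predict_breed(img_path)}!"
    if face_detector(img_path):
        # A human was found: report the breed they most resemble.
        return f"This human looks like a {predict_breed(img_path)}!"
    # Neither a dog nor a human face was detected.
    return "Error: no dog or human detected in the image."
```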

These are the results:

Not so bad!

Conclusion

This project took a look at how to tackle image classification with modern deep learning techniques. More precisely, we saw how to combine CNNs and transfer learning to build a multi-class model that achieved 85% accuracy on test data.

Improvements

There are many ways of improving this work, a few would be:

  • Testing other model architectures
  • Fine-tuning training hyperparameters (number of epochs, batch size)
  • Adding more dog breeds and more images per breed
  • Applying data augmentation


Paula Maldonado

Passionate about gaining insights from data. I write about things related to Data Science and Data Analysis. https://pcmaldonado.github.io