Dog Breed Classifier

Monica Li
11 min read · Jan 12, 2022

--

Introduction

Problem Introduction

Have you ever walked down the street or scrolled through Instagram, seen a cute dog, and had no idea what breed it was? It happens to many of us, and since we are all dog lovers, a dog breed classifier comes in handy. Why is dog breed classification challenging? These two pairs of dog comparisons give a taste of it: human eyes can barely tell the difference.

Strategy to solve the problem

Basic strategy

An image comes in. First we detect whether it contains a dog or a human. If it contains neither, the classifier returns 'Error'; if a dog is detected, the CNN model we built predicts its breed; if a human face is detected, the same CNN model provides an estimate of the dog breed that the face most resembles.
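
As a rough illustration, here is a minimal sketch of that top-level logic in Python. The helper names (dog_detector, face_detector, predict_breed) are placeholders for the detectors and CNN described later in this post.

    def classify_image(img_path):
        """Top-level pipeline sketch: dog -> breed, human -> resembling breed, else error."""
        if dog_detector(img_path):          # placeholder, sketched later in the post
            return "Dog detected. Predicted breed: " + predict_breed(img_path)
        elif face_detector(img_path):       # placeholder, sketched later in the post
            return "Human detected. Most resembling breed: " + predict_breed(img_path)
        else:
            return "Error: neither a dog nor a human face was detected."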

Structure of the Post

This Medium post is simply a report on developing and testing the dog breed classifier. We will walk through the dataset, build dog and human detectors, then create a Convolutional Neural Network from scratch and apply transfer learning methods. At the end, there is a discussion of results and potential improvements.

EDA (Exploratory Data Analysis)

There are 133 dog breeds in total and 8,351 dog images, split into train (6,680 images), validation (835 images), and test (836 images) sets. As we can see, this dataset is quite small. There are also 13,233 human face images; since we won't use human faces to train the CNN, no splitting is needed there.
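
As a rough sketch of how the splits can be loaded (assuming a folder layout like dogImages/train/<breed_name>/*.jpg, which is my assumption about this project's structure), something like the following works with scikit-learn and Keras:

    import numpy as np
    from sklearn.datasets import load_files
    from tensorflow.keras.utils import to_categorical

    def load_dataset(path, num_classes=133):
        # Collect file paths and one-hot encode breed labels taken from the folder names
        data = load_files(path, load_content=False)
        files = np.array(data['filenames'])
        targets = to_categorical(np.array(data['target']), num_classes)
        return files, targets

    train_files, train_targets = load_dataset('dogImages/train')  # hypothetical paths
    valid_files, valid_targets = load_dataset('dogImages/valid')
    test_files, test_targets = load_dataset('dogImages/test')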

First we use OpenCV's implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, so we simply take advantage of one of them. Below is a grid of human face images run through the face detector. So far so good: all of the faces are captured. But if we pay attention to the details, there is a flaw: if an image contains two faces, only one is captured (e.g., 1st row, 5th column). The detector has other weaknesses too; when run on dog images it produces about an 11% error rate (measured on the first 100 images from both the human and dog files).
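
A minimal sketch of such a face detector with OpenCV (the cascade XML file ships with OpenCV; the path below is an assumption about where it lives in this project):

    import cv2

    # Load a pre-trained Haar cascade for frontal faces (path is project-specific)
    face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

    def face_detector(img_path):
        """Return True if at least one human face is detected in the image."""
        img = cv2.imread(img_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Haar cascades work on grayscale images
        faces = face_cascade.detectMultiScale(gray)
        return len(faces) > 0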

Human Face Detector

Second is the dog detector, which uses a pre-trained ResNet-50 model to detect dogs in images. As before, we take a look at those cute dogs. Performance is excellent: no mistakes are made on human images, and 100% of dogs are detected in the dog images.
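
A sketch of how such a dog detector can be built on top of an ImageNet-pretrained ResNet-50 (ImageNet category indices 151 through 268 correspond to dog breeds, so a top prediction in that range is treated as "dog detected"):

    import numpy as np
    from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
    from tensorflow.keras.preprocessing import image

    resnet50_imagenet = ResNet50(weights='imagenet')  # full model with the 1000-class head

    def dog_detector(img_path):
        """Return True if ResNet-50's top ImageNet prediction is one of the dog classes."""
        img = image.load_img(img_path, target_size=(224, 224))
        x = np.expand_dims(image.img_to_array(img), axis=0)
        prediction = np.argmax(resnet50_imagenet.predict(preprocess_input(x)))
        return 151 <= prediction <= 268  # ImageNet dog categories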

Dog & label

Let's take a deeper look at the dog image data. We know we have 133 breeds, but how are they distributed? Is there a breed with only one image, which our CNN would not be able to learn from? Checking the barplot of the breed distribution, we can see it is almost evenly distributed, so there is not much to worry about: the neural net has a chance to learn enough from each breed.
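
A quick sketch of that check, assuming train_targets is the one-hot label array from the loading sketch above:

    import numpy as np
    import matplotlib.pyplot as plt

    counts = train_targets.sum(axis=0)  # images per breed (one-hot columns sum to counts)
    plt.figure(figsize=(16, 4))
    plt.bar(np.arange(len(counts)), counts)
    plt.xlabel('Breed index')
    plt.ylabel('Training images')
    plt.title('Training images per breed')
    plt.show()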

Dog_Breed_Barplot

Convolutional Neural Network

Earlier we quickly passed through the ResNet-50 model and the dog detector. But how does an image go through the ResNet-50 model? When using TensorFlow as the backend, Keras CNNs require a 4D array (4D tensor) as input, with shape (nb_samples, rows, columns, channels). So we wrote a function, path_to_tensor, that takes a string-valued file path to a color image as input and returns a 4D tensor suitable for feeding into a Keras CNN. The function first loads the image and resizes it to a square image of 224×224 pixels. Next, the image is converted to an array, which is then reshaped into a 4D tensor. In our case, the returned 4D tensor has shape (1, 224, 224, 3), where 1 means a single image and 3 represents the color channels.
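
A sketch of path_to_tensor along those lines:

    import numpy as np
    from tensorflow.keras.preprocessing import image

    def path_to_tensor(img_path):
        """Load an image file and return a (1, 224, 224, 3) tensor for a Keras CNN."""
        img = image.load_img(img_path, target_size=(224, 224))  # load and resize to 224x224
        x = image.img_to_array(img)                              # (224, 224, 3) array
        return np.expand_dims(x, axis=0)                         # add the batch dimension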

Metrics

A metric is a function used to judge the performance of your model; in our case, judgment is based on the validation data. Metric functions are similar to loss functions, except that the results from evaluating a metric are not used when training the model. For the loss function we use the CategoricalCrossentropy class: we have images with correct labels attached, and categorical cross-entropy computes the cross-entropy between the labels and the predictions. It is designed for problems with multiple label classes, and since we have 133 classes, CategoricalCrossentropy is a good fit for our case. For the performance metric we use 'accuracy', which measures how often predictions match labels. All of these choices follow from the nature of our dataset: we have train, validation, and test images with correct labels, and we are doing multi-class classification rather than regression.
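
As a toy illustration of the loss (not taken from the project code): the cross-entropy is low when the predicted probability of the true class is high.

    import tensorflow as tf

    y_true = tf.constant([[0., 1., 0.], [0., 0., 1.]])        # one-hot labels for 3 toy classes
    y_pred = tf.constant([[0.1, 0.8, 0.1], [0.2, 0.3, 0.5]])  # predicted probabilities
    cce = tf.keras.losses.CategoricalCrossentropy()
    print(cce(y_true, y_pred).numpy())  # ~0.458, the average cross-entropy over both samples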

CNN from scratch

Model Architecture

It is a CNN with 3 convolutional layers and 3 max pooling layers to extract features from the images, followed by a global average pooling layer to reduce the number of parameters, plus one dense layer acting as the classifier (133 outputs). All layers use the ReLU activation function except the last dense layer, which uses softmax; there is also 20% dropout to avoid overfitting. So the convolutional layers are the feature extractor, and the dense layer is the classifier.
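
A sketch of that architecture in Keras; the filter counts and kernel sizes are my assumptions, while the layer types and ordering follow the description above.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dropout, Dense

    model = Sequential([
        # Feature extractor: 3 convolution + max pooling blocks (filter counts are assumptions)
        Conv2D(16, (2, 2), activation='relu', input_shape=(224, 224, 3)),
        MaxPooling2D((2, 2)),
        Conv2D(32, (2, 2), activation='relu'),
        MaxPooling2D((2, 2)),
        Conv2D(64, (2, 2), activation='relu'),
        MaxPooling2D((2, 2)),
        # Classifier head: global average pooling, dropout, 133-way softmax
        GlobalAveragePooling2D(),
        Dropout(0.2),
        Dense(133, activation='softmax'),
    ])
    model.summary()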

  • Metrics: The metric is defined as a performance judgment; it is set to accuracy, since we want to know how often the prediction equals the label.
  • Modelling: The model reads 20 images per batch and trains for 10 epochs. An epoch means running the entire training set through the CNN once, so we repeat that 10 times (see the training sketch after this list).
  • Hyperparameter tuning: For the optimizer I use RMSprop. It is special because the learning rate is adaptive rather than a hyperparameter (a globally fixed learning rate for the entire network), which helps with problems like vanishing gradients. More details can be found here. The loss function is set to categorical cross-entropy for the 133 breed classes.
  • Results: Not as good as we would expect, only about 4% accuracy on the test images. I will go into the details in the Reflection section below. However, it is still better than a human guess, which would be about 1%.
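
A training sketch matching those settings (RMSprop, categorical cross-entropy, accuracy, batches of 20, 10 epochs). It assumes train_tensors and valid_tensors are the stacked 4D image arrays built with path_to_tensor and rescaled to [0, 1], that the one-hot targets come from the loading sketch earlier, and the checkpoint filename is illustrative.

    from tensorflow.keras.callbacks import ModelCheckpoint

    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Keep the weights that perform best on the validation set
    checkpoint = ModelCheckpoint('weights.best.from_scratch.hdf5',
                                 save_best_only=True, verbose=1)

    model.fit(train_tensors, train_targets,
              validation_data=(valid_tensors, valid_targets),
              epochs=10, batch_size=20, callbacks=[checkpoint])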

Transfer Learning

What is transfer learning? It means taking a pre-trained, mature model and altering it to solve our problem (details here). In this report I will illustrate two transfer learning models for the dog breed classifier: one based on VGG16 and the other on ResNet-50. In our case, VGG16 and ResNet-50 are both trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1,000 categories.

VGG16 Transfer Learning CNN

First of all, let's take a look at the VGG16 architecture.

VGG16 illustration

As we can see, it has 16 layers, including 3 dense layers. What we will do is freeze all the weight layers before the dense layers; in other words, we take advantage of VGG16's feature extractor, remove its dense layers, and build customized dense layers that suit our dog breed classification goal.

Dog Breed Transfer Learning VGG16 Dense Layer Illustration
  • Metrics: The metric is defined as a performance judgment; it is set to accuracy, since we want to know how often the prediction equals the label.
  • Modelling: The model structure is shown in the 'Dog Breed Transfer Learning VGG16 Dense Layer Illustration' table. The model reads 20 images per batch and trains for 20 epochs (a code sketch follows this list).
  • Hyperparameter tuning: For the optimizer I use RMSprop; the loss function is set to categorical cross-entropy for the 133 breed classes.
  • Result: We get 44% test accuracy. That is a big improvement compared to the CNN we trained from scratch.
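
A sketch of this transfer-learning setup in Keras. The frozen VGG16 base and the 133-way softmax head follow the description above; the exact head (global average pooling plus a single dense layer) is my reading of the illustration, so treat it as an assumption.

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense

    # Pretrained convolutional base without VGG16's original dense layers
    vgg16_base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    vgg16_base.trainable = False  # freeze the feature extractor

    vgg16_model = Sequential([
        vgg16_base,
        GlobalAveragePooling2D(),
        Dense(133, activation='softmax'),  # customized classifier for 133 breeds
    ])
    vgg16_model.compile(optimizer='rmsprop',
                        loss='categorical_crossentropy',
                        metrics=['accuracy'])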

ResNet-50 Transfer Learning CNN

This time I will skip the picture for ResNet-50; it has 50 layers, so it is too complicated and large to show here (here is the link for full architecture details). As with VGG16, we freeze all the layers before the dense layers, cut off the original dense layers, and replace them to suit our goal: 133 dog breed classes instead of the 1,000 ImageNet classes.

Dog Breed Transfer Learning ResNet50 Dense Layer Illustration
  • Metrics: The metric is defined as a performance judgment; it is set to 'accuracy', since we want to know how often the prediction equals the label.
  • Modelling: The model structure is shown in the 'Dog Breed Transfer Learning ResNet50 Dense Layer Illustration' table. The model reads 20 images per batch and trains for 20 epochs (a code sketch follows this list).
  • Hyperparameter tuning: For the optimizer I choose SGD (stochastic gradient descent). For this model it actually works better than RMSprop; as mentioned before, RMSprop is good for handling vanishing gradients, which makes it a better fit for recurrent neural networks. The loss function is set to categorical cross-entropy for the 133 breed classes.
  • Result: After training the model for 20 epochs and testing on the test data, we get an accuracy of about 85%. That is impressive performance for the little effort we have made so far.
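
The ResNet-50 version looks almost identical; only the base network and the optimizer change (again a sketch, with the same assumptions about the head):

    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
    from tensorflow.keras.optimizers import SGD

    resnet50_base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    resnet50_base.trainable = False  # freeze the pretrained feature extractor

    resnet50_model = Sequential([
        resnet50_base,
        GlobalAveragePooling2D(),
        Dense(133, activation='softmax'),
    ])
    resnet50_model.compile(optimizer=SGD(),  # plain stochastic gradient descent
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])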

Reflection

Comparison Table

Summary

From the comparison table, we can easily see that deeper neural nets get better results. This is true for neural nets in general: we usually want a deep model if possible. So why does the CNN we built from scratch perform so poorly? The first reason is that the network is too shallow. Our dataset is also very small, with fewer than 7k images to train on, whereas VGG16 and ResNet-50 were trained on millions of images across 1,000 categories. In addition, we trained it for only 10 epochs and left most hyperparameters at their defaults, so there is room for minor improvements to the from-scratch CNN. The VGG16 CNN got a huge accuracy jump, and ResNet-50 pushes the accuracy rate to 85%, roughly 20× the from-scratch CNN and 85× a human's random guess. In conclusion, we use the ResNet-50 transfer learning CNN to build our dog breed classifier.

Real World Images Testing

After playing around with the few CNNs we built, we settled on the ResNet-50 transfer learning CNN, which reaches 85% test accuracy, good enough for us. In this section I discuss a few aspects of the project I found interesting or difficult, through testing on real-world images.

I collected some random images from the internet, including humans, dogs, cartoon-style humans and dogs, and also a cat; no one wants to leave the cat behind. But before that, let's look at how the dog breed classifier does on our test images, which come with correct labels.

Dog Breed Classifier Tested on Test Images

The middle one is wrong. Well, it looks blurry to human eyes too (but that does not make the error acceptable; at the end of the day, we want the neural net to work perfectly if possible. Further improvements are discussed later).

Moving on to real-life random images: in general the classifier gets the dog breed right if the picture is clean with no distractions (which is not realistic, since we can't always get a clean-cut image). It can also tell that a cat is not a dog, and a human face gets the closest-resembling dog breed. So, to trick the dog breed classifier, I included a few images that are interesting to discuss, along with reasons why such images are difficult for the classifier to handle.

  1. Both are dogs, although the breed may be a bit hard to tell. But the classifier returned 'Error', which means it did not detect any dog or human. How? Most likely both dog faces are unusual; our model cannot detect anything it was not trained on, and none of our training images have a style like these. Ideally it would still detect a dog face and return the most-resembling breed. If we really wanted to make that happen, we could apply style transfer to our existing images with a style-transfer CNN and then retrain the model to see how it performs on these interesting dog faces.
  2. The next picture is a girl with her dog (I think it is a golden retriever), and the model predicts 'Afghan_hound'. This is interesting, so I attached a picture of a real Afghan hound on the right. Now we can sense the mistake the classifier has made: first, a dog was detected, which is right. But the focus is off; given that a dog was detected, the ResNet-50 CNN focuses on the girl (a human) rather than the dog. What is even more interesting is that, since the girl has long flowy hair, the classifier predicts Afghan hound (this would be fine if a human face had been detected and we were returning the most-resembling dog breed). This funny error gives us some room to improve the classifier: if there are two (or multiple) objects in one picture, the model may get confused about where to look, so introducing something like image segmentation and labelling could improve it a lot for multi-object pictures. I have to say, that is a really challenging task, so we will stop here for now.

3. The next one is quite interesting too: an unusual human face is given, and the classifier actually returned the closest-resembling dog breed rather than an error.

We can see this is an unusual human face, a bit closer to a cartoon character we might see on Netflix. Still, a face was detected, and the classifier returned the most-resembling dog breed. Which is fairly true: this human face does resemble a Chihuahua; both have big eyes and huge ears. More than three objects are present in this picture, but the classifier still found a way to focus on the right one.

These specific picture discussions may not generalize; they are just something I wanted to share about the model's actual performance on real-life pictures from the internet.

Improvements

We discussed some potential improvements for the classifier above: so far we know it cannot detect a dog face if it is not a typical one, and it sometimes has difficulty dealing with multiple objects when they are too close together or their edges are blurry.

Overall the transfer learning model (ResNet-50 CNN) is better than I expected, but I have three possible improvements for the model itself.

  1. Hyperparameter tuning: learning rate, optimizer and loss function, number of epochs. Here is the link to transfer learning and fine-tuning in Keras.
  2. Pre-trained model: we used VGG16 and ResNet-50 here, but there are many others we could try. We could also modify the classifier layers further; adding one additional fully connected layer could be an option to start with.
  3. Image data augmentation: adding more input images through augmentation is another way to improve, e.g. rotating, cropping, or dimming the images, as sketched below.
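
As a sketch of point 3, Keras' ImageDataGenerator can create that extra variety on the fly; the parameter values below are illustrative only.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        rotation_range=20,            # random rotations
        width_shift_range=0.1,        # random horizontal shifts
        height_shift_range=0.1,       # random vertical shifts
        zoom_range=0.2,               # random zoom (a form of cropping)
        horizontal_flip=True,         # mirror images left-right
        brightness_range=(0.7, 1.3),  # random dimming/brightening
    )

    # datagen.flow(train_tensors, train_targets, batch_size=20) can then replace
    # the raw arrays passed to model.fit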

Conclusion

We built a CNN that can classify dog breeds with an 85% accuracy rate, which is 85× more accurate than a human guess. We learned that deeper neural nets generally do a better job, and that hyperparameter tuning matters when compiling a model; the right setup can help it converge faster. In addition, transfer learning is very helpful, so we should keep checking the latest research and GitHub repos to stay active in this field.

Finally, thank you for reading this far.
