Classifying Dog Breeds Using Convolutional Neural Networks
Who doesn’t love dogs? For generations they’ve been known as man’s best friend, and it’s easy to see why.
However, being a lover of dogs doesn’t mean I know anything about them. I can count the number of dog breeds I know on two hands, and even then I’m not confident I could recognise a beagle if I saw one.
It turns out I’m not the only one facing this issue: even dog owners and the most avid dog lovers can struggle with naming the breed of some dogs. This isn’t such a surprise when you consider that there are over 360 different breeds according to the Fédération Cynologique Internationale (FCI), also known as the World Canine Organisation.
So how do we solve this problem without having a canine expert beside us every time we walk through the park? Simple — we build an algorithm to recognise over 100 different breeds of dog.
In this article we will explore:
- Why we use neural networks, in particular convolutional neural networks (CNNs), to solve this problem
- How we can build, train and test a simple CNN
- Using existing open-source CNNs to develop a stronger dog breed classifier
- Examples of the algorithm in use on various dog breeds (and humans too!)
Teaching a computer to tell the difference between a Poodle and a Dachshund
Our goal is to try and create an algorithm to pick out the correct breed of a dog from hundreds of different breeds. To start building this we think of how a human would solve this problem — by identifying the different features a dog exhibits (size, colour, fluffiness, etc.).
Neural networks seem like a good solution here because, like the human brain, they take in several inputs, perform some hidden calculations and return an output.
Once we’ve established that neural networks are the way forward, we want to make sure that our neural network can output multiple classifications, as we have several dog breeds to distinguish between. We must also make sure our neural network can take images as the input — just like a human uses their visual senses to determine the different features of a dog.
Considering these requirements, our perfect solution seems to lie in a Convolutional Neural Network. The reason it works so well here becomes apparent as we explore the different components of the CNN:
- First we have the input layer. A convolutional neural network is able to take in a 1D, 2D or even 3D array as an input, which makes it perfect for our dog images. We can split our images into a grid of, say, 28x28 pixels, either in grayscale format as a 2-dimensional input or in RGB, stacking red, green and blue grids to form a 3-dimensional array (see the sketch after this list).
- After we have defined our inputs, the CNN will be made up of two types of hidden layers to perform our calculations. The first of these is the convolutional layer, which will break the image up into smaller chunks and apply several filters to see if certain features are present within that chunk of the image. In early parts of the neural network these may be filters to determine straight edges or corners, while further down the network the filters may look for circles (resembling the eyes of a dog) or criss-cross patterns.
- The second type of layer is the pooling/subsampling layer. This will look for localised maxima of features within the chunks, or aggregate the average presence of a feature within a chunk, in order to reduce the size of our overall array as we go into the next layer. This enables us to look for more complex features deeper into the network, which will prove important when we want to establish the difference between a Brittany and a Welsh Springer Spaniel.
- The final layer in our CNN will be a fully connected layer. This is our output layer that will provide our multi-class classification. Specifically it will give us a vector with an entry for each possible dog breed in our dataset, along with a probability corresponding to the likelihood that this breed is within our supplied image.
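To make the input layer concrete, here is a minimal sketch of how a single dog photo can be turned into the array a CNN expects; the 224x224 size and the file path are placeholder assumptions rather than values from the project.

import numpy as np
from keras.preprocessing import image

# Load a photo and resize it to a fixed grid of 224x224 pixels (path is a placeholder)
img = image.load_img('images/sample_dog.jpg', target_size=(224, 224))

# Convert to a 3-dimensional array: 224 x 224 pixels x 3 colour channels (red, green, blue)
x = image.img_to_array(img)

# Keras models expect a batch dimension, giving a 4D array of shape (1, 224, 224, 3)
x = np.expand_dims(x, axis=0)
print(x.shape)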
How to build a simple functioning Convolutional Neural Network
The words simple and neural network don’t sound like they belong in the same sentence. Fortunately, thanks to the Keras library in Python, we are able to create a relatively accurate model in just a few simple steps.
- Data pre-processing:
As the features within our data are all encompassed within an image, there are very few pre-processing steps we need to take. We need to ensure that our image file paths can be accessed easily by storing them as an array, and we then split this into three datasets — train, validation and test. The reason we include a validation set here is to ensure we avoid overfitting our model to the features specific to our training dataset.
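As a sketch of what this could look like, assuming the images are stored in one folder per breed (the folder layout and split proportions here are assumptions rather than the exact setup used):

import os
from glob import glob
from sklearn.model_selection import train_test_split

# Assumed folder layout: dog_images/<breed_name>/<photo>.jpg
image_paths = glob('dog_images/*/*.jpg')
labels = [os.path.basename(os.path.dirname(path)) for path in image_paths]

# Hold out 20% of the images for testing...
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, random_state=42)

# ...then carve a validation set out of the remaining training images
train_paths, valid_paths, train_labels, valid_labels = train_test_split(
    train_paths, train_labels, test_size=0.2, random_state=42)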
We also want to one-hot encode our target data, in this case the dog breed categories, which will allow us to easily compare our output vector with the actual targets as these will have the same dimensions.
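A sketch of that encoding step, continuing from the hypothetical label lists above:

from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

# Map each breed name to an integer index (e.g. 'beagle' -> 12)
encoder = LabelEncoder()
train_indices = encoder.fit_transform(train_labels)

# Expand each index into a 133-element vector with a single 1 in the breed's position
train_targets = to_categorical(train_indices, num_classes=133)
print(train_targets.shape)  # (number of training images, 133)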
- Build and compile the model:
When it comes to building a neural network, it can be useful to set a baseline target for its accuracy and performance before we start putting it together, so that we can include enough layers to reach our desired accuracy whilst ensuring we can train our model in a reasonable amount of time.
In this scenario we set a baseline of achieving over 1% accuracy when training the model over 5 epochs (that is, five complete passes through our training data) in less than 2 minutes. Our dataset contains 133 different dog breeds, so random guessing would only be right about 1 time in 133 (roughly 0.75%); clearing the 1% threshold therefore tells us that we are performing better than random chance.
Once we have our targets set we can use Keras to build a model architecture with 3 convolutional layers and 3 pooling layers that sit between our inputs and a fully connected dense layer providing the predictions. The architecture is sketched below.
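As a rough sketch of such an architecture using the Keras Sequential API (the filter counts and kernel sizes here are illustrative assumptions rather than necessarily the exact values used):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense

model = Sequential()
# Three pairs of convolution + pooling layers, doubling the number of filters each time
model.add(Conv2D(16, kernel_size=2, activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32, kernel_size=2, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(64, kernel_size=2, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
# Collapse each feature map to a single average value before the output layer
model.add(GlobalAveragePooling2D())
# One probability per breed in the dataset
model.add(Dense(133, activation='softmax'))

model.summary()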
When compiling the model we need to specify our optimizer, loss function and assessment metrics.
A good starting point is to use the RMSprop optimizer. This keeps a moving average of the squared gradients for each weight and scales each update accordingly, taking smaller steps where the gradients are consistently large and larger steps where they are small.
A suitable loss function is categorical crossentropy as we have a multi-class output.
Finally, we use the Keras accuracy metric as part of our model because it directly compares our output vector with our label vector: it takes the breed with the highest predicted probability, checks whether it matches the true label, and reports the proportion of correct predictions.
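Putting those three choices together, the compile step might look like this, continuing from the architecture sketch above:

# RMSprop optimizer, multi-class loss and accuracy as the reported metric
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])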
- Train and Test:
Once we have our model built and compiled we need to fit our model to our training dataset. As part of this training step we need to specify the number of epochs we would like to train our model for.
If we have multiple epochs then we want to test our model against our validation dataset each time. This ensures the weights we have obtained are not overfit by keeping track of our best performing weights as we pass through each epoch. The code for this step looks like:
from keras.callbacks import ModelCheckpoint

epochs = 5
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.hdf5',
                               verbose=1, save_best_only=True)
model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=epochs, batch_size=20, callbacks=[checkpointer], verbose=1)
Once we have trained the model and saved the best performing weights, we finally test this against our test dataset to obtain our model accuracy (see the GitHub link at the bottom to see if we achieved 1%).
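A sketch of that final evaluation, assuming test_tensors and test_targets have been prepared in the same way as the training data (those names mirror the training code above and are an assumption):

import numpy as np

# Reload the best weights saved by the checkpointer during training
model.load_weights('saved_models/weights.best.from_scratch.hdf5')

# Predict a breed for each test image and compare with the true labels
predictions = model.predict(test_tensors)
predicted_breeds = np.argmax(predictions, axis=1)
true_breeds = np.argmax(test_targets, axis=1)

test_accuracy = 100 * np.mean(predicted_breeds == true_breeds)
print('Test accuracy: {:.1f}%'.format(test_accuracy))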
We have successfully built a model that can beat random chance when it comes to classifying the breed of a dog. Unfortunately this result of 1 in 100 predictions being correct isn’t going to impress anyone.
Fortunately, rather than using the simple model we’ve built in under 2 minutes as the basis for our solution, we are able to use transfer learning to build a dog breed classifier from another pre-trained CNN.
There are several models available to us that have been trained with enormous datasets over a time period of weeks as opposed to minutes. A few examples include the VGG-19 and Xception models, both of which are available to access via Keras.
When choosing a model there are two important factors to consider both in model selection and model implementation:
- How similar our training data is to the data used to train the model
- How large our training dataset is
The similarity of training data is important as it means we can use more of the layers in the existing model and their pre-determined weights.
As our training dataset of ~8000 dog images is relatively small we want to use the existing feature weights that came with the pre-trained model rather than retraining it ourselves from scratch.
Once we have determined which model to use, we simply strip off its last fully connected layer and build our own, which we can then train and validate as before. We can use more epochs this time without hindering our performance, as we have fewer layers we need to train.
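As an illustration of the idea, here is a sketch using Xception, one of the models mentioned earlier; the exact layers added on top are an assumption rather than the precise setup used in the project.

from keras.applications.xception import Xception
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

# Load the pre-trained network without its final fully connected layer
base_model = Xception(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the pre-trained layers so their existing feature weights are kept
for layer in base_model.layers:
    layer.trainable = False

# Stack our own output layer on top of the frozen feature extractor
transfer_model = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    Dense(133, activation='softmax')
])

transfer_model.compile(optimizer='rmsprop',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])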
Putting the working dog breed classifier to use
After considering the different options available, it appeared that the ResNet-50 model was the most suitable for our task as it had been trained with several different classes of objects, one of which was animals.
Along with our dog breed classifier we were able to use ResNet-50 to also identify whether the image supplied actually contained a dog or not before classifying it. As a bit of fun there is a feature that allows the user to input an image of a human to see what their closest dog breed may be.
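A sketch of how that dog check can work, assuming an image path as input; the key idea is that in the ImageNet class ordering used by Keras, indices 151 to 268 all correspond to dog breeds.

import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.preprocessing import image

# ResNet-50 with its original ImageNet classification layer intact
resnet_model = ResNet50(weights='imagenet')

def dog_detector(img_path):
    # Load and prepare the image in the format ResNet-50 expects
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    # Indices 151-268 in the ImageNet classes are all dog breeds
    predicted_class = np.argmax(resnet_model.predict(x))
    return 151 <= predicted_class <= 268

print(dog_detector('images/sample_dog.jpg'))  # path is a placeholder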
We find that the results for dogs are pretty successful, but the results for humans can often be a blow to the ego when you get compared to a bullmastiff.
There are, however, some improvements to be made to the algorithm. It will successfully throw an error in cases where there are no faces at all in the image; however, my cat Siri was mistakenly interpreted as a human. Furthermore, images of puppies don’t always get classified correctly as their features aren’t fully developed yet.
A possible solution for these issues is simply to increase the size of our training dataset, ensuring we include more puppies and more cats.
Conclusions
In summary, creating a dog breed classifier is a fun and relatively easy way to start exploring neural networks and exhibit the power of CNNs. There are a variety of use cases that we can now explore with the help of deep learning and convolutional neural networks.
Experimenting with a basic neural network from scratch allows us to determine the components and parameters that will improve our results; however, this will always involve the challenge of finding a balance between performance and computational capabilities.
Our best results can be obtained by making use of transfer learning and using the hidden layers of pre-built CNNs to optimise our own algorithms.
Further improvements to our overall model can be made by experimenting with more data, different optimizers and different pre-trained models. It is probably best to save this extra effort for projects that will have more real-life implications, such as in healthcare or climate change, but projects like this help us to build the foundations we need to tackle new problems.
Appendix
To run the notebook detailed in this article please check out my github
For more cute pictures of our wonderful model Coco follow on instagram @cavaliercoco_