Creating Dog versus Cat Classifier using Transfer Learning

Harsh Bardhan Mishra
Published in Tesseract Coding · May 24, 2020

Dog versus Cat has been a standard problem in Machine Learning and Computer Vision: build a model that can predict whether a picture contains a Dog or a Cat. With the development of Neural Networks and frameworks like TensorFlow and PyTorch over the last few years, progress on this problem has accelerated considerably. In this article, we will build a Dog and Cat Classifier using Transfer Learning: a model that can predict, with appreciable accuracy, whether a picture is of a cat or a dog. To develop this model, we will use the Oxford-IIIT Pet Dataset and Keras, and along the way we will learn a lot of new concepts, like Data Modelling, Pre-Processing, and how we can improve on a baseline model using Pre-Trained Models via Transfer Learning.

The Cat and the Dog. Via Unsplash.

What is Transfer Learning?

The idea behind Transfer Learning is quite simple: use the knowledge of an existing pre-trained Neural Network on a new dataset it has never seen before. Sounds easy?

The idea behind Transfer Learning took off with AlexNet back in 2012, when it won the ImageNet Large Scale Visual Recognition Challenge. After AlexNet, many other pre-trained models came onto the scene, outperforming AlexNet in terms of accuracy on the ImageNet Dataset. Researchers then realized that these pre-trained models could be used to train and develop new classifiers on datasets the pre-trained model had never encountered before. This technique harnesses and transfers the learning of a previous model to a new dataset.

Transfer Learning makes it surprisingly easy for Researchers and Machine Learning Developers to train models on new datasets by simply modifying the last layer according to their needs. The ImageNet Dataset has 1000 output classes; Fashion MNIST, on the other hand, has only 10, while our Dog-Cat Classifier has only 2. We achieve this by replacing the last layer of the model and freezing the pre-trained layers by setting their variables to non-trainable.

We freeze the layers because, during the whole process of Data Modelling, we don't want to change or update the useful weights of the pre-trained model in any way. If we randomly re-initialized those weights, the model would be expected to underperform, which we don't want. So, that was a brief introduction to Transfer Learning.

To get started with the code, it is best to work on Google Colab or Azure Notebooks, as this will speed up your Development and Data Modelling time and you won't need to worry about installing packages. We will start by downloading our Dataset and preparing it for Data Modelling.

Downloading the Dataset

As mentioned before, we will be using the Oxford-IIIT Pet Dataset to develop our Dog-Cat Classifier. We will use the urllib library, a Python library for opening and reading URLs, to download the Dataset. We will also import the necessary packages first so that we can streamline our development process. Let's first download our Dataset:
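The original code cell isn't embedded in this post, so here is a minimal sketch of it. The download URLs are the official Oxford-IIIT Pet mirrors, and the display_examples helper is an assumption based on the example plots shown later:

```python
# A minimal sketch of the download cell, assuming the official
# Oxford-IIIT Pet mirror URLs.
import os
import tarfile
import urllib.request

import numpy as np
import matplotlib.pyplot as plt

DATA_DIR = 'data'
os.makedirs(DATA_DIR, exist_ok=True)

URLS = [
    'https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz',
    'https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz',
]

for url in URLS:
    archive = os.path.join(DATA_DIR, os.path.basename(url))
    if not os.path.exists(archive):
        urllib.request.urlretrieve(url, archive)  # download the tarball
    with tarfile.open(archive) as tar:
        tar.extractall(DATA_DIR)  # unpacks into data/images and data/annotations

def display_examples(images, labels, class_names):
    """Plot a small grid of images with their class labels.
    (Hypothetical helper; the original's plotting code isn't shown.)"""
    plt.figure(figsize=(12, 6))
    cols = (len(images) + 1) // 2
    for i, (image, label) in enumerate(zip(images, labels)):
        plt.subplot(2, cols, i + 1)
        plt.imshow((image + 1) / 2)  # undo the [-1, 1] MobileNet preprocessing for display
        plt.title(class_names[int(label)])
        plt.axis('off')
    plt.show()
```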

In the above block of code, we have done two things:

  • Declared the necessary packages we need to work with the Data.
  • Downloaded the Dataset, which includes the Images and Annotations.

To download the Images and Annotations, we used the urllib library; the annotations identify the pictures in our Dataset that we will use for our Data Modelling. We will use Matplotlib to show some examples from our Dataset, and to modularize our code, we have written everything in separate functions which we can call later. This is our starting Boilerplate Code, and it doesn't matter much if you don't understand all of it yet. We will be diving into the more complex parts now.

Data Pre-Processing

After you have run the first code cell, we can move on to the next steps. We will now do some Pre-Processing on our Dataset and prepare the Data for our Modelling and Final Analysis. But first, let's ensure that we have TensorFlow installed in our environment and a GPU available for our Data Modelling:
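A minimal version of that check might look like this (tf.config.list_physical_devices is the TF 2.1+ way to list visible GPUs):

```python
import tensorflow as tf

print('TensorFlow version:', tf.__version__)
# An empty list here means no GPU is visible and training will run on CPU
print('GPUs:', tf.config.list_physical_devices('GPU'))
```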

If you are using Colab, you can enable the GPU by going to Runtime and clicking on Change Runtime Type, then setting Hardware Accelerator to GPU.

Now that we have downloaded our Dataset and done the necessary checks, we can move ahead with the problem. Since this is a Binary Classification problem, we will use the annotations to point to the samples in our Dataset. Our Dataset has been downloaded into the data directory, so we will now write some code to properly annotate the Dog and Cat pictures. If you take a look at your Dataset, you will find that it contains annotations and images. Inside the annotations directory you will find a list file, which lists all the files, along with trainval and test files, which hold our Training and Testing splits. These contain the image names that we will use to annotate our Dataset.

We will now write some code to map the Index to Class and vice versa, plus some extra code to extract our annotations and print the counts of Training and Testing examples:
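Here is a sketch of that cell. The parsing assumes the format documented in the dataset's annotation files, where each line reads `<image> <class-id> <species> <breed-id>` and species is 1 for cat, 2 for dog:

```python
class_to_index = {'cat': 0, 'dog': 1}
index_to_class = {0: 'cat', 1: 'dog'}

def get_annotations(file_name):
    """Parse an annotation list into {image_name: binary_label}."""
    annotations = {}
    with open(os.path.join(DATA_DIR, 'annotations', file_name)) as f:
        for line in f:
            if line.startswith('#'):  # skip header comments, if any
                continue
            image_name, _, species, _ = line.strip().split(' ')
            annotations[image_name] = int(species) - 1  # 0 = cat, 1 = dog
    return annotations

train_annotations = get_annotations('trainval.txt')
test_annotations = get_annotations('test.txt')

print('Training examples:', len(train_annotations))
print('Testing examples:', len(test_annotations))
```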

The Class to Index and Index to Class mappings mean that pictures annotated as 0 belong to a Cat while those annotated as 1 belong to a Dog. Now that we have extracted the annotations, let's get the images as well.

Now we will write some code to get a Random Batch of our images, pre-processing the Dataset with Keras along the way. The purpose of the Random Batch function is to take the annotation dictionary and return some randomly selected, preprocessed examples from our Dataset.
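A sketch of that function, walked through point by point in the list below (the name get_random_batch and the use of preallocated arrays are my assumptions):

```python
def get_random_batch(annotations, batch_size=4):
    """Return (x, y): a batch of preprocessed images and binary labels."""
    keys = list(annotations.keys())
    total = len(keys)

    # Randomly pick batch_size distinct examples from the annotation dict
    indices = np.random.choice(total, batch_size, replace=False)

    x = np.zeros((batch_size, 128, 128, 3))  # MobileNet-compatible input shape
    y = np.zeros((batch_size, 1))            # binary class outputs

    for i, index in enumerate(indices):
        key = keys[index]
        image_path = os.path.join(DATA_DIR, 'images', key + '.jpg')
        # Load the image at the 128x128 target size...
        image = tf.keras.preprocessing.image.load_img(image_path, target_size=(128, 128))
        # ...convert it to a NumPy array...
        image = tf.keras.preprocessing.image.img_to_array(image)
        # ...and scale the pixel values to [-1, 1], as MobileNetV2 expects
        image = tf.keras.applications.mobilenet_v2.preprocess_input(image)
        x[i] = image
        y[i] = annotations[key]

    return x, y
```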

What did we do here?

  • Our function takes the Annotation Dictionary and the Batch Size as arguments. The annotation keys are converted into a list and its length is stored.
  • We use NumPy's random choice function to pick some randomly chosen examples from our Dataset.
  • The images are stored in an array x, sized by the batch size, with 128x128x3 as the image shape. We use this shape because we will be using MobileNet as our baseline model, and this is one of the smaller input sizes it supports.
  • We store the class outputs in an array y, which holds either 1 or 0 for each example.
  • We then run a loop to load the actual image examples, not just the labels and annotations, extracting each image from the images directory by its key.
  • We use the pre-processing helper defined in Keras, tf.keras.preprocessing.image.load_img, passing the image path and the 128x128 target size.
  • We use another Keras function, tf.keras.preprocessing.image.img_to_array, to convert the image to a NumPy array, because our model expects data in a multi-dimensional format.
  • We further preprocess the input with another Keras function, tf.keras.applications.mobilenet_v2.preprocess_input. This rescales the pixel values (shifting the mean and standard deviation of the data) so that optimization converges more easily.
  • Finally, we place each preprocessed image into x and its label into y, so that the batch matches the size passed to the function, then return both x and y and end the function block.

Now that we have written the function to return a random batch, let's try it out. We will write a small block of code that calls our previously defined function and displays the examples using the functions we defined earlier:
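For example, using the helpers sketched above:

```python
# Fetch a random batch of 8 examples and plot them with their labels
x, y = get_random_batch(train_annotations, batch_size=8)
display_examples(x, y, index_to_class)
```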

This will show our output:

It displays a random batch from our Dataset, so we can expect to see no repeated pictures here. This will come in handy once we have completed the Data Modelling and need to inspect some results. We have now learnt how to pre-process our Dataset using Keras and how to generate random batches. Next, we will move on to Data Modelling and building our model from the MobileNet v2 Pre-Trained Model.

Data Modelling

For the purposes of this article, we will use MobileNet v2 as our Baseline Model for Transfer Learning. MobileNet is a popular Pre-Trained Model that was trained on the ImageNet Dataset. Training a model like MobileNet, which has millions of parameters, is computationally expensive and takes a lot of time; to save that time and increase accuracy manifold, we will rely on Transfer Learning.

Let's download MobileNet v2 from Keras; we have to specify a few parameters as well. Let's write this simple block of code:
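A sketch of that cell, using the Keras applications constructor:

```python
mobilenet = tf.keras.applications.MobileNetV2(
    include_top=False,          # drop the 1000-class ImageNet head
    input_shape=(128, 128, 3),  # match our preprocessed image size
    pooling='avg',              # global average pooling -> 2D feature output
    weights='imagenet',         # the pretrained weights we transfer from
)
mobilenet.summary()
```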

We set include_top to False to drop MobileNet's original classification head, since this is a Binary Classification problem and MobileNet's own head classifies up to 1000 classes. We specify the input shape to match our 128x128x3 images. We also set pooling to average so that the Four-Dimensional feature tensor is converted into a Two-Dimensional one. Finally, we specify the pretrained ImageNet weights, which are the main part here; the actual freezing of these layers happens later, when we mark them non-trainable.

We will see the whole MobileNet architecture summarized here; don't worry if you don't understand all of it. We will be leveraging this architecture to develop our own model. It has more than two million parameters, and now we will create our own model on top of it. So let's get started with some code:
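A sketch of the model-building cell, matching the steps listed below (the create_model wrapper is my own choice):

```python
def create_model():
    model = tf.keras.models.Sequential([
        mobilenet,                                       # pretrained feature extractor
        tf.keras.layers.Dropout(0.5),                    # regularize the new head
        tf.keras.layers.Dense(1, activation='sigmoid'),  # binary cat/dog output
    ])
    # Freeze the pretrained MobileNet weights so only the head trains
    model.layers[0].trainable = False
    model.compile(
        loss='binary_crossentropy',
        optimizer='adam',
        metrics=['accuracy'],
    )
    return model

model = create_model()
model.summary()  # shows 1,281 trainable parameters (1280 weights + 1 bias)
```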

Here we have initialized our model, so let’s discuss in points on what we did here:

  • We use a Sequential Model from Keras. This defines the model as a stack of layers.
  • We set the MobileNet architecture that we downloaded as our first layer, followed by a Dropout layer with a dropout rate of 0.5.
  • Our final layer is a Dense layer with Sigmoid as its Activation Function.
  • We set the MobileNet layer's trainable attribute to False so that its pre-trained weights are not updated, and then call model.compile() with Binary Crossentropy as our Loss Function and the Adam Optimizer.
  • We then create the Model and print the Model Summary.

Once we have done these steps, we can see the Model Summary on the screen. We have just 1,281 Trainable Parameters, and the rest, inherited from MobileNet, are Non-Trainable. We will now write a little more code before we kickstart our Model and start the Training Process.

We will first write a Data Generator function which takes the Batch Size and Annotations as parameters and yields the data; we can use this with the Fit Generator function. It is a loop that fetches random batches from our Dataset indefinitely:
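A minimal sketch of such a generator:

```python
def data_generator(batch_size, annotations):
    """Endlessly yield random batches for fit_generator."""
    while True:
        x, y = get_random_batch(annotations, batch_size)
        yield x, y
```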

We will now define the batch size along with other necessary parameters like steps per epoch and validation steps. So, let’s write some code for that:
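For instance, assuming a batch size of 32 (the exact value used in the original cell isn't shown):

```python
batch_size = 32  # assumed value; tune to your GPU memory
steps_per_epoch = len(train_annotations) // batch_size
validation_steps = len(test_annotations) // batch_size
```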

Now we will fit our Model using the fit_generator() function together with our data_generator() function, passing the batch size and the Training Annotations. We will also pass the Validation Data as a generator over the Testing Annotations, define the Validation Steps and Steps per Epoch, and run a single epoch. One epoch is enough here because we already have pre-trained weights; our model only needs to go through all the examples once.
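A sketch of that training cell (fit_generator was the standard Keras API at the time of writing; newer TensorFlow versions fold this into model.fit):

```python
%%time

# Train the classification head for a single epoch
_ = model.fit_generator(
    data_generator(batch_size, train_annotations),
    validation_data=data_generator(batch_size, test_annotations),
    steps_per_epoch=steps_per_epoch,
    validation_steps=validation_steps,
    epochs=1,
)
```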

We are using the %%time magic function to time the code cell. It might take a few seconds or minutes to train, and we can see that we achieved a Validation Accuracy of 94.35%.

Predictions

We have now completed the Training, and the Validation Accuracy should be above 90%. The main advantage of using Transfer Learning and a Pre-Trained Model is that we don't need to train our model for long: we can take the Pre-Trained Weights and configure the input and output layers as per our requirements and the Dataset at hand. So, let's visualize the results and see what we have got. We will write a small block of code that takes some random examples and uses our trained model to predict whether each picture belongs to a Cat or a Dog.
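A sketch of that prediction cell, reusing the helpers from earlier (the 0.5 cutoff on the sigmoid output is the standard thresholding choice):

```python
# Grab a random batch from the test split and predict on it
x, y = get_random_batch(test_annotations, batch_size=8)
preds = model.predict(x)                            # sigmoid outputs in [0, 1]
pred_labels = (preds.flatten() > 0.5).astype(int)   # 0 = cat, 1 = dog
display_examples(x, pred_labels, index_to_class)
```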

We will get a grid of example images, each labelled with the model's prediction.

Conclusion

In this article, we have covered a lot of ground and finally built our own Binary Classifier using Transfer Learning to classify pictures of Dogs and Cats. We have also learnt how to utilize these concepts to work with Pre-Trained Models and how to modify the Input Size as per your requirements. To summarize, we have explored:

  • What is Transfer Learning?
  • How can Transfer Learning be used on new Datasets?
  • Pre-Processing the Dataset.
  • Data Modelling with Transfer Learning and Pre-Trained Weights.
  • Visualizing the Results.

What's next? You can take the key lessons from this Article and apply Transfer Learning to some new Datasets. You can also start exploring other Pre-Trained Models like VGG16 (which I used to create a COVID-19 Classifier here), Mask R-CNN and YOLOv2. Best of luck with your future endeavours!

You can view the code here!

