DOG BREED CLASSIFICATION USING TRANSFER LEARNING: A BEGINNER'S GUIDE

This post follows this repo.

DATASET

The Stanford Dogs Dataset has around 20k images belonging to 120 classes, and each image has an annotation associated with it. A first concern: the number of images available per breed for training is roughly 180, which is very little compared to the amount of data normally required to train a Convolutional Neural Network (CNN) classifier.

APPROACH

First I trained a CNN from scratch, but the accuracy was not acceptable given the little data per class. The notebook illustrating the experiment can be found here.

Since the amount of data we have is a constraint, we will use transfer learning, a technique that allows you to use pretrained models on your own dataset. I have used the VGG16 and VGG16BN (VGG16 with Batch Normalisation) models. VGG16 is a deep CNN trained on the ImageNet dataset, which has around 1000 synsets.

As described very well in the paper Visualizing and Understanding Convolutional Networks, the bottom layers of a convolutional neural net activate only on primitive features (colour, texture, shape…), so these features can be transferred to other applications as well. Here we replace the top layers (the fully connected layers and the softmax layer) and freeze the remaining layers so that they are non-trainable. We will also make use of synthetic image generation to account for the variability of the images.

Setting up the dataset

Download the dataset, extract it, and crop the images with the help of the annotations provided. Now we split the dataset into training, validation and testing chunks. This should be done carefully, ensuring there is no class imbalance across the chunks. The dataset can be converted into the TFRecords format, as this allows faster input and output operations; I won't be explaining that here, see this page for further references. You can use this notebook for setting up the dataset, which includes everything from downloading to making the train, validation and test chunks.
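As a sketch, a class-balanced split could be done per breed like this (the `samples` list of (path, label) pairs, the function name and the split fractions are illustrative assumptions, not the notebook's exact code):

```python
# Stratified train/validation/test split: shuffle and slice each breed's
# images separately so every class keeps the same proportions in all chunks.
import random
from collections import defaultdict

def stratified_split(samples, train_frac=0.7, val_frac=0.15, seed=42):
    """samples: list of (image_path, breed_label) pairs."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, paths in by_class.items():
        rng.shuffle(paths)
        n_train = int(len(paths) * train_frac)
        n_val = int(len(paths) * val_frac)
        train += [(p, label) for p in paths[:n_train]]
        val += [(p, label) for p in paths[n_train:n_train + n_val]]
        test += [(p, label) for p in paths[n_train + n_val:]]
    return train, val, test
```

Splitting per class, rather than shuffling the whole list once, is what prevents a rare breed from ending up only in the test chunk.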

TRAINING

First we train a CNN over the training data with the default parameters and the Adam optimizer.
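A minimal sketch of such a from-scratch baseline in Keras (the layer widths and the 224×224 input size are assumptions for illustration, not the exact architecture from the notebook):

```python
# A small from-scratch CNN baseline: three conv/pool stages followed by a
# dense head with a 120-way softmax, compiled with the Adam optimizer.
from tensorflow.keras import layers, models

def build_baseline_cnn(num_classes=120, input_shape=(224, 224, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```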

After 25 epochs:

Training Accuracy: 94.07%
Test Accuracy: 51.07%

Next we take the Keras pretrained VGG16 model and replace the top with fully connected layers and a softmax layer of 120 units, since we have 120 classes. Make sure you preprocess the input images the very same way it is done in the VGG16 paper. Since the bottom layers are frozen, to avoid unnecessary computation we can pass the input images through once and save the bottleneck features (the output of the last convolutional layer). These bottleneck features are then fed into the top model, and only the top model is trained. The best hyperparameters after tuning were:
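The bottleneck-feature setup can be sketched with Keras' VGG16 like this (the top model's layer widths and the 224×224 input size are illustrative assumptions):

```python
# Frozen VGG16 convolutional base + small trainable top model.
# include_top=False drops VGG16's own FC and softmax layers.
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False  # bottom layers are frozen

def extract_bottleneck_features(images):
    """images: float array (n, 224, 224, 3), RGB in [0, 255].
    Returns the last conv layer's output, shape (n, 7, 7, 512)."""
    return base.predict(preprocess_input(images), verbose=0)

# Top model trained on the cached bottleneck features only.
top = models.Sequential([
    layers.Input(shape=base.output_shape[1:]),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(120, activation="softmax"),
])
top.compile(optimizer="adam", loss="categorical_crossentropy",
            metrics=["accuracy"])
```

Note that `preprocess_input` applies the same channel-mean subtraction used when VGG16 was originally trained, which is the preprocessing requirement mentioned above.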

Learning rate: 1e-4
Decay: 0.06

After 50 epochs:

Training Accuracy: 97.8%
Test Accuracy: 40.23%

This clearly shows large variance, i.e. overfitting, which can be mitigated by the use of Batch Normalization, an L2 penalty or Dropout. Next we make use of Dropout and Batch Normalization via the VGG16BN model. These regularisation methods gave us a 4% increase in test accuracy, but our model still overfits the training data by a huge amount. Now it's time to deploy our image data generator; you can read about this in depth here.
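A minimal sketch of such synthetic image generation with Keras' `ImageDataGenerator` (the augmentation ranges below are illustrative choices, not the exact settings used in the notebooks):

```python
# Random rotations, shifts, zooms and flips applied on the fly, so the
# network rarely sees the exact same pixels twice.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rotation_range=20,       # random rotations up to 20 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    zoom_range=0.1,          # random zoom in/out
    horizontal_flip=True,    # mirror images left-right
)

# Typical usage: stream augmented batches from class-labelled folders.
# train_gen = train_datagen.flow_from_directory(
#     "data/train", target_size=(224, 224),
#     batch_size=32, class_mode="categorical")
```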

Learning rate: 0.0001

After 20 epochs :

Training Accuracy: 88.23%
Test Accuracy: 76.53%

PREDICTION USING YOLO

We trained our model on images cropped to the relevant region using the annotation files in the Stanford dataset. Now, when making a prediction on a random image, we can make use of object detection algorithms like YOLO to locate the bounding box of the dog in the picture and then feed the cropped image to the model.

Caution: the effect of the object detection algorithm on the prediction accuracy of the model depends on the accuracy of YOLO itself.
To test the accuracy of YOLO, we can compare the annotations in the dataset images with the bounding boxes obtained by the YOLO algorithm.
Accuracy metric: INTERSECTION OVER UNION OF THE TWO BOXES
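A minimal implementation of that metric, assuming boxes as (x_min, y_min, x_max, y_max) tuples:

```python
# Intersection over Union (IoU) between an annotation box and a
# YOLO-predicted box: overlap area divided by the area of the union.
def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    # Clamp at zero: non-overlapping boxes have no intersection.
    inter = max(0, x_right - x_left) * max(0, y_bottom - y_top)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

IoU is 1.0 for identical boxes, 0.0 for disjoint ones, and a common convention is to count a detection as correct when IoU exceeds 0.5.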

A notebook depicting the implementation of YOLO with non-max suppression can be found in the GitHub repo.

ERROR ANALYSIS

Aah!! Let’s see where our model is going wrong. We can do this by plotting a confusion matrix like this:

CONFUSION PLOT

This doesn’t look very clear, so it’s difficult to visualise anything. Let’s see the top 30 misclassified pairs of breeds.

Top 30 misclassified pairs of breeds.
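A sketch of how such a list of most-confused pairs could be pulled out of the confusion matrix (the function name and the label arrays are illustrative, not the repo's exact code):

```python
# Build a confusion matrix, zero its diagonal (correct predictions), and
# rank the remaining (true, predicted) cells by how often they occur.
import numpy as np
from sklearn.metrics import confusion_matrix

def top_confused_pairs(y_true, y_pred, k=30):
    """y_true, y_pred: integer class labels.
    Returns up to k (count, true_class, predicted_class) triples."""
    cm = confusion_matrix(y_true, y_pred)
    np.fill_diagonal(cm, 0)  # keep only the misclassifications
    pairs = [(int(cm[i, j]), i, j)
             for i in range(cm.shape[0])
             for j in range(cm.shape[1])
             if cm[i, j] > 0]
    pairs.sort(reverse=True)
    return pairs[:k]
```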

As can be seen, the pair ‘Silky Terrier / Yorkshire Terrier’ is the absolute leader in terms of misclassification, which does make sense if we look at how these two breeds actually look:

SILKY TERRIER (image taken from Google)
Yorkshire Terrier (image taken from Google)

This looks like a case where the Bayes optimal error rate, approximated here by human-level performance, is itself high: even humans confuse these two breeds. For more details see this article.

CONCLUSION

We have seen how to train a decent model even with a modicum of data, with the help of transfer learning.

Future Work:
Try out different pretrained models like Inception V3 or ResNet.