Data Science: Algorithmic Classification of Dog Breeds
Project Motivation:
The main motivation for this project was not just that it is part of the nanodegree, but rather the curiosity to learn and implement convolutional neural networks, which made this the choice out of the three projects.
Project Overview:
In this project, I developed a pipeline to process real-world, user-supplied images. Given an image of a dog, the algorithm identifies an estimate of the canine’s breed. If supplied an image of a human, the code identifies the resembling dog breed. Along with exploring state-of-the-art CNN models for classification, I made important design decisions about the user experience for our app. By completing this lab, I understood the difficulties involved in piecing together a series of models designed to perform various tasks in a data processing pipeline. Each model has its pros and cons, and engineering a real-world application often involves solving many problems with approximate but not perfect solutions.
Problem Statement:
Write an algorithm for a dog identification app, implemented with convolutional neural networks.
Metrics:
Test Accuracy: the accuracy calculated on the test set of images, found to be approximately 82.66%. Usually one must make sure the test accuracy is above 60%.
test_accuracy = 100*np.sum(np.array(Xception_predictions)==np.argmax(test_targets, axis=1))/len(Xception_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)
Data Exploration:
After importing the dog and human data sets, the following were the results:
There are 133 total dog categories.
There are 8351 total dog images.
There are 6680 training dog images.
There are 835 validation dog images.
There are 836 test dog images.
There are 13233 total human images.
Since we used ready-made image data sets, not much preprocessing was needed; instead, we needed to train our model to identify the dog breed with good accuracy.
Implementation Steps:
1. Detect Humans:
We use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, stored as XML files on github. We have downloaded one of these detectors and stored it in the haarcascades directory.
Before using any of the face detectors, it is standard procedure to convert the images to grayscale. The detectMultiScale function executes the classifier stored in face_cascade and takes the grayscale image as a parameter.
faces is a numpy array of detected faces, where each row corresponds to a detected face. Each detected face is a 1D array with four entries that specifies the bounding box of the detected face. The first two entries in the array (extracted in the code as x and y) specify the horizontal and vertical positions of the top left corner of the bounding box. The last two entries (extracted as w and h) specify the width and height of the box.
We use this procedure to write a face detector function.
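A minimal sketch of such a face detector, assuming the downloaded cascade file is named haarcascade_frontalface_alt.xml:

import cv2

# sketch of a face detector; the cascade file name is an assumption
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

def face_detector(img_path):
    # returns True if at least one human face is detected in the image
    img = cv2.imread(img_path)                     # load image (BGR)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # convert to grayscale
    faces = face_cascade.detectMultiScale(gray)    # each row is (x, y, w, h)
    return len(faces) > 0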
- What percentage of the first 100 images in human_files have a detected human face?
- What percentage of the first 100 images in dog_files have a detected human face?
Ideally, we would like 100% of human images with a detected face and 0% of dog images with a detected face. You will see that our algorithm falls short of this goal, but still gives acceptable performance. We extract the file paths for the first 100 images from each of the datasets and store them in the numpy arrays human_files_short and dog_files_short.
100.0% of the first 100 human_files images were detected as having a human face.
11.0% of the first 100 dog_files images were detected as having a human face.
2. Detect Dogs:
We use a pre-trained ResNet-50 model to detect dogs in images. Our first line of code downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.
Pre-process the Data
When using TensorFlow as backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with shape (nb_samples, rows, columns, channels), where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.
The path_to_tensor function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224×224 pixels. Next, the image is converted to an array, which is then resized to a 4D tensor. In this case, since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape (1, 224, 224, 3).
The paths_to_tensor function takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape (nb_samples, 224, 224, 3). Here, nb_samples is the number of samples, or number of images, in the supplied array of image paths. It is best to think of nb_samples as the number of 3D tensors (where each 3D tensor corresponds to a different image) in your dataset!
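A sketch of these two helper functions, following the standard Keras image-loading approach (the use of tqdm for the progress bar is an assumption):

import numpy as np
from keras.preprocessing import image
from tqdm import tqdm

def path_to_tensor(img_path):
    # load the RGB image, resized to 224x224 pixels
    img = image.load_img(img_path, target_size=(224, 224))
    # convert to a 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # add a sample dimension to obtain a 4D tensor with shape (1, 224, 224, 3)
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    # stack the per-image tensors into a 4D tensor with shape (nb_samples, 224, 224, 3)
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)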
3. Making Predictions with ResNet-50
Getting the 4D tensor ready for ResNet-50, and for any other pre-trained model in Keras, requires some additional processing. First, the RGB image is converted to BGR by reordering the channels. All pre-trained models have the additional normalization step that the mean pixel (expressed in RGB as [103.939, 116.779, 123.68] and calculated from all pixels in all images in ImageNet) must be subtracted from every pixel in each image. This is implemented in the imported function preprocess_input.
Now that we have a way to format our image for supplying to ResNet-50, we are ready to use the model to extract the predictions. This is accomplished with the predict method, which returns an array whose i-th entry is the model's predicted probability that the image belongs to the i-th ImageNet category. This is implemented in the ResNet50_predict_labels function.
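A sketch of this prediction helper, assuming the path_to_tensor function defined above and the pre-trained ImageNet weights:

import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input

# download the ResNet-50 model with weights trained on ImageNet
ResNet50_model = ResNet50(weights='imagenet')

def ResNet50_predict_labels(img_path):
    # preprocess_input handles the RGB-to-BGR reordering and mean-pixel subtraction
    img = preprocess_input(path_to_tensor(img_path))
    # return the index of the ImageNet category with the highest predicted probability
    return np.argmax(ResNet50_model.predict(img))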
Write a Dog Detector
While looking at the dictionary, you will notice that the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151–268, inclusive, covering all categories from 'Chihuahua' to 'Mexican hairless'. Thus, in order to check whether an image is predicted to contain a dog by the pre-trained ResNet-50 model, we need only check if the ResNet50_predict_labels function above returns a value between 151 and 268 (inclusive).
We use these ideas to complete the dog_detector function below, which returns True if a dog is detected in an image (and False if not).
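A minimal sketch of the dog detector, reusing the ResNet50_predict_labels helper above:

def dog_detector(img_path):
    # ImageNet keys 151-268 (inclusive) correspond to dog breeds
    prediction = ResNet50_predict_labels(img_path)
    return (prediction >= 151) and (prediction <= 268)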
Using the dog detector, we found the answers to the following questions:
- What percentage of the images in human_files_short have a detected dog?
- What percentage of the images in dog_files_short have a detected dog?
Answers
0.0% of the first 100 human_files images were detected as containing a dog.
100.0% of the first 100 dog_files images were detected as containing a dog.
Creation of a Convolutional Neural Network to predict the dog breed, now that we have written functions to detect dogs and humans:
- 3 convolutional layers are created, with 3 max pooling layers in between, to learn a hierarchy of high-level features. The max pooling layers are added to reduce dimensionality.
- A flatten layer is added to reduce the feature maps to a row vector, because the fully connected layer only accepts a row vector.
- 16, 32, and 64 filters were used in the three convolutional layers, respectively.
- Dropout was used after the flatten layer, before the fully connected layer, to reduce overfitting and ensure that the network generalizes well.
- The number of nodes in the last fully connected layer was set to 133, with a softmax activation function to obtain prediction probabilities.
- The ReLU activation function was used for all other layers (see the sketch below).
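A minimal Keras sketch of this architecture; the kernel sizes and the dropout rate are assumptions, while the filter counts and the 133-node softmax layer come from the description above:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential()
# three convolutional layers (16, 32, 64 filters) with max pooling in between
model.add(Conv2D(16, (3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=2))
# flatten to a row vector and apply dropout before the fully connected layer
model.add(Flatten())
model.add(Dropout(0.3))
# 133 output nodes, one per dog breed, with softmax to obtain probabilities
model.add(Dense(133, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])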
4. Use a CNN to Classify Dog Breeds
To reduce training time without sacrificing accuracy, we train a CNN using transfer learning. The first step is to obtain the bottleneck features.
Model Architecture
The model uses the pre-trained VGG-16 model as a fixed feature extractor, where the last convolutional output of VGG-16 is fed as input to our model. We only add a global average pooling layer and a fully connected layer, where the latter contains one node for each dog category and is equipped with a softmax.
Then, after compiling and training the model, we load the weights with the best validation loss, test the model, and make breed predictions on the test set. The test accuracy found at this step was approximately 43.8 percent.
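A sketch of this VGG-16 transfer-learning head, assuming the bottleneck features are loaded from a pre-computed file (the file name and array names are illustrative):

import numpy as np
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

# pre-computed VGG-16 bottleneck features; the file name is an assumption
bottleneck_features = np.load('bottleneck_features/DogVGG16Data.npz')
train_VGG16 = bottleneck_features['train']
valid_VGG16 = bottleneck_features['valid']
test_VGG16 = bottleneck_features['test']

VGG16_model = Sequential()
# global average pooling over the last convolutional output of VGG-16
VGG16_model.add(GlobalAveragePooling2D(input_shape=train_VGG16.shape[1:]))
# one node per dog category, equipped with a softmax
VGG16_model.add(Dense(133, activation='softmax'))

VGG16_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])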
5. Create a CNN to Classify Dog Breeds (using Transfer Learning):
In the previous step, we used transfer learning to create a CNN using VGG-16 bottleneck features. In this step, we must use the bottleneck features from a different pre-trained model. To make things easier, the bottleneck features for all of the networks that are currently available in Keras have been pre-computed and provided.
We again obtain the bottleneck features for the train, test, and validation sets, and then start with the creation of the CNN.
The new data set is small and similar to the original training data, hence the end of the network is sliced off and a fully connected layer that matches the number of classes in the new data set is added. Next, the weights of the new fully connected layer are randomized, and all the weights from the pre-trained network are frozen. Finally, the network is trained to update the weights of the new fully connected layer. Overall, I used the Xception model, which gave the best result (85.17% accuracy, versus 84.2% with ResNet-50). I also added a fully connected layer with 500 nodes and a ReLU activation function to detect more patterns, and a Dropout layer to avoid overfitting.
We then compile and train the model, load the weights with the best validation loss, test the model, and finally predict the dog breed with our model.
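A sketch of the Xception-based model described above; the dropout rate, epochs, batch size, and file names are assumptions, and train_targets/valid_targets are the one-hot label arrays from the data-loading step:

import numpy as np
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense, Dropout
from keras.callbacks import ModelCheckpoint

# pre-computed Xception bottleneck features; the file name is an assumption
bottleneck_features = np.load('bottleneck_features/DogXceptionData.npz')
train_Xception = bottleneck_features['train']
valid_Xception = bottleneck_features['valid']

Xception_model = Sequential()
Xception_model.add(GlobalAveragePooling2D(input_shape=train_Xception.shape[1:]))
Xception_model.add(Dense(500, activation='relu'))   # extra layer to detect more patterns
Xception_model.add(Dropout(0.4))                    # dropout to avoid overfitting
Xception_model.add(Dense(133, activation='softmax'))

Xception_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# keep only the weights that achieve the best validation loss
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.Xception.hdf5',
                               save_best_only=True, verbose=1)
Xception_model.fit(train_Xception, train_targets,
                   validation_data=(valid_Xception, valid_targets),
                   epochs=20, batch_size=20, callbacks=[checkpointer], verbose=1)
Xception_model.load_weights('saved_models/weights.best.Xception.hdf5')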
6. Write your Algorithm
We wrote an algorithm that accepts a file path to an image and first determines whether the image contains a human, a dog, or neither (a sketch follows the list below). Then,
- if a dog is detected in the image, return the predicted breed.
- if a human is detected in the image, return the resembling dog breed.
- if neither is detected in the image, provide output that indicates an error.
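A sketch of how these pieces fit together, assuming a hypothetical Xception_predict_breed helper that maps an image path to a breed name via the transfer-learned model:

def predict_image(img_path):
    # dog detected: report the predicted breed
    if dog_detector(img_path):
        breed = Xception_predict_breed(img_path)   # hypothetical breed-prediction helper
        print("Hey, It's a dog!")
        print("And I guess the Breed of this dog is a {}".format(breed))
    # human detected: report the resembling breed
    elif face_detector(img_path):
        breed = Xception_predict_breed(img_path)
        print("Hey, It's a human!")
        print("Mmmmmm....If you were a dog, I guess you would be a ... {}!!".format(breed))
    # neither detected: indicate an error
    else:
        print("Hum... seems this neither dog, nor human; must be something else.")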
7. Test Your Algorithm:
After performing the tests, the output is better than what I anticipated. One anomaly was observed: the prediction for Input 75 (Test Image: 7) should have been a dog. The rest of the outputs work perfectly! In fact, the algorithm was able to accurately identify the human wearing a dog filter from the Snapchat app in test image 17.
- Accuracy can be improved if we increase the complexity of the neural network (by increasing the number of layers).
- Other models should be explored to see if the time required for prediction can be further minimized, while still preserving the original accuracy.
Results
The following were the results observed after testing our algorithm:
Hey, It's a dog!
And I guess the Breed of this dog is a in/006.American_eskimo_dog
Hey, It's a dog!
And I guess the Breed of this dog is a in/115.Papillon
Hum... seems this neither dog, nor human; must be something else .
Hey, It's a human!
Mmmmmm....If you were a dog, I guess you would be a ... in/110.Norwegian_lundehund!!
Conclusion:
Likewise, many more results were obtained accurately, and we were able to achieve an accuracy of approximately 82.66% and successfully verify that our algorithm functions. Thus we have learnt to use CNNs, and to utilize pre-trained networks such as the VGG-16 bottleneck features, to build this dog breed predictor. It was a great learning experience, and I wish to continue learning and exploring such projects.
Future Improvements/Scope:
Similarly, we can develop a classifier for other animals, such as cats or sea creatures. One can also explore CNNs further to develop similar projects and think of applications on a more global scale.