What dog breed am I?

Dog Breed Classification of human or dog images using Deep Learning

Hema Reddy
Analytics Vidhya
6 min read · Mar 24, 2021


Problem Statement

The idea of this project is twofold.

  • One is to identify whether an image passed to the algorithm contains any dogs and, if so, classify the breed.
  • Second, if there are no dogs but there is a human in the image, the algorithm has to classify the human’s face as the closest-resembling breed among the target labels.
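
In code, the end-to-end behavior we are after looks roughly like this (a sketch only: face_detector and dog_detector are built later in this post, and Resnet50_predict_breed is a hypothetical name for the breed classifier we train):

def classify_image(img_path):
    # two-fold logic: dogs get their breed, humans get the closest-resembling breed
    if dog_detector(img_path):
        return "This dog looks like a " + Resnet50_predict_breed(img_path)
    elif face_detector(img_path):
        return "This human resembles a " + Resnet50_predict_breed(img_path)
    else:
        return "Neither a dog nor a human was detected."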

Strategy Implemented

The model is built with convolutional neural networks (CNNs), a class of neural networks widely used to recognize and classify images. We implement two architectures: a naive convolutional network with three Conv2D layers, and a ResNet50 with its final layers modified to suit this problem (transfer learning).

Performance of the model, with the image on the right as input and one example of the labels the model assigns. Right image source: unsplash.com

Data

This project is for my Udacity Data Scientist Nanodegree, and the data is provided to us by Udacity. The data contains:

  • 133 total dog categories.
  • 8351 total dog images: 6680 for training, 835 for validation, and 836 for testing.

We also have human images for face detection:

  • 13233 total human images.

Modelling

Face Detection

For face detection we will use Haar cascades from OpenCV.

face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

We wrap face detection in a function so we can reuse it in different parts of our code. The function takes an image path, checks whether the image contains one or more faces, and returns True or False.

import cv2

def face_detector(img_path):
    # read the image and convert it to grayscale, as the Haar cascade expects
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # True if at least one face is detected
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0
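
For example (the image path here is just an illustrative placeholder):

print(face_detector('images/sample_human.jpg'))  # True if at least one face is found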

Dog Detection

Now that face detection is done, we want to check whether the image contains any dogs. For detecting dogs, we will use a pretrained ResNet50.

import numpy as np
from keras.applications.resnet50 import ResNet50, preprocess_input

ResNet50_model = ResNet50(weights='imagenet')  # pretrained on ImageNet

def ResNet50_predict_labels(img_path):
    # returns the index of the most likely ImageNet class for the image at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))

def dog_detector(img_path):
    # ImageNet class indices 151-268 (inclusive) are all dog categories
    prediction = ResNet50_predict_labels(img_path)
    return (prediction <= 268) and (prediction >= 151)

The dog_detector function takes an image path and calls ResNet50_predict_labels, which uses a ResNet50 model to predict the image’s label. If the image contains a dog, the returned label index falls between 151 and 268 (both inclusive), the range that covers all dog categories among the 1000 ImageNet classes that ResNet50 predicts over.
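
The path_to_tensor helper used above (and its batch version paths_to_tensor, used below) is not shown in the snippets; here is a minimal sketch of how it is typically written in this project, using Keras image utilities:

from keras.preprocessing import image
import numpy as np

def path_to_tensor(img_path):
    # load the RGB image and resize it to 224x224, the input size ResNet50 expects
    img = image.load_img(img_path, target_size=(224, 224))
    # convert to a (224, 224, 3) array and add a batch dimension -> (1, 224, 224, 3)
    return np.expand_dims(image.img_to_array(img), axis=0)

def paths_to_tensor(img_paths):
    # stack the single-image tensors into one (N, 224, 224, 3) array
    return np.vstack([path_to_tensor(p) for p in img_paths])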

We can now detect both humans and dogs in images. The next part is training a deep learning model to classify dog breeds. But before we do that, we have to make sure pixel values are on a consistent scale across all images, so we rescale every image by dividing its pixel values by 255 (mapping each pixel into [0, 1]).

train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255

Model Architecture

We create two models: one with a basic naive architecture built from scratch, and one based on a pretrained ResNet50 with a few layers added to suit our problem.

Naive Architecture

Let’s first take a look at the naive architecture.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(16, (2, 2), activation='relu', input_shape=(224, 224, 3)),
    Dropout(0.3),
    MaxPooling2D((2, 2)),
    Conv2D(32, (2, 2), activation='relu'),
    Dropout(0.3),
    MaxPooling2D((2, 2)),
    Conv2D(64, (2, 2), activation='relu'),
    Dropout(0.3),
    MaxPooling2D((2, 2)),
    Flatten(),
    # note: softmax is the conventional activation for a multi-class output
    # layer; the relu used here likely contributes to the weak results below
    Dense(133, activation='relu')
])

We stack two-dimensional convolutional layers with dropout and max-pooling layers. At the end, a flatten layer feeds a dense layer that produces the output label scores for the images.

Resnet50 Architecture

We will show how the naive architecture performed in the results section below. But before that, let’s look at the other model we train, which performs much better because of its complex structure and the number of layers it has. (More layers do not always mean better performance, but in this case they do.) Let’s take a look at what we can add to the ResNet50 model to customize it for our problem.

We first load the bottleneck features that have already been computed for this particular dataset; the .npz file shown below is available as part of the Udacity project materials.

bottleneck_features = np.load('bottleneck_features/DogResnet50Data.npz')
train_Resnet50 = bottleneck_features['train']
valid_Resnet50 = bottleneck_features['valid']
test_Resnet50 = bottleneck_features['test']
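
As a quick sanity check, you can inspect the shapes of these arrays; the shape in the comment below is what the Udacity-provided file typically contains, so verify against your copy:

# one ResNet50 bottleneck feature map per training image
print(train_Resnet50.shape)  # typically (6680, 1, 1, 2048)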

To define the layers we want to add on top of the ResNet50 features, we create a small sequential model.

Resnet50_model = Sequential()
# collapse the spatial dimensions of the bottleneck features
Resnet50_model.add(GlobalAveragePooling2D(input_shape=train_Resnet50.shape[1:]))
# one output unit per breed, with softmax for class probabilities
Resnet50_model.add(Dense(133, activation='softmax'))

We then compile and fit both architectures. Let’s take a look at their individual performances.
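
The compile and fit calls are not shown in the snippets above; here is a sketch for the ResNet50 branch, assuming train_targets and valid_targets hold the one-hot breed labels (the checkpoint path matches the training log below, while the optimizer and batch size are assumptions):

from keras.callbacks import ModelCheckpoint

Resnet50_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop', metrics=['accuracy'])

# save the weights whenever validation loss improves
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.Resnet50.hdf5',
                               verbose=1, save_best_only=True)

Resnet50_model.fit(train_Resnet50, train_targets,
                   validation_data=(valid_Resnet50, valid_targets),
                   epochs=20, batch_size=20,
                   callbacks=[checkpointer], verbose=1)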

Results

Naive Architecture

We train the naive architecture for 5 epochs. Below are the training details of the model.

Train on 6680 samples, validate on 835 samples
Epoch 1/5
6660/6680 [============================>.] - ETA: 0s - loss: 15.4370 - acc: 0.0104Epoch 00001: val_loss improved from inf to 15.75039, saving model to saved_models/weights.best.from_scratch.hdf5
6680/6680 [==============================] - 30s 5ms/step - loss: 15.4390 - acc: 0.0103 - val_loss: 15.7504 - val_acc: 0.0132
Epoch 2/5
6660/6680 [============================>.] - ETA: 0s - loss: 15.5520 - acc: 0.0120Epoch 00002: val_loss did not improve
6680/6680 [==============================] - 29s 4ms/step - loss: 15.5514 - acc: 0.0121 - val_loss: 15.7563 - val_acc: 0.0084
Epoch 3/5
6660/6680 [============================>.] - ETA: 0s - loss: 15.2091 - acc: 0.0143Epoch 00003: val_loss improved from 15.75039 to 15.02195, saving model to saved_models/weights.best.from_scratch.hdf5
6680/6680 [==============================] - 29s 4ms/step - loss: 15.2070 - acc: 0.0142 - val_loss: 15.0220 - val_acc: 0.0144
Epoch 4/5
6660/6680 [============================>.] - ETA: 0s - loss: 15.5074 - acc: 0.0078Epoch 00004: val_loss did not improve
6680/6680 [==============================] - 31s 5ms/step - loss: 15.5093 - acc: 0.0078 - val_loss: 16.0409 - val_acc: 0.0048
Epoch 5/5
6660/6680 [============================>.] - ETA: 0s - loss: 16.0358 - acc: 0.0051Epoch 00005: val_loss did not improve
6680/6680 [==============================] - 32s 5ms/step - loss: 16.0361 - acc: 0.0051 - val_loss: 16.0409 - val_acc: 0.0048

When tested on previously unseen data, the naive architecture reaches 1.4% test accuracy, which is, to be honest, unusable (though still better than the roughly 0.75% expected from random guessing across 133 breeds). Let’s take a look at how the ResNet50 model performs.

Resnet50 Architecture

We trained this model for 20 epochs, and the training output is below. The performance of this model on test data is considerably higher than the previous model’s: we were able to see 81.3% accuracy on previously unseen data, which is quite a leap.

(The training log below shows only the first 5 and the last 5 epochs, to keep this article from running too long.)

Train on 6680 samples, validate on 835 samples
Epoch 1/20
6540/6680 [============================>.] - ETA: 0s - loss: 1.6424 - acc: 0.6009Epoch 00001: val_loss improved from inf to 0.77969, saving model to saved_models/weights.best.Resnet50.hdf5
6680/6680 [==============================] - 2s 270us/step - loss: 1.6224 - acc: 0.6049 - val_loss: 0.7797 - val_acc: 0.7593
Epoch 2/20
6560/6680 [============================>.] - ETA: 0s - loss: 0.4371 - acc: 0.8623Epoch 00002: val_loss improved from 0.77969 to 0.66353, saving model to saved_models/weights.best.Resnet50.hdf5
6680/6680 [==============================] - 1s 224us/step - loss: 0.4381 - acc: 0.8620 - val_loss: 0.6635 - val_acc: 0.8036
Epoch 3/20
6460/6680 [============================>.] - ETA: 0s - loss: 0.2664 - acc: 0.9150Epoch 00003: val_loss did not improve
6680/6680 [==============================] - 1s 224us/step - loss: 0.2628 - acc: 0.9160 - val_loss: 0.6948 - val_acc: 0.7820
Epoch 4/20
6540/6680 [============================>.] - ETA: 0s - loss: 0.1775 - acc: 0.9462Epoch 00004: val_loss improved from 0.66353 to 0.64685, saving model to saved_models/weights.best.Resnet50.hdf5
6680/6680 [==============================] - 1s 224us/step - loss: 0.1784 - acc: 0.9460 - val_loss: 0.6468 - val_acc: 0.8156
Epoch 5/20
6440/6680 [===========================>..] - ETA: 0s - loss: 0.1243 - acc: 0.9632Epoch 00005: val_loss did not improve
6680/6680 [==============================] - 1s 223us/step - loss: 0.1243 - acc: 0.9623 - val_loss: 0.6880 - val_acc: 0.8120
...Epoch 16/20
6620/6680 [============================>.] - ETA: 0s - loss: 0.0092 - acc: 0.9980Epoch 00016: val_loss did not improve
6680/6680 [==============================] - 1s 222us/step - loss: 0.0091 - acc: 0.9981 - val_loss: 0.8508 - val_acc: 0.8228
Epoch 17/20
6620/6680 [============================>.] - ETA: 0s - loss: 0.0071 - acc: 0.9982Epoch 00017: val_loss did not improve
6680/6680 [==============================] - 1s 222us/step - loss: 0.0070 - acc: 0.9982 - val_loss: 0.8501 - val_acc: 0.8359
Epoch 18/20
6500/6680 [============================>.] - ETA: 0s - loss: 0.0066 - acc: 0.9980Epoch 00018: val_loss did not improve
6680/6680 [==============================] - 2s 225us/step - loss: 0.0065 - acc: 0.9981 - val_loss: 0.8925 - val_acc: 0.8240
Epoch 19/20
6500/6680 [============================>.] - ETA: 0s - loss: 0.0060 - acc: 0.9983Epoch 00019: val_loss did not improve
6680/6680 [==============================] - 2s 225us/step - loss: 0.0059 - acc: 0.9984 - val_loss: 0.9231 - val_acc: 0.8299
Epoch 20/20
6500/6680 [============================>.] - ETA: 0s - loss: 0.0064 - acc: 0.9980Epoch 00020: val_loss did not improve
6680/6680 [==============================] - 2s 225us/step - loss: 0.0063 - acc: 0.9981 - val_loss: 0.9584 - val_acc: 0.8216
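
For reference, the 81.3% figure is computed on the held-out test features; a minimal sketch, assuming test_targets holds the one-hot test labels:

# predict a breed index for each test image's bottleneck features
predictions = [np.argmax(Resnet50_model.predict(np.expand_dims(feature, axis=0)))
               for feature in test_Resnet50]

# fraction of predictions that match the true labels
test_accuracy = 100 * np.mean(np.array(predictions) == np.argmax(test_targets, axis=1))
print('Test accuracy: %.4f%%' % test_accuracy)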

Improvements

Further improvements could help the model achieve even better results. Some options are:

  1. Data augmentation (see the sketch after this list)
  2. Dog-specific features
  3. A balanced dataset
  4. Training for a different number of epochs
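
As an example of the first option, a data augmentation sketch using Keras (the parameter values here are illustrative, not tuned):

from keras.preprocessing.image import ImageDataGenerator

# randomly perturb the training images so the model sees more variety
datagen = ImageDataGenerator(
    rotation_range=20,       # rotate up to 20 degrees
    width_shift_range=0.1,   # shift horizontally by up to 10%
    height_shift_range=0.1,  # shift vertically by up to 10%
    horizontal_flip=True)    # mirror images left to right

# then train on augmented batches, e.g.:
# model.fit_generator(datagen.flow(train_tensors, train_targets, batch_size=20),
#                     steps_per_epoch=len(train_tensors) // 20, epochs=5)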

I might also create a web app out of this just for fun, so look forward to that too :)

Conclusion

This was a fun project to work on, and I thank Udacity for getting me this far. To keep the size of the blog post to a minimum, I included only the most important parts of the code for a better reading experience. However, if you wish to play with the code yourself, you can find the notebook on my GitHub profile. If you have any comments or questions, I would always love to read and answer them, so do let me know your thoughts.

P.S. The image above is the actual classification. The person on the right was classified as a Papillon, so I added a cute example. What do you guys think about the classification?
