Dog Breed Classifier with Keras Walk-through

Introduction

Narendran TS

--

This blog is about the Udacity Data Scientist Nanodegree capstone project: the Dog Breed Classifier. The objective of this project is to build an application that, if a dog is detected in an image, provides an estimate of the dog's breed, and if a human is detected, provides an estimate of the dog breed that the person most resembles.

We will be using a Convolutional Neural Network (CNN), since CNNs work very well on images.

Summary of the project:

We will use a deep learning algorithm called a convolutional neural network to classify dog breeds. We will use OpenCV to detect humans in the images and a pre-trained model to detect dogs. Then we will build our own CNN architecture from scratch. The results from this model are not convincing, as it gives very low accuracy. So we will turn to transfer learning; that is, we will build on a pre-trained model, in our case ResNet-50. The accuracy we get with transfer learning is around 80 percent. Finally, to test our application, we will run it on a sample of six images as a real-world scenario.

Interesting and difficult things in the project:

It was amazing to see how well the CNN algorithm works on images: there are 133 categories, and the CNN still does well in predicting them. The most difficult part was building my own CNN architecture. A neural network has so many parameters that it is difficult to tune them. But thanks to Udacity's sample model, I was able to build a model with greater than one percent accuracy, the project's minimum requirement for the from-scratch CNN.

Why do CNNs work well for images?

CNNs work well for images because they extract features from the images themselves, eliminating the need for manual feature engineering. The features are not hand-designed; they are learned while the network trains on a set of images. A CNN uses adjacent-pixel information to progressively downsample the image through convolutions, and then applies a prediction layer at the end. This makes CNN models extremely accurate for computer vision tasks. CNNs learn feature detection through tens or hundreds of hidden layers, with each layer increasing the complexity of the learned features.

Metrics Used:

To measure the performance of the model, we will use accuracy. Improving accuracy is the requirement of the project, and it also gives very clear information about how well we are able to classify dog breeds. For example, if we classify 60 out of 100 images correctly, the accuracy is 60%.

Loss Function: The loss function of the network is categorical crossentropy. It is the most popular loss function for multi-class classification with a softmax output layer, and it is differentiable. For more information you could refer here.
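
For intuition, here is a minimal NumPy sketch of how categorical crossentropy is computed (the values are illustrative): for a one-hot target vector y and a softmax output p, the loss is the negative log of the probability assigned to the true class.

import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot target vector, y_pred: softmax probabilities
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

# example: the true class is index 2 and the model assigns it probability 0.7
y_true = np.array([0.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.1, 0.7, 0.1])
print(categorical_crossentropy(y_true, y_pred))  # -log(0.7), about 0.357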

The sequence of steps followed in this project is:

  • Step 0: Import Datasets
  • Step 1: Detect Humans
  • Step 2: Detect Dogs
  • Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
  • Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 6: Write your Algorithm
  • Step 7: Test Your Algorithm

So now let us dive into each of those steps one by one. Before that, note that the library we used for building the deep learning models is Keras.

Code Files:

The code can be downloaded from this GitHub link.

Step 0: Import Datasets

The data can be downloaded from the following locations:

1. Dog Images

2. Human Images

The above images are of varying sizes, but our code will scale them all down to 224 × 224.

We can read the dog images into the Python environment using scikit-learn's load_files function.

from sklearn.datasets import load_files
from keras.utils import np_utils
import numpy as np
from glob import glob

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('../../../data/dog_images/train')
valid_files, valid_targets = load_dataset('../../../data/dog_images/valid')
test_files, test_targets = load_dataset('../../../data/dog_images/test')

# load list of dog names
dog_names = [item[20:-1] for item in sorted(glob("../../../data/dog_images/train/*/"))]

Once you have read all the necessary files, you can check the statistics of the dataset with the following lines of code:

# print statistics about the dataset
print('There are %d total dog categories.' % len(dog_names))
print('There are %s total dog images.\n' % len(np.hstack([train_files, valid_files, test_files])))
print('There are %d training dog images.' % len(train_files))
print('There are %d validation dog images.' % len(valid_files))
print('There are %d test dog images.' % len(test_files))

There are a total of 133 dog breed categories and 8351 dog images.

Similarly, we can do the same for the human images:

import random
random.seed(8675309)

# load filenames in shuffled human dataset
human_files = np.array(glob("../../../data/lfw/*/*"))
random.shuffle(human_files)

# print statistics about the dataset
print('There are %d total human images.' % len(human_files))

There are a total of 13233 human images.

Step 1: Detect Humans

In the next step we will use OpenCV (cv2), which will help us detect humans, with the following lines of code:

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
# load color (BGR) image
img = cv2.imread(human_files[3])
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# find faces in image
faces = face_cascade.detectMultiScale(gray)
# print number of faces detected in the image
print('Number of faces detected:', len(faces))
# get bounding box for each detected face
for (x,y,w,h) in faces:
    # add bounding box to color image
    cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)

# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()

The output is shown below.

Before using any of the face detectors, it is standard procedure to convert the images to grayscale. The detectMultiScale function executes the classifier stored in face_cascade and takes the grayscale image as a parameter.

In the above code, faces is a numpy array of detected faces, where each row corresponds to a detected face. Each detected face is a 1D array with four entries that specify the bounding box of the detected face. The first two entries in the array (extracted in the above code as x and y) specify the horizontal and vertical positions of the top left corner of the bounding box. The last two entries in the array (extracted here as w and h) specify the width and height of the box.

We can use this procedure to write a function that returns True if a human face is detected in an image and False otherwise. This function, aptly named face_detector, takes a string-valued file path to an image as input and appears in the code block below.

# returns "True" if face is detected in image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

To test how well the Haar cascade detects humans, we will take the first 100 images from each of the human and dog datasets and run the detector on them:

human_files_short = human_files[:100]
dog_files_short = train_files[:100]

def human_cv_haar_predict(files):
    '''Count the number of images with human faces among all images in files'''
    preds = []
    for f in files:
        if face_detector(f):
            preds.append(1)
    return len(preds)

print("Number of Human Faces predicted in human_files_short -> ", human_cv_haar_predict(human_files_short))
print("Number of Human Faces predicted in dog_files_short ->", human_cv_haar_predict(dog_files_short))

Based on the results, the Haar cascade detected humans with 100% accuracy, but it also detected human faces in 11 of the dog images. So we can conclude that it is not a perfect classifier.

Step 2: Detect Dogs

Now we will build a model for detecting dogs. Here, we use a pre-trained ResNet-50 model to detect dogs in images. Our first line of code downloads the ResNet-50 model, along with weights that have been trained on ImageNet, a very large, very popular dataset used for image classification and other vision tasks. ImageNet contains over 10 million URLs, each linking to an image containing an object from one of 1000 categories. Given an image, this pre-trained ResNet-50 model returns a prediction (derived from the available categories in ImageNet) for the object that is contained in the image.

from keras.applications.resnet50 import ResNet50

# define ResNet50 model
ResNet50_model = ResNet50(weights='imagenet')

When using TensorFlow as backend, Keras CNNs require a 4D array (which we’ll also refer to as a 4D tensor) as input, with shape

(nb_samples, rows, columns, channels),

where nb_samples corresponds to the total number of images (or samples), and rows, columns, and channels correspond to the number of rows, columns, and channels for each image, respectively.

The path_to_tensor function below takes a string-valued file path to a color image as input and returns a 4D tensor suitable for supplying to a Keras CNN. The function first loads the image and resizes it to a square image that is 224 × 224 pixels. Next, the image is converted to an array, which is then reshaped to a 4D tensor. In this case, since we are working with color images, each image has three channels. Likewise, since we are processing a single image (or sample), the returned tensor will always have shape

(1, 224, 224, 3).

The paths_to_tensor function takes a numpy array of string-valued image paths as input and returns a 4D tensor with shape

(nb_samples, 224, 224, 3).

Here, nb_samples is the number of samples, or number of images, in the supplied array of image paths.

from keras.preprocessing import image
from tqdm import tqdm

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return it
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
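
As a quick usage check (an illustrative example, not from the original notebook), we can convert a handful of image paths and inspect the resulting tensor shape:

# stack the first five training images into a single 4D tensor
sample_tensors = paths_to_tensor(train_files[:5])
print(sample_tensors.shape)  # expected: (5, 224, 224, 3)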

To make predictions, the 4D tensor needs some additional processing before it is ready for ResNet-50 (or any other pre-trained model in Keras). First, the RGB image is converted to BGR by reordering the channels. All pre-trained models have the additional normalization step that the mean pixel (expressed in RGB as [103.939, 116.779, 123.68] and calculated from all pixels in all images in ImageNet) must be subtracted from every pixel in each image. This is implemented in the imported function preprocess_input.

Now that we have a way to format our image for supplying to ResNet-50, we are ready to use the model to extract predictions. This is accomplished with the predict method, which returns an array whose i-th entry is the model's predicted probability that the image belongs to the i-th ImageNet category. This is implemented in the ResNet50_predict_labels function below.

By taking the argmax of the predicted probability vector, we obtain an integer corresponding to the model’s predicted object class, which we can identify with an object category through the use of this dictionary.

from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    # returns prediction vector for image located at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))
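
Incidentally, the decode_predictions helper imported above can translate the raw prediction vector into human-readable ImageNet labels. A minimal sketch (ResNet50_predict_top_labels is an illustrative helper, not part of the original code):

def ResNet50_predict_top_labels(img_path, top=3):
    # return the top (class_id, class_name, probability) triples for the image
    img = preprocess_input(path_to_tensor(img_path))
    preds = ResNet50_model.predict(img)
    return decode_predictions(preds, top=top)[0]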

While looking at the dictionary, you will notice that the categories corresponding to dogs appear in an uninterrupted sequence and correspond to dictionary keys 151–268, inclusive, to include all categories from 'Chihuahua' to 'Mexican hairless'. Thus, in order to check to see if an image is predicted to contain a dog by the pre-trained ResNet-50 model, we need only check if the ResNet50_predict_labels function above returns a value between 151 and 268 (inclusive).

We use these ideas to complete the dog_detector function below, which returns True if a dog is detected in an image (and False if not).

### returns "True" if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    return ((prediction <= 268) & (prediction >= 151))

Below is the function for calculating the performance of the ResNet-50 detector:

def resnet_predict(files):
    '''Count the number of images with dogs among all images in files'''
    preds = []
    for f in files:
        if dog_detector(f):
            preds.append(1)
    return len(preds)
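
Mirroring the Haar cascade test above, we can run the dog detector on the same 100-image samples:

print("Number of dogs detected in human_files_short ->", resnet_predict(human_files_short))
print("Number of dogs detected in dog_files_short ->", resnet_predict(dog_files_short))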

When we run this check, the model detects dogs 100% of the time in the dog images, while detecting no dogs in the human images.

Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

Now that we have functions for detecting humans and dogs in images, we need a way to predict breed from images. In this step, we will create a CNN from scratch that classifies dog breeds.

For pre-processing, we rescale the images by dividing every pixel in every image by 255:

from PIL import ImageFile                            
ImageFile.LOAD_TRUNCATED_IMAGES = True
# pre-process the data for Keras
train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255
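
A quick sanity check (illustrative, not in the original notebook) confirms the tensors have the expected 4D shape:

# each set of tensors should have shape (nb_samples, 224, 224, 3)
print(train_tensors.shape, valid_tensors.shape, test_tensors.shape)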

Model Architecture:

  1. The first convolutional layer, with 32 filters, identifies low-level features such as edges or lines. I used padding='same', as it captures features at the image borders as well. The images are all 224 × 224 with three channels, so the input_shape is (224, 224, 3).
  2. The second convolutional layer, with 64 filters, identifies more complex features such as shapes.
  3. The third convolutional layer, with 128 filters, identifies high-level features.
  4. The MaxPooling layer after each convolutional layer reduces the height and width of the representation by 50%.
  5. The GlobalAveragePooling layer reduces the height and width to one, then feeds the result to the final dense layer.
  6. The Dense layer with 133 nodes and a softmax activation classifies the image into one of the 133 dog breeds.

from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential
from keras.callbacks import ModelCheckpoint

model = Sequential()
model.add(Conv2D(filters=32, kernel_size=2, strides=1, padding='same', activation='relu', input_shape=(train_tensors.shape[1], train_tensors.shape[2], train_tensors.shape[3])))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64, kernel_size=2, strides=1, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=128, kernel_size=2, strides=1, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(GlobalAveragePooling2D())
model.add(Dense(train_targets.shape[1], activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

epochs = 5

# save the model weights with the best validation loss
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.hdf5',
                               verbose=1, save_best_only=True)

model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=epochs, batch_size=20, callbacks=[checkpointer], verbose=1)
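
It is also worth sanity-checking the architecture with Keras' built-in summary, which prints each layer's output shape and parameter count:

# print each layer's output shape and number of parameters
model.summary()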

Once the model is trained, the weights with the best validation loss can be loaded with:

model.load_weights('saved_models/weights.best.from_scratch.hdf5')

Prediction and accuracy calculation can be done as follows:

# get index of predicted dog breed for each image in test set
dog_breed_predictions = [np.argmax(model.predict(np.expand_dims(tensor, axis=0))) for tensor in test_tensors]
# report test accuracy
test_accuracy = 100*np.sum(np.array(dog_breed_predictions)==np.argmax(test_targets, axis=1))/len(dog_breed_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

The above CNN model gives an accuracy of 3.3493% on the test set.

Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)

To reduce training time without sacrificing accuracy, we show you how to train a CNN using transfer learning.

We will be using the pre-trained VGG-16 model. More details about the VGG-16 architecture can be found here.

Below is the code for loading the VGG-16 bottleneck features:

bottleneck_features = np.load('bottleneck_features/DogVGG16Data.npz')
train_VGG16 = bottleneck_features['train']
valid_VGG16 = bottleneck_features['valid']
test_VGG16 = bottleneck_features['test']
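
Each bottleneck "image" here is the output of VGG-16's final convolutional block; for 224 × 224 inputs that is a 7 × 7 × 512 feature map, which you can confirm by printing the shape:

# bottleneck features: one 7x7x512 feature map per image
print(train_VGG16.shape)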

Model Architecture:

The model uses the pre-trained VGG-16 model as a fixed feature extractor, where the last convolutional output of VGG-16 is fed as input to our model. We only add a global average pooling layer and a fully connected layer, where the latter contains one node for each dog category and is equipped with a softmax.

# Model Architecture
VGG16_model = Sequential()
VGG16_model.add(GlobalAveragePooling2D(input_shape=train_VGG16.shape[1:]))
VGG16_model.add(Dense(133, activation='softmax'))

# Compile the model
VGG16_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# Save the weights with the best validation loss
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.VGG16.hdf5',
                               verbose=1, save_best_only=True)

# Train the model
VGG16_model.fit(train_VGG16, train_targets,
                validation_data=(valid_VGG16, valid_targets),
                epochs=20, batch_size=20, callbacks=[checkpointer], verbose=1)

# Load the best weights
VGG16_model.load_weights('saved_models/weights.best.VGG16.hdf5')

# get index of predicted dog breed for each image in test set
VGG16_predictions = [np.argmax(VGG16_model.predict(np.expand_dims(feature, axis=0))) for feature in test_VGG16]

# report test accuracy
test_accuracy = 100*np.sum(np.array(VGG16_predictions)==np.argmax(test_targets, axis=1))/len(VGG16_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

The model classifies 49.7608% of the test images correctly. It trains much faster than our from-scratch CNN and gives much better results as well. This is why we use transfer learning.

Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)

The objective here is to build a model that gives a minimum of 60 percent accuracy. Here too we will use transfer learning, this time based on the ResNet-50 architecture. More details about ResNet-50 can be found here.

Model Architecture:

Since the dataset is small, the objective here is to build a simple transfer learning model that does not overfit, adding new fully connected layers on top to classify the 133 dog breeds. ResNet-50 is a very deep convolutional network for large-scale image recognition that has done well in image classification competitions. Its weight file is also smaller than those of the VGG, Inception and Xception models, which helps with faster training and better model performance.

First Layer: A global average pooling layer that takes the last convolutional output of ResNet-50, to reduce variance and extract the important features.

Second Layer: A fully connected layer with 128 nodes and a ReLU activation function to improve accuracy.

Third Layer: A dropout layer with a 50% dropout rate, to prevent overfitting on the training data. As you can see from the results below, the accuracies of the training, validation and test sets are quite close to each other.

Final layer: A fully connected layer with 133 nodes and a softmax function to classify the 133 dog breeds.
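
Before building the model, the ResNet-50 bottleneck features have to be loaded, just as we did for VGG-16. A minimal sketch, assuming the DogResnet50Data.npz file supplied with the Udacity project:

# load ResNet-50 bottleneck features (file name as supplied in the Udacity workspace)
bottleneck_features = np.load('bottleneck_features/DogResnet50Data.npz')
train_Resnet50 = bottleneck_features['train']
valid_Resnet50 = bottleneck_features['valid']
test_Resnet50 = bottleneck_features['test']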

Resnet50_model = Sequential()
Resnet50_model.add(GlobalAveragePooling2D(input_shape=train_Resnet50.shape[1:]))
Resnet50_model.add(Dense(128, activation='relu'))
Resnet50_model.add(Dropout(0.5))
Resnet50_model.add(Dense(133, activation='softmax'))

# Compile the model.
Resnet50_model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])

# Train the model.
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.Resnet50.hdf5',
                               verbose=1, save_best_only=True)
Resnet50_model.fit(train_Resnet50, train_targets,
                   validation_data=(valid_Resnet50, valid_targets),
                   epochs=20, batch_size=30, callbacks=[checkpointer], verbose=1)

# Load the model weights with the best validation loss.
Resnet50_model.load_weights('saved_models/weights.best.Resnet50.hdf5')

# Calculate classification accuracy on the test dataset.
Resnet50_predictions = [np.argmax(Resnet50_model.predict(np.expand_dims(feature, axis=0))) for feature in test_Resnet50]

# report test accuracy
test_accuracy = 100*np.sum(np.array(Resnet50_predictions)==np.argmax(test_targets, axis=1))/len(Resnet50_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

The model gives an accuracy of 83.1340%, well above the minimum requirement of 60%.

Step 6: Write your Algorithm

Now we will write an algorithm that accepts a file path to an image and first determines whether the image contains a human, dog, or neither. Then,

  • if a dog is detected in the image, return the predicted breed.
  • if a human is detected in the image, return the resembling dog breed.
  • if neither is detected in the image, provide output that indicates an error.
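
The code below relies on a Resnet50_predict_breed helper that maps an image path to a breed name. A minimal sketch, assuming the extract_bottleneck_features module that ships with the Udacity project repo:

from extract_bottleneck_features import extract_Resnet50

def Resnet50_predict_breed(img_path):
    # extract the ResNet-50 bottleneck feature for this image
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    # obtain the predicted probability vector
    predicted_vector = Resnet50_model.predict(bottleneck_feature)
    # return the dog breed with the highest predicted probability
    return dog_names[np.argmax(predicted_vector)]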
from PIL import Image

def show_result(path, title):
    '''Show the image stored at path with title'''
    fig, ax = plt.subplots()
    ax.imshow(Image.open(path))
    ax.axis('off')
    ax.set_title(title)
    return ax

def dog_breed_pred(path):
    '''Return predicted dog breed or resembling dog breed for image stored at path'''
    # Detect dog or human and run prediction
    if dog_detector(path):
        dog_breed = Resnet50_predict_breed(path)
        result = 'This dog looks like a ' + dog_breed + '.'
    elif face_detector(path):
        resemb_dog_breed = Resnet50_predict_breed(path)
        result = 'The most resembling dog breed of this person is ' + resemb_dog_breed + '.'
    else:
        result = 'There is no human or dog detected in this picture.'
    return result

def pred_show_breed(path):
    '''Show the image stored at path with the predicted result as title'''
    show_result(path, dog_breed_pred(path))

Step 7: Test Your Algorithm

In this step, we will run the algorithm on a few sample images to check how well it works.

You can run the code by calling the function in the following way:

pred_show_breed('.../.../....jpg')

Let us look at some of the results,

As you can see from the above results, our model seems to be working very well.

Steps to improve our model

  1. Fine-tune the model for better accuracy.
  2. Ensemble several models and combine their predictions, for example by averaging the predicted probabilities across all models.
  3. Increase the training data size; collect as much data as possible.
  4. Augment the data by transforming images (flipping, rotating, etc.); see the sketch after this list.
  5. Train on images with harder cases, such as multiple dogs, or a dog and a human together.
  6. Build an object detection model: first localizing the object and then classifying it could give better results.
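
As a sketch of idea 4 (parameter values are illustrative, not tuned), Keras' ImageDataGenerator can augment the training tensors on the fly:

from keras.preprocessing.image import ImageDataGenerator

# generate randomly transformed copies of the training images on the fly
datagen = ImageDataGenerator(
    rotation_range=20,       # random rotations of up to 20 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    horizontal_flip=True)    # random horizontal flips

# train the from-scratch CNN on augmented batches instead of the raw tensors
model.fit_generator(datagen.flow(train_tensors, train_targets, batch_size=20),
                    steps_per_epoch=len(train_tensors) // 20,
                    epochs=5,
                    validation_data=(valid_tensors, valid_targets))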

Thank you for reading!
