Dog Breed Classifier — Power of Neural Networks

Arushi Chawla
Published in Analytics Vidhya · May 5, 2020

How interesting it is to classify not only dog images, but human images too, as a dog breed!

Image: Artificial neural networks (Source)

It is unbelievable how the edges and nodes shown above create a set of powerful algorithms capable of recognizing patterns beautifully. Neural networks are modeled loosely after the human brain and can interpret sensory data through a kind of machine perception and classification. They take real-world data of all kinds, such as images, sound, text, and time series, and convert it into numerical vectors in order to recognize patterns.

In this post, we are going to focus on Convolutional Neural Networks (ConvNets or CNNs), a category of deep neural networks that has proven very effective at analyzing visual imagery, especially for image recognition and classification. To understand this further, we will walk through a Convolutional Neural Network that can be used to process real-world, user-supplied images.


Given an image of a dog, the algorithm will identify the potential dog breed. If supplied with a human image, the code will identify the resembling dog breed.

Sounds interesting, doesn’t it? Let’s dive deeper…

It is a simple 7-step process, unlike the complicated neural network machinery that runs behind the scenes:

Image: Convolutional neural networks (Source)

Step 1: Import Datasets
Step 2: Detect Humans
Step 3: Detect Dogs
Step 4: Create a CNN to Classify Dog Breeds
Step 5: Use pre-built Keras CNN models and modify those to Classify Dog Breeds (using Transfer Learning)
Step 6: Write an Algorithm to bind the steps above
Step 7: Test the Algorithm

Step 1: Import Datasets

We start by importing the required libraries and the dataset from Udacity. The dataset has ~8,500 dog images across 133 dog breeds. Once imported, we split the data into train, validation, and test sets with a distribution of 80%, 10%, and 10% respectively, and store the features and targets for each.

from sklearn.datasets import load_files
from keras.utils import np_utils
import numpy as np
from glob import glob

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    dog_targets = np_utils.to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('../../../data/dog_images/train')
valid_files, valid_targets = load_dataset('../../../data/dog_images/valid')
test_files, test_targets = load_dataset('../../../data/dog_images/test')

# load list of dog names
dog_names = [item[20:-1] for item in sorted(glob("../../../data/dog_images/train/*/"))]

Step 2: Detect Humans

In this step, we write a function to detect whether an image contains a human face. To design this function, we first import a dataset of human images from Udacity.

import random
random.seed(8675309)
# load filenames in shuffled human dataset
human_files = np.array(glob("../../../data/lfw/*/*"))
random.shuffle(human_files)
# print statistics about the dataset
print('There are %d total human images.' % len(human_files))

Now, to detect a human face, we use OpenCV’s implementation of Haar feature-based cascade classifiers. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub, and we downloaded one of these detectors. As we see below, a few additional image-conversion steps are required by the algorithm before a face can be detected. The face_detector function below takes an image path as a parameter and returns True if a human face is detected and False otherwise.

import cv2
import matplotlib.pyplot as plt
%matplotlib inline

# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# returns "True" if face is detected in image stored at img_path
def face_detector(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)
    return len(faces) > 0

Next, we take a sample of 100 images each from the human and dog data we downloaded earlier and run those through the function. Our detector found a human face in 100% of the human images, but also in 11% of the dog images.
Well, that’s great, but is it enough?
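As a rough sketch of that check (the variable names here are illustrative, not taken from the original code):

# take the first 100 images from each set and measure detection rates
human_files_short = human_files[:100]
dog_files_short = train_files[:100]

human_pct = 100 * np.mean([face_detector(f) for f in human_files_short])
dog_pct = 100 * np.mean([face_detector(f) for f in dog_files_short])
print('Human face detected in %d%% of human images.' % human_pct)
print('Human face detected in %d%% of dog images.' % dog_pct)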

Step 3: Detect Dogs

In this step, we use a pre-trained ResNet-50 model to detect dogs in images. We start by downloading the ResNet-50 model along with weights that have been trained on ImageNet, a very large and popular dataset used for image classification and other vision tasks.

from keras.applications.resnet50 import ResNet50

# define ResNet50 model
ResNet50_model = ResNet50(weights='imagenet')

Given how messy real-world images can be, they normally require some pre-processing before being fed into the model. The path_to_tensor function below first resizes each image to a square of 224×224 pixels (one of the key steps). Next, the image is converted to a 4D array (aka a 4D tensor), since the Keras CNN uses TensorFlow as its backend here. A tensor is a generalization of matrices to N-dimensional space. For more details, look at this great post by Matthew Mayo.

Input shape: (nb_samples, rows, columns, channels), where
nb_samples is the total number of images (or samples), and
rows, columns, and channels correspond to the height, width, and depth of each image, respectively. For a single image, our 4D tensor would be (1, 224, 224, 3).

from keras.preprocessing import image
from tqdm import tqdm

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    # stack the per-image 4D tensors into a single (nb_samples, 224, 224, 3) tensor
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
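As a quick sanity check (illustrative, not part of the original code), we can confirm the shape of a single converted image:

# a single image should become a (1, 224, 224, 3) tensor
sample_tensor = path_to_tensor(train_files[0])
print(sample_tensor.shape)  # expected: (1, 224, 224, 3)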

There are a few additional preprocessing steps required before the model can be used for prediction, such as converting the RGB image to BGR and normalizing each image by subtracting the mean pixel (computed over the ImageNet data) from every pixel. This is all taken care of by the preprocess_input function. To know more about it, check out the code here.

Now that our image is formatted, we are ready to supply it to ResNet-50 and make predictions. The model’s predict method returns a vector of predicted probabilities, one per ImageNet category; the ResNet50_predict_labels function below returns the index of the most likely category. For mapping the returned integers to the model’s predicted object classes, please use this dictionary.

from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    # returns the index of the most likely ImageNet category for the image at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))

All the dog categories in the dictionary correspond to keys 151–268. Therefore, to detect a dog, we check whether the ResNet50_predict_labels function above returns a value between 151 and 268 (inclusive). The dog_detector function returns True if a dog is detected, and False otherwise.
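The post doesn’t show the detector itself, but a minimal sketch following that description would look like this:

# returns "True" if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    # ImageNet keys 151-268 (inclusive) correspond to dog categories
    return (prediction >= 151) and (prediction <= 268)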

Similar to Step 2, we run a sample of 100 images through this function. Our model detected a dog in 100% of the dog images and in 0% of the human images.
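Using the same sampling approach as in Step 2 (again, the variable names are illustrative):

# evaluate the dog detector on 100-image samples of each set
dog_pct = 100 * np.mean([dog_detector(f) for f in train_files[:100]])
human_pct = 100 * np.mean([dog_detector(f) for f in human_files[:100]])
print('Dog detected in %d%% of dog images.' % dog_pct)
print('Dog detected in %d%% of human images.' % human_pct)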

Step 4: Create a CNN to Classify Dog Breeds

Now that we can detect a human face and a dog in images, our next goal is to classify the dog breed. We create a CNN model to make these classification predictions. Before building the model, we rescale the images by dividing every pixel in every image by 255.

# pre-process the data for Keras
train_tensors = paths_to_tensor(train_files).astype('float32')/255
valid_tensors = paths_to_tensor(valid_files).astype('float32')/255
test_tensors = paths_to_tensor(test_files).astype('float32')/255

Below is the CNN architecture from our classification model. CNNs are often designed with the goal of making the input arrays much deeper than they are tall or wide. In the 3 convolutional layers used below, I am increasing the number of filters to increase the stack of features, and thus the depth. Each convolutional layer is followed by a MaxPooling layer to reduce the spatial dimensionality of the image. We then flatten the matrices into a vector before feeding it into a fully connected Dense layer, since Dense layers do not accept multidimensional arrays. This layer uses a softmax activation function to get the classification probability of each category, with 133 output nodes, 1 for each dog category in our training data. To go into the detail of each parameter, and how these are calculated, please see this post by Rakshith Vasudev.

from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential

# Define the architecture.
model = Sequential()

# Convolutional layers and maxpooling layers; note: all images are 224x224 pixels
model.add(Conv2D(filters=16, kernel_size=2, strides=1, padding='same', activation='relu', input_shape=[224,224,3]))
model.add(MaxPooling2D(pool_size=2, strides=1, padding='same'))
model.add(Conv2D(filters=32, kernel_size=2, strides=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2, strides=1, padding='same'))
model.add(Conv2D(filters=64, kernel_size=2, strides=2, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2, strides=1, padding='same'))
#model.add(GlobalAveragePooling2D())

# Flatten the array into a vector and feed it to a dense layer
model.add(Flatten())
model.add(Dense(133, activation='softmax'))
model.summary()

Next, we compile and train our model. ModelCheckpoint is used to save the model that attains the best validation loss.

from keras.callbacks import ModelCheckpoint

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# save the model that attains the best validation loss
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.hdf5',
                               verbose=1, save_best_only=True)

epochs = 5  # number of training epochs (tune as needed)
model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=epochs, batch_size=20, callbacks=[checkpointer], verbose=1)

Once our model is trained, we load the weights that were saved earlier and run the model on our test data to evaluate its accuracy.

model.load_weights('saved_models/weights.best.from_scratch.hdf5')

# get index of predicted dog breed for each image in test set
dog_breed_predictions = [np.argmax(model.predict(np.expand_dims(tensor, axis=0))) for tensor in test_tensors]

# report test accuracy
test_accuracy = 100*np.sum(np.array(dog_breed_predictions)==np.argmax(test_targets, axis=1))/len(dog_breed_predictions)
print('Test accuracy: %.4f%%' % test_accuracy)

This model works decently and gives an accuracy of ~7%. You must be wondering: so much work for such low accuracy? Remember, this is without any parameter fine-tuning or data augmentation, and with 133 classes, random guessing would score under 1%. This is where the next steps help improve accuracy.

Step 5: Use pre-built Keras CNN models and modify those to Classify Dog Breeds (using Transfer Learning)

There are some groundbreaking pre-built CNN architectures available in Keras that can be used through transfer learning, viz. VGG16, VGG19, ResNet-50, Xception, and InceptionV3. These models help reduce training time without sacrificing accuracy. Here, bottleneck features from the pre-trained VGG-16 model are fed into our model. We only add a global average pooling layer (to reduce dimensionality) and a fully connected layer with a softmax activation function (to get one node for each dog category).

bottleneck_features = np.load('bottleneck_features/DogVGG16Data.npz')
train_VGG16 = bottleneck_features['train']
valid_VGG16 = bottleneck_features['valid']
test_VGG16 = bottleneck_features['test']
# CNN architecture using Transfer Learning
VGG16_model = Sequential()
VGG16_model.add(GlobalAveragePooling2D(input_shape=train_VGG16.shape[1:]))
VGG16_model.add(Dense(133, activation='softmax'))
VGG16_model.summary()

When we ran our test data through this newly trained model and the pre-computed features, our accuracy increased to ~45% in far less training time, which is a significant improvement. This is because only the 2 layers we added to the network are being trained. The accuracy further jumped to ~82% with the ResNet-50 bottleneck features, which is what I ended up using in my code.
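The pipeline in Step 6 calls a Resnet50_predict_breed helper that the post doesn’t show. A minimal sketch of it, assuming a DogResnet50Data.npz bottleneck file and the Udacity project’s extract_bottleneck_features module (both assumptions on my part), might look like this:

# build a ResNet-50 transfer model, analogous to the VGG-16 one above
# (the bottleneck file name is an assumption based on the project layout)
bottleneck_resnet = np.load('bottleneck_features/DogResnet50Data.npz')
Resnet50_model = Sequential()
Resnet50_model.add(GlobalAveragePooling2D(input_shape=bottleneck_resnet['train'].shape[1:]))
Resnet50_model.add(Dense(133, activation='softmax'))
Resnet50_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Resnet50_model.fit(bottleneck_resnet['train'], train_targets,
                   validation_data=(bottleneck_resnet['valid'], valid_targets),
                   epochs=20, batch_size=20, verbose=1)

# helper assumed to come from the Udacity project's extract_bottleneck_features module
from extract_bottleneck_features import extract_Resnet50

def Resnet50_predict_breed(img_path):
    # extract ResNet-50 bottleneck features for a single image
    bottleneck_feature = extract_Resnet50(path_to_tensor(img_path))
    # predict breed probabilities and return the most likely breed name
    predicted_vector = Resnet50_model.predict(bottleneck_feature)
    return dog_names[np.argmax(predicted_vector)]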

Step 6: Write an Algorithm to bind the steps above

This is the step where we put all the different pieces together. We write a simple algorithm that accepts an image path and first determines whether it contains a human face, a dog, or neither. Then,

  • if a dog is detected in the image, return the predicted breed.
  • if a human is detected in the image, return the resembling dog breed.
  • if neither is detected in the image, provide output that indicates an error.
def display_detect_image(img_path):
    detect_breed(img_path)
    # load color (BGR) image
    img = cv2.imread(img_path)
    # convert BGR image to RGB for plotting
    cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # display the image
    plt.imshow(cv_rgb)
    return plt.show()

def detect_breed(img_path):
    # check if image contains a human face
    if face_detector(img_path):
        return print("Hello human! Your face resembles a: ", Resnet50_predict_breed(img_path).split(".")[-1])
    # check if image contains a dog
    elif dog_detector(img_path):
        return print("Hello dog! Your predicted breed is: ", Resnet50_predict_breed(img_path).split(".")[-1])
    # else print an error message
    else:
        return print("Oops! This is neither a human nor a dog")

Step 7: Test the Algorithm

This is the section where we see the power of our algorithm and take it for a spin! I supplied some random dog and human images and, voila! the algorithm predicts the breed. Now, if you like a dog on the street or in a park and want to know its breed, there is no need to ask the owner; just take a picture and run it through the model 😄
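For example (the image paths here are hypothetical placeholders, not files from the project):

# run the full pipeline on a couple of sample images
display_detect_image('images/sample_dog.jpg')    # hypothetical dog image
display_detect_image('images/sample_human.jpg')  # hypothetical human image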

To access the entire code, see the link to my GitHub available here.

References:

https://pathmind.com/wiki/neural-network

https://analyticsindiamag.com/tensorflow-vs-keras-which-one-should-you-choose/
