Classification of Dog Breed Using Deep Learning

Sushant Agarwal

Published in

Analytics Vidhya

8 min read · May 24, 2020

Overview

This project aims to identify a dog's breed from a photo of the dog.

In this real-world setting, we will need to piece together a series of models to perform different tasks; for instance, the algorithm that detects humans in an image will be different from the CNN that infers dog breed. There are many points of possible failure, and no perfect algorithm exists. Our imperfect solution will nonetheless create a fun user experience!

So the journey begins…..

Steps to be followed for the project:

Find the right dataset
Import the dataset and preprocess it (if required)
Detect humans
Detect dogs
Create our own CNN model from scratch to classify dog breed
Create a CNN model using transfer learning to classify dog breed
Test both models

In this project we are going to use PyTorch to create our classifier.

PyTorch is an open-source machine learning library for Python, based on Torch, used for applications such as natural language processing. It is primarily developed by Facebook’s artificial-intelligence research group, and Uber’s “Pyro” Probabilistic programming language software is built on it.

Importing Data

Make sure that you’ve downloaded the required human and dog datasets:

Download the dog dataset. Unzip the folder and place it in this project’s home directory, at the location /dog_images.
Download the human dataset. Unzip the folder and place it in the home directory, at location /lfw.

import numpy as np
from glob import glob

# load filenames for human and dog images
human_files = np.array(glob("/data/lfw/*/*"))
dog_files = np.array(glob("/data/dog_images/*/*/*"))

# print number of images in each dataset
print('There are %d total human images.' % len(human_files))
print('There are %d total dog images.' % len(dog_files))
print(dog_files)

Output of the print function that describes the data in dog_files
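
As a quick sanity check on the imported data, we can count how many images each breed has. The sketch below assumes (this is not stated in the article) that every path follows the layout root/split/breed_folder/file, which is what the three-level glob pattern above suggests. The helper names and the sample paths are hypothetical.

```python
from collections import Counter
import os


def breed_from_path(path):
    """Return the breed folder name for an image path.

    Assumes the layout <root>/<split>/<breed_folder>/<file>, so the
    breed folder is the immediate parent directory of the file.
    """
    return os.path.basename(os.path.dirname(path))


def count_per_breed(paths):
    """Count how many image paths fall under each breed folder."""
    return Counter(breed_from_path(p) for p in paths)


# hypothetical sample paths, for illustration only
sample_paths = [
    "/data/dog_images/train/001.Affenpinscher/a.jpg",
    "/data/dog_images/train/001.Affenpinscher/b.jpg",
    "/data/dog_images/valid/002.Afghan_hound/c.jpg",
]
breed_counts = count_per_breed(sample_paths)
print(breed_counts.most_common())
```

On the real data one would call count_per_breed(dog_files) instead of using the sample list. A very uneven count across breeds would be worth knowing about before training.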

Detecting Humans

In this section, we use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images.

import cv2                
import matplotlib.pyplot as plt                        
%matplotlib inline                               

# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')

# load color (BGR) image
img = cv2.imread(human_files[0])
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# find faces in image
faces = face_cascade.detectMultiScale(gray)

# print number of faces detected in the image
print('Number of faces detected:', len(faces))

# get bounding box for each detected face
for (x,y,w,h) in faces:
    # add bounding box to color image
    cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
    
# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()

In the above code, the detectMultiScale function runs the classifier stored in face_cascade and takes the grayscale image as a parameter. It returns one bounding box (x, y, w, h) per detected face. The result is shown above for a single input image.

Detecting Dogs

In this section we use a pre-trained model to detect dogs in images. We are going to use the VGG-16 model initially.

Downloading the model

import torch
import torchvision.models as models

# define VGG16 model
VGG16 = models.vgg16(pretrained=True)

# check if CUDA is available
use_cuda = torch.cuda.is_available()
print("cuda available? {0}".format(use_cuda))
# move model to GPU if CUDA is available
if use_cuda:
    VGG16 = VGG16.cuda()
# the code above checks whether a GPU is available and, if so, moves the model to it

Implementation

from PIL import Image
import torchvision.transforms as transforms


def load_image(img_path):
    # convert('RGB') guarantees exactly three channels (any alpha channel is dropped)
    image = Image.open(img_path).convert('RGB')
    # resize to (224, 224), the input size VGG16 was trained on
    in_transform = transforms.Compose([
                        transforms.Resize(size=(224, 224)),
                        transforms.ToTensor(),
                        # ImageNet normalization parameters from the PyTorch docs
                        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                             std=[0.229, 0.224, 0.225])])

    # add the batch dimension
    image = in_transform(image).unsqueeze(0)
    return image

def VGG16_predict(img_path):
    '''
    Use pre-trained VGG-16 model to obtain index corresponding to 
    predicted ImageNet class for image at specified path
    
    Args:
        img_path: path to an image
        
    Returns:
        Index corresponding to VGG-16 model's prediction
    '''
    
    
    ## Load and pre-process an image from the given img_path
    ## Return the *index* of the predicted class for that image
    img = load_image(img_path)
    if use_cuda:
        img = img.cuda()
    # evaluation mode disables dropout; no_grad skips gradient tracking
    VGG16.eval()
    with torch.no_grad():
        ret = VGG16(img)
    # index of the highest-scoring class
    return torch.max(ret, 1)[1].item()

In the above code, load_image loads the image and returns it as a tensor with RGB channels only. The other function, VGG16_predict, uses the pre-trained VGG-16 model to obtain the index of the predicted ImageNet class.

Implementation function

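def dog_detector(img_path):
    # ImageNet class indices 151 to 268 (inclusive) are dog breeds
    idx = VGG16_predict(img_path)
    return 151 <= idx <= 268  # True/False

# dog_files_short and human_files_short are not defined in this article;
# they are assumed to be small subsets of dog_files and human_files
print(dog_detector(dog_files_short[0]))
print(dog_detector(human_files_short[0]))

>> True
False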

Assess the Dog Detector

def dog_detector_test(files):
    detection_cnt = 0
    total_cnt = len(files)
    for file in files:
        detection_cnt += dog_detector(file)
    return detection_cnt, total_cnt

# run each test once and reuse the result instead of scanning the files twice
human_detected, human_total = dog_detector_test(human_files_short)
dog_detected, dog_total = dog_detector_test(dog_files_short)
print("detect a dog in human_files: {} / {}".format(human_detected, human_total))
print("detect a dog in dog_files: {} / {}".format(dog_detected, dog_total))
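
The same counting logic works for any detector that returns True or False, so it can be written once and reused for the face detector and the dog detector. Here is a minimal sketch of that idea. The names detection_rate and fake_detector are hypothetical, and the stand-in detector only exists so the example runs without any model.

```python
def detection_rate(detector, files):
    """Return (hits, total, fraction) for a boolean detector over files."""
    hits = sum(1 for f in files if detector(f))
    total = len(files)
    # guard against an empty list to avoid dividing by zero
    fraction = hits / total if total else 0.0
    return hits, total, fraction


# hypothetical stand-in detector: flags paths containing "dog"
def fake_detector(path):
    return "dog" in path


hits, total, fraction = detection_rate(
    fake_detector, ["dog_1.jpg", "human_1.jpg", "dog_2.jpg", "human_2.jpg"]
)
print("detected {} / {} ({:.0%})".format(hits, total, fraction))
```

With the functions from this article, the call would look like detection_rate(dog_detector, dog_files_short).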

We can use any other model instead of VGG16. There are multiple pre-trained models available in torchvision.models. Some examples are Inception-v3 and ResNet-50.

CNN Model to predict Dog Breed From Scratch

This is a very interesting exercise, and although the resulting model is not used in the final web application, it is very useful for understanding CNNs and how they work. We will be utilising transfer learning for the final image detector.

Loading the data

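import os
import numpy as np
import torch
import torchvision.transforms as transforms
from torchvision import datasets
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

## Specify appropriate transforms, and batch_sizes

batch_size = 20
num_workers = 0

data_dir = '/data/dog_images'
train_dir = os.path.join(data_dir, 'train/')
valid_dir = os.path.join(data_dir, 'valid/')
test_dir = os.path.join(data_dir, 'test/')

# NOTE: data_transforms is used below but is never defined in the original article.
# The definition here is an assumption: a typical 224x224 ImageNet-style pipeline,
# which yields the 7x7 feature maps that the Net class in the next section expects.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
data_transforms = {
    'train': transforms.Compose([transforms.RandomResizedCrop(224),
                                 transforms.RandomHorizontalFlip(),
                                 transforms.ToTensor(), normalize]),
    'val': transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224),
                               transforms.ToTensor(), normalize]),
    'test': transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224),
                                transforms.ToTensor(), normalize]),
}

# defining the datasets
# the data is split between training, validation and testing sets
train_data = datasets.ImageFolder(train_dir, transform=data_transforms['train'])
valid_data = datasets.ImageFolder(valid_dir, transform=data_transforms['val'])
test_data = datasets.ImageFolder(test_dir, transform=data_transforms['test'])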
# data loader

train_loader = torch.utils.data.DataLoader(train_data,
                                           batch_size=batch_size, 
                                           num_workers=num_workers,
                                           shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_data,
                                           batch_size=batch_size, 
                                           num_workers=num_workers,
                                           shuffle=False)
test_loader = torch.utils.data.DataLoader(test_data,
                                           batch_size=batch_size, 
                                           num_workers=num_workers,
                                           shuffle=False)
loaders_scratch = {
    'train': train_loader,
    'valid': valid_loader,
    'test': test_loader
}

Defining the Model Architecture (from Scratch)

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

num_classes = 133

import torch.nn as nn
import torch.nn.functional as F

# define the CNN architecture
class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)

        # pool
        self.pool = nn.MaxPool2d(2, 2)
        
        # fully-connected
        self.fc1 = nn.Linear(7*7*128, 500)
        self.fc2 = nn.Linear(500, num_classes) 
        
        # drop-out
        self.dropout = nn.Dropout(0.3)
    def forward(self, x):
        ## Define forward behavior
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = self.pool(x)
        
        # flatten
        x = x.view(-1, 7*7*128)
        
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# instantiate the CNN
model_scratch = Net()
print(model_scratch)

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()

Model architecture of our model from scratch
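
The first fully-connected layer takes 7*7*128 inputs, and it is worth seeing where that number comes from. The sketch below applies the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1, to each layer of Net in order. It assumes a 224x224 input image, which the article does not state explicitly. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a conv/pool layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def net_feature_size(size):
    """Trace one spatial side through the conv and pool layers of Net."""
    size = conv_out(size, kernel=3, stride=2, padding=1)  # conv1
    size = conv_out(size, kernel=2, stride=2)             # pool
    size = conv_out(size, kernel=3, stride=2, padding=1)  # conv2
    size = conv_out(size, kernel=2, stride=2)             # pool
    size = conv_out(size, kernel=3, stride=1, padding=1)  # conv3
    size = conv_out(size, kernel=2, stride=2)             # pool
    return size


side = net_feature_size(224)
flat_features = side * side * 128  # conv3 has 128 output channels
print(side, flat_features)
```

Under that assumption the side length goes 224, 112, 56, 28, 14, 14, 7, so the flattened vector has 7*7*128 = 6272 values, matching fc1.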

The loss function used here is cross-entropy loss.

import torch.optim as optim

# implementing the loss function and the optimizer
criterion_scratch = nn.CrossEntropyLoss()
optimizer_scratch = optim.SGD(model_scratch.parameters(), lr = 0.05)
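
To make the loss less of a black box: for a single sample, cross-entropy is the negative log of the softmax probability assigned to the correct class, and nn.CrossEntropyLoss computes this directly from the raw scores (logits) that the model outputs. The following is a plain-Python sketch of that per-sample formula, not PyTorch's implementation. The function name and the example numbers are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    # subtract the max logit for numerical stability
    top = max(logits)
    shifted = [z - top for z in logits]
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return log_sum_exp - shifted[target]


# a model with no preference among 133 breeds
uniform_loss = cross_entropy([0.0] * 133, target=5)
# a model that is confident and correct
confident_loss = cross_entropy([10.0, 0.0, 0.0], target=0)
print(uniform_loss, confident_loss)
```

The uniform case gives ln(133), roughly 4.89, so a loss near that value early in training is what we should expect from an untrained 133-class model.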

Function implementation to train and validate the model

def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.inf
    # resume from an earlier checkpoint only if one exists
    if os.path.exists(save_path):
        model.load_state_dict(torch.load(save_path))
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            ## find the loss and update the model parameters accordingly
            ## record the average training loss as a running mean
            # clear the gradients left over from the previous batch
            optimizer.zero_grad()
            
            # forward pass
            output = model(data)
            
            # calculate loss
            loss = criterion(output, target)
            
            # back prop
            loss.backward()
            
            # update the parameters
            optimizer.step()
            
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.item() - train_loss))
            
            if batch_idx % 100 == 0:
                print('Epoch %d, Batch %d loss: %.6f' %
                  (epoch, batch_idx + 1, train_loss))
        ######################    
        # validate the model #
        ######################
        model.eval()
        # no gradients are needed for validation
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                ## update the average validation loss
                output = model(data)
                loss = criterion(output, target)
                valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.item() - valid_loss))
            
        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            train_loss,
            valid_loss
            ))
        
        ## save the model if validation loss has decreased
        if valid_loss < valid_loss_min:
            torch.save(model.state_dict(), save_path)
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
            valid_loss_min = valid_loss
              
    # return trained model
    return model


# train the model
model_scratch = train(80, loaders_scratch, model_scratch, optimizer_scratch, 
                      criterion_scratch, use_cuda, 'model_scratch.pt')
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
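
The update used for train_loss and valid_loss in the train function may look odd at first. It is an incremental way of computing the mean of the per-batch losses without storing them all. A small plain-Python check of that claim is below. The loss values are hypothetical, chosen only for illustration.

```python
def running_mean(values):
    """Apply the update mean += (x - mean) / (i + 1) used in train()."""
    mean = 0.0
    for i, x in enumerate(values):
        mean = mean + (1 / (i + 1)) * (x - mean)
    return mean


batch_losses = [4.0, 3.0, 2.0, 3.0]  # hypothetical per-batch losses
incremental = running_mean(batch_losses)
plain = sum(batch_losses) / len(batch_losses)
print(incremental, plain)
```

Both values agree, so the number printed at the end of each epoch is simply the average loss over that epoch's batches.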

This model was trained for 80 epochs on a GPU instance.

On completion of training, it gives an accuracy of 92%.
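
Testing the models is one of the steps listed at the start, but the test code is not shown in this article. A minimal sketch of how loss and accuracy could be measured on a data loader follows. The evaluate function and the toy model are hypothetical and are not the code that produced the number above.

```python
import torch
import torch.nn as nn


def evaluate(loader, model, criterion, use_cuda=False):
    """Return (average loss, accuracy) of model over all batches in loader."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for data, target in loader:
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            output = model(data)
            # weight each batch loss by its size so the mean is per sample
            total_loss += criterion(output, target).item() * data.size(0)
            # the predicted class is the index of the largest score
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += data.size(0)
    return total_loss / total, correct / total


# tiny synthetic check with a hypothetical 2-class linear model
torch.manual_seed(0)
toy_model = nn.Linear(4, 2)
toy_loader = [(torch.randn(5, 4), torch.randint(0, 2, (5,))) for _ in range(3)]
toy_loss, toy_acc = evaluate(toy_loader, toy_model, nn.CrossEntropyLoss())
print("loss {:.4f}, accuracy {:.1%}".format(toy_loss, toy_acc))
```

For the model in this article, the call would look like evaluate(loaders_scratch['test'], model_scratch, criterion_scratch, use_cuda).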

The same procedure is used for the transfer learning model, but in that case far fewer epochs are required for the model to train and produce promising results.
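
The transfer learning code is not shown here, so the following is only a sketch of the usual recipe: freeze the pre-trained layers and replace the final layer with a new one sized for our 133 breeds. It assumes the backbone exposes its last layer as fc, which is true for the torchvision ResNet models. prepare_for_transfer and TinyBackbone are hypothetical names, and the tiny stand-in network exists only so the example runs without downloading weights.

```python
import torch
import torch.nn as nn


def prepare_for_transfer(backbone, num_classes):
    """Freeze a pretrained backbone and attach a new classification head.

    Assumes the backbone exposes its final layer as `fc` (true for
    torchvision ResNet models).
    """
    for param in backbone.parameters():
        param.requires_grad = False  # keep pretrained features fixed
    in_features = backbone.fc.in_features
    # a freshly created layer is trainable by default
    backbone.fc = nn.Linear(in_features, num_classes)
    return backbone


class TinyBackbone(nn.Module):
    """Hypothetical stand-in so the sketch runs without downloading weights."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 16)
        self.fc = nn.Linear(16, 1000)

    def forward(self, x):
        return self.fc(torch.relu(self.features(x)))


model_transfer = prepare_for_transfer(TinyBackbone(), num_classes=133)
trainable = [n for n, p in model_transfer.named_parameters() if p.requires_grad]
print(trainable)
```

With a real network, one option would be to pass models.resnet50(pretrained=True) as the backbone and give the optimizer only model_transfer.fc.parameters(). ResNet-50 is one of the alternatives named earlier, not necessarily the model used for the results in this article. Since only the small new layer is trained, far fewer epochs are needed.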

Conclusion

When the model starts from randomly initialized weights, training takes a considerable number of epochs to reduce the loss and improve the accuracy.
Using transfer learning we can train the model faster, because the model has already been trained on another dataset and now only requires fine-tuning.
We do not require deep networks for classification purposes in the initial phases.
In future, I will try to apply pruning techniques to this model to reduce the model size and training time, thus enabling its deployment on edge devices.
We could reduce noise in the training data by spot-checking the training images for ones that are badly focused or contain more than one breed of dog.