Classification of Dog Breed Using Deep Learning
Overview
This project aims to identify a dog's breed from a photo of the dog.
In this real-world setting, we will need to piece together a series of models to perform different tasks; for instance, the algorithm that detects humans in an image will be different from the CNN that infers dog breed. There are many points of possible failure, and no perfect algorithm exists. Our imperfect solution will nonetheless create a fun user experience!
So the journey begins…
Steps to be followed:
- Find the right dataset
- Import the dataset and preprocess it (if required)
- Detect Humans
- Detect Dogs
- Create our own CNN model from scratch to classify Dog Breed
- Create a CNN model by using Transfer Learning to classify Dog Breed
- Test both the models
In this project we are going to use PyTorch to create our classifier.
PyTorch is an open-source machine learning library for Python, based on Torch, used for applications such as natural language processing. It is primarily developed by Facebook’s artificial-intelligence research group, and Uber’s “Pyro” Probabilistic programming language software is built on it.
Importing Data
Make sure that you’ve downloaded the required human and dog datasets:
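- Download the dog dataset. Unzip the folder and place it in this project's home directory, at the location /dog_images.
- Download the human dataset. Unzip the folder and place it in the home directory, at the location /lfw.
The code below reads the images from /data/lfw and /data/dog_images, so adjust the glob patterns to wherever you placed the folders.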
import numpy as np
from glob import glob
# load filenames for human and dog images
human_files = np.array(glob("/data/lfw/*/*"))
dog_files = np.array(glob("/data/dog_images/*/*/*"))
# print number of images in each dataset
print('There are %d total human images.' % len(human_files))
print('There are %d total dog images.' % len(dog_files))
print(dog_files)
Detecting Humans
In this section, we use OpenCV’s implementation of Haar feature-based cascade classifiers to detect human faces in images.
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')
# load color (BGR) image
img = cv2.imread(human_files[0])
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# find faces in image
faces = face_cascade.detectMultiScale(gray)
# print number of faces detected in the image
print('Number of faces detected:', len(faces))
# get bounding box for each detected face
for (x,y,w,h) in faces:
    # add bounding box to color image
    cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,0),2)
# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()
In the above code, the detectMultiScale function runs the classifier stored in face_cascade on the grayscale image passed to it and returns one bounding box (x, y, w, h) for each detected face. Here the result is drawn on a single input image.
Detecting Dogs
This step uses a pre-trained model to detect dogs in images. We are going to use the VGG-16 model initially.
Downloading the model
import torch
import torchvision.models as models
# define VGG16 model
VGG16 = models.vgg16(pretrained=True)
# check if CUDA is available
use_cuda = torch.cuda.is_available()
print("cuda available? {0}".format(use_cuda))
# move model to GPU if CUDA is available
if use_cuda:
    VGG16 = VGG16.cuda()
# the check above detects whether a GPU is available and, if so, moves the model onto it
Implementation
from PIL import Image
import torchvision.transforms as transforms
def load_image(img_path):
    image = Image.open(img_path).convert('RGB')
    # resize to (224, 224), the input size VGG16 was trained on
    in_transform = transforms.Compose([
        transforms.Resize(size=(224, 224)),
        transforms.ToTensor(),
        # ImageNet normalization parameters from the PyTorch docs
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])])
    # discard the transparent, alpha channel (that's the :3) and add the batch dimension
    image = in_transform(image)[:3,:,:].unsqueeze(0)
    return image

def VGG16_predict(img_path):
    '''
    Use pre-trained VGG-16 model to obtain index corresponding to
    predicted ImageNet class for image at specified path
    Args:
        img_path: path to an image
    Returns:
        Index corresponding to VGG-16 model's prediction
    '''
    ## Load and pre-process an image from the given img_path
    ## Return the *index* of the predicted class for that image
    img = load_image(img_path)
    if use_cuda:
        img = img.cuda()
    # evaluation mode disables dropout; no gradients are needed for inference
    VGG16.eval()
    with torch.no_grad():
        ret = VGG16(img)
    # predicted class index
    return torch.max(ret,1)[1].item()
In the above code, load_image opens the image, keeps only its RGB channels, and returns it as a batched tensor. The other function, VGG16_predict, uses the pre-trained VGG-16 model to obtain the index of the predicted ImageNet class.
Implementation function
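def dog_detector(img_path):
    # ImageNet class indices 151 to 268 (inclusive) are dog breeds
    idx = VGG16_predict(img_path)
    return 151 <= idx <= 268  # True/False

# NOTE: human_files_short and dog_files_short are not defined in this post;
# taking the first 100 files of each dataset is an assumption
human_files_short = human_files[:100]
dog_files_short = dog_files[:100]

print(dog_detector(dog_files_short[0]))
print(dog_detector(human_files_short[0]))
# Output:
# True
# False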
Assess the Dog Detector
def dog_detector_test(files):
    detection_cnt = 0
    total_cnt = len(files)
    for file in files:
        detection_cnt += dog_detector(file)
    return detection_cnt, total_cnt

# run each test once and unpack the (detected, total) pair
print("detect a dog in human_files: {} / {}".format(*dog_detector_test(human_files_short)))
print("detect a dog in dog_files: {} / {}".format(*dog_detector_test(dog_files_short)))
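The same assessment can be expressed as a rate, which makes the two datasets easier to compare. The sketch below is an illustration only: detection_rate is a hypothetical helper, and it accepts any callable that maps a path to True or False, so dog_detector can be passed in directly.

```python
def detection_rate(detector, paths):
    """Return the fraction of paths for which detector(path) is truthy."""
    if len(paths) == 0:
        raise ValueError("paths must not be empty")
    hits = sum(1 for path in paths if detector(path))
    return hits / len(paths)
```

For a good dog detector, detection_rate(dog_detector, dog_files_short) should be close to 1 and detection_rate(dog_detector, human_files_short) close to 0.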
We can use any other model instead of VGG16. Several other pre-trained models are available in torchvision, for example Inception-v3 and ResNet-50.
CNN Model to predict Dog Breed From Scratch
This is a very interesting exercise, and although it is not used in the final web application, it is very useful for understanding CNNs and how they work. We will use transfer learning for the final breed classifier.
Loading the data
import os
import numpy as np
import torch
import torchvision.transforms as transforms
from torchvision import datasets
from PIL import ImageFile
# allow PIL to load truncated image files instead of raising an error
ImageFile.LOAD_TRUNCATED_IMAGES = True

## Specify appropriate transforms, and batch_sizes
batch_size = 20
num_workers = 0
data_dir = '/data/dog_images'
train_dir = os.path.join(data_dir, 'train/')
valid_dir = os.path.join(data_dir, 'valid/')
test_dir = os.path.join(data_dir, 'test/')

# defining the datasets
# the data is split between training, validation and test sets
# NOTE: data_transforms (a dict of torchvision transforms keyed by
# 'train', 'val' and 'test') must already be defined at this point
train_data = datasets.ImageFolder(train_dir, transform=data_transforms['train'])
valid_data = datasets.ImageFolder(valid_dir, transform=data_transforms['val'])
test_data = datasets.ImageFolder(test_dir, transform=data_transforms['test'])

# data loaders
train_loader = torch.utils.data.DataLoader(train_data,
                                           batch_size=batch_size,
                                           num_workers=num_workers,
                                           shuffle=True)
valid_loader = torch.utils.data.DataLoader(valid_data,
                                           batch_size=batch_size,
                                           num_workers=num_workers,
                                           shuffle=False)
test_loader = torch.utils.data.DataLoader(test_data,
                                          batch_size=batch_size,
                                          num_workers=num_workers,
                                          shuffle=False)
loaders_scratch = {
    'train': train_loader,
    'valid': valid_loader,
    'test': test_loader
}
Defining the Model Architecture (from Scratch)
import torch.nn as nn
import torch.nn.functional as F

num_classes = 133

# define the CNN architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        ## Define layers of a CNN
        self.conv1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        # pool
        self.pool = nn.MaxPool2d(2, 2)
        # fully-connected
        self.fc1 = nn.Linear(7*7*128, 500)
        self.fc2 = nn.Linear(500, num_classes)
        # drop-out
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        ## Define forward behavior
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = F.relu(self.conv3(x))
        x = self.pool(x)
        # flatten
        x = x.view(-1, 7*7*128)
        x = self.dropout(x)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# instantiate the CNN
model_scratch = Net()
print(model_scratch)

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()
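The 7*7*128 input size of fc1 follows from the layer arithmetic. The sketch below traces the spatial size through the three convolution and pooling stages using the standard output-size formula, floor((n + 2p - k) / s) + 1. It assumes a 224×224 input image, which the post does not state explicitly; conv_out and pool_out are hypothetical helper names.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 224                                               # assumed input height/width
size = pool_out(conv_out(size, 3, stride=2, padding=1))  # conv1 + pool: 224 -> 112 -> 56
size = pool_out(conv_out(size, 3, stride=2, padding=1))  # conv2 + pool: 56 -> 28 -> 14
size = pool_out(conv_out(size, 3, stride=1, padding=1))  # conv3 + pool: 14 -> 14 -> 7
flattened = size * size * 128                            # 128 channels after conv3
print(size, flattened)
```

Any other input size would change the flattened length, so fc1 and the view call would have to be adjusted together.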
The loss function used here is cross-entropy loss, and the optimizer is SGD with a learning rate of 0.05.
import torch.optim as optim

# implementing the loss function and the optimizer
criterion_scratch = nn.CrossEntropyLoss()
optimizer_scratch = optim.SGD(model_scratch.parameters(), lr = 0.05)
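To make the loss concrete, here is a plain-Python sketch of what cross-entropy computes for a single sample: the negative log of the softmax probability assigned to the correct class. The function name cross_entropy is hypothetical, and this is an illustration of the formula rather than PyTorch's implementation, which also averages over the batch by default.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one sample."""
    # subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# a model with no preference among 133 breeds scores ln(133) per sample
uniform_loss = cross_entropy([0.0] * 133, target=5)
print(round(uniform_loss, 2))
```

This gives a useful sanity check: a freshly initialized 133-class model should start with a loss near ln(133), roughly 4.89, and training should push it below that.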
Function implementation to train and validate the model
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.inf
    # resume from an earlier checkpoint if one has already been saved
    if os.path.exists(save_path):
        model.load_state_dict(torch.load(save_path))
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        ###################
        # train the model #
        ###################
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            # reset the gradients accumulated in the previous step
            optimizer.zero_grad()
            output = model(data)
            # calculate loss
            loss = criterion(output, target)
            # back prop
            loss.backward()
            # update the model parameters
            optimizer.step()
            # running average of the training loss over the batches seen so far
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.item() - train_loss))
            if batch_idx % 100 == 0:
                print('Epoch %d, Batch %d loss: %.6f' %
                      (epoch, batch_idx + 1, train_loss))
        ######################
        # validate the model #
        ######################
        model.eval()
        # no gradients are needed for validation
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(loaders['valid']):
                # move to GPU
                if use_cuda:
                    data, target = data.cuda(), target.cuda()
                ## update the average validation loss
                output = model(data)
                loss = criterion(output, target)
                valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.item() - valid_loss))
        # print training/validation statistics
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch,
            train_loss,
            valid_loss
            ))
        # save the model if validation loss has decreased
        if valid_loss < valid_loss_min:
            torch.save(model.state_dict(), save_path)
            print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
                valid_loss_min,
                valid_loss))
            valid_loss_min = valid_loss
    # return trained model
    return model

# train the model
model_scratch = train(80, loaders_scratch, model_scratch, optimizer_scratch,
                      criterion_scratch, use_cuda, 'model_scratch.pt')
# load the weights with the lowest validation loss
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
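The loss update inside the training loop, new = old + (1 / (i + 1)) * (value - old), is an incremental way of computing the arithmetic mean without storing every batch loss. A short sketch with made-up numbers shows the equivalence; running_mean is a hypothetical helper and the values are illustrative, not real losses.

```python
def running_mean(values):
    """Incrementally compute the mean using the update rule from the training loop."""
    mean = 0.0
    for i, value in enumerate(values):
        mean = mean + (1 / (i + 1)) * (value - mean)
    return mean


batch_losses = [4.0, 2.0, 3.0]  # illustrative values only
incremental = running_mean(batch_losses)
direct = sum(batch_losses) / len(batch_losses)
print(incremental, direct)
```

Because every batch carries equal weight, the printed epoch loss is the plain average over batches, not an exponentially weighted one.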
This model was trained for 80 epochs on a GPU instance.
On completion of training, it gives an accuracy of 92%.
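The post does not show how the accuracy is measured. One common way to do it, sketched here under that caveat, is to count how often the highest-scoring class matches the label over the test loader. The function name accuracy is an assumption; it expects the same loaders dictionary layout used above.

```python
import torch


def accuracy(loaders, model, use_cuda=False):
    """Fraction of test samples whose top-scoring class equals the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for data, target in loaders['test']:
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            output = model(data)
            # index of the largest logit is the predicted class
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += target.size(0)
    return correct / total
```

For the scratch model this would be called as accuracy(loaders_scratch, model_scratch, use_cuda).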
The same procedure is used for the transfer learning model, but in that case far fewer epochs are needed for the model to train and produce promising results.
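The transfer learning code is not included in this post, so the following is only a sketch of the usual recipe: freeze the pre-trained weights and replace the last classifier layer with one that outputs 133 classes. The helper name adapt_for_breeds is hypothetical, and it assumes the base model exposes a classifier attribute that is an nn.Sequential ending in nn.Linear, as torchvision's VGG16 does.

```python
import torch.nn as nn


def adapt_for_breeds(model, num_classes=133):
    """Freeze a pre-trained model and give it a new, trainable output layer."""
    # freeze every pre-trained parameter so only the new layer is updated
    for param in model.parameters():
        param.requires_grad = False
    # swap the final linear layer for one sized to the number of breeds;
    # parameters of a newly created layer require gradients by default
    last = model.classifier[-1]
    model.classifier[-1] = nn.Linear(last.in_features, num_classes)
    return model


# Possible usage (assumes VGG16 as the base, which the post does not confirm):
#   model_transfer = adapt_for_breeds(models.vgg16(pretrained=True))
#   optimizer = optim.SGD(model_transfer.classifier[-1].parameters(), lr=0.001)
```

Since only one layer is trained, each epoch updates far fewer weights, which is consistent with the smaller number of epochs mentioned above.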
Conclusion
- When the model is trained from randomly initialized weights, it takes a considerable number of epochs to reduce the loss and improve the accuracy.
- Using transfer learning, we can train the model faster, because the model has already been trained on other datasets and now only requires fine-tuning.
- We do not require deep networks for classification purposes in the initial phases.
- In future, I will try to apply pruning techniques to this model to reduce the model size and training time, thus enabling its deployment on edge devices.
- We could reduce noise in the training data by spot-checking the training images for ones that are badly focused or contain more than one breed of dog.
Link to the GitHub repository: Dog Breed Classifier
Follow me at
Linkedin : https://www.linkedin.com/in/sushantag9/
Twitter : @sushantag9