Image classification using Fast.ai

manohar krishna · Published in Analytics Vidhya · 5 min read · Sep 11, 2020

Problem statement:

The task is to classify vehicle images as belonging to either the emergency or the non-emergency vehicle category. For this, we are provided with a train and a test dataset.

  • Emergency vehicles usually include police cars, ambulances, and fire engines.
  • Non-emergency vehicles include all other vehicles commonly used for commuting.

Data Description:

  1. train.csv — [‘image_names’, ‘emergency_or_not’] contains the image name and the correct class for the 1646 (70%) training images
  2. images — contains all 2352 images, covering both the train and test sets
  3. test.csv — [‘image_names’] contains just the image names for the 706 (30%) test images

This problem statement was provided by the Analytics Vidhya Computer Vision Hackathon.

Let us quickly separate the train and test images from the images folder for training.

import numpy as np
import pandas as pd
import os
from PIL import Image

train_csv = pd.read_csv('train_av_cv.csv')
test_csv = pd.read_csv('test_av_cv.csv')

path = 'images'
os.chdir(path)

# Create the destination folders if they don't exist yet
for folder in ['emergency', 'non_emergency', 'test_images']:
    os.makedirs(folder, exist_ok=True)

train_csv_emergency = train_csv[train_csv['emergency_or_not'] == 1]
train_csv_non_emergency = train_csv[train_csv['emergency_or_not'] == 0]

# Copy each training image into its class folder
for i in range(train_csv_emergency.shape[0]):
    name = train_csv_emergency['image_names'].iloc[i]
    im = Image.open(name)
    im.save('emergency/' + name, 'JPEG')

for i in range(train_csv_non_emergency.shape[0]):
    name = train_csv_non_emergency['image_names'].iloc[i]
    im = Image.open(name)
    im.save('non_emergency/' + name, 'JPEG')

# Copy the test images into their own folder
for i in range(test_csv.shape[0]):
    name = test_csv['image_names'].iloc[i]
    im = Image.open(name)
    im.save('test_images/' + name, 'JPEG')
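For fastai’s ImageDataBunch.from_folder (used below) to pick up the classes, the class folders need to sit under a single training folder, one sub-folder per class. Assuming the Google Drive layout implied by the from_folder call later on (the exact paths are my assumption; only the one-folder-per-class structure matters), the final arrangement looks like this:

/content/gdrive/My Drive/
├── train_images/
│   ├── emergency/        # class 1 images saved above
│   └── non_emergency/    # class 0 images saved above
└── test_images/          # unlabelled test images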

Let’s have a look at the size of the images

from PIL import Image

def get_num_pixels(filepath):
    width, height = Image.open(filepath).size
    return width, height

print(get_num_pixels("images/1.jpg"))
# Output: (224, 224)

Now that we’ve separated the train and test images, let’s jump into building the classification model. The end-to-end process for the solution is divided into six steps, as follows:

  1. Reading the data
  2. Defining the model
  3. Defining the learning rate
  4. Fitting the model
  5. Interpreting the results
  6. Inferencing

Let’s start with reading the data from the folder.

ImageDataBunch helps us create the training, validation, and test datasets.

from fastai.vision import *

# Optional extra augmentations
xtra = [contrast(scale=(0.1, 2.0), p=0.5),
        brightness(change=0.6, p=0.25),
        perspective_warp(magnitude=0.25, p=0.25),
        squish(scale=1.2, p=0.25)]
augmentations = get_transforms(do_flip=True, flip_vert=False, max_rotate=None,
                               max_zoom=1.5, max_warp=None, p_affine=0.5,
                               max_lighting=0.2, p_lighting=0.5, xtra_tfms=xtra)

data = ImageDataBunch.from_folder(
    '/content/gdrive/My Drive/',
    train='train_images',
    valid='train_images',
    size=224, bs=16,
    ds_tfms=get_transforms(do_flip=True,
                           max_rotate=None,   ## amount of rotation
                           max_zoom=1.2,      ## amount of zoom
                           max_warp=0.2,      ## amount of warping
                           p_affine=0.5,      ## probability of the three affine tfms above
                           max_lighting=0.2,  ## amount of lighting change
                           p_lighting=0.5))   ## probability of the lighting tfms
data.normalize(imagenet_stats)
# For the extra augmentations, pass ds_tfms=augmentations instead of get_transforms(...)

Let’s check the classes present in the data

data.classes
# Output: ['emergency', 'non_emergency']

Let’s now define the model

The cnn_learner method is used to quickly get a model suitable for transfer learning. Transfer learning, in short, means training our model starting from a predefined/pretrained model rather than from scratch. For the moment, we are building a model that takes images as input and outputs the predicted probability for each category.

from functools import partial

## Defining callbacks
Save_Model_Callback = partial(SaveModelCallback, every='improvement',
                              monitor='accuracy', name='av_cv')
# Use a new name here rather than rebinding the EarlyStoppingCallback class itself
Early_Stopping_Callback = partial(EarlyStoppingCallback, monitor='valid_loss',
                                  min_delta=0.01, patience=5)

## Defining the model and optimizer
learn = cnn_learner(data, models.resnet152, pretrained=True, opt_func=optim.Adam,
                    metrics=[accuracy, error_rate, FBeta(beta=1), Precision(), Recall()],
                    callback_fns=[Save_Model_Callback, Early_Stopping_Callback])

The reason to use a pretrained model is to avoid training the model from scratch. Pretrained models are trained on the ImageNet dataset, with over a million images across a thousand categories, so the final layer of an ImageNet model has 1000 outputs corresponding to those 1000 categories. There is no guarantee that the predefined categories contain the category we need, but we can be confident that the model has already learned useful features and can, say, tell a dog from a truck.
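As a quick sanity check (a minimal sketch, assuming the learn object defined above), you can inspect the model to see that fastai has swapped the 1000-way ImageNet head for a new one with two outputs:

# learn.model[0] is the pretrained ResNet body (frozen by default);
# learn.model[1] is the freshly initialised head, ending in a Linear
# layer with data.c = 2 outputs instead of ImageNet's 1000.
print(learn.model[1])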

There are a lot of pretrained architectures available, trained on different image datasets. Here, I have taken resnet152 trained on the ImageNet dataset.

As far as the optimizer is concerned, I tried both Adam and SGD, and Adam gave better results.

There are a bunch of metrics one could print during training; I’ve used accuracy, error_rate, FBeta(beta=1), Precision(), and Recall().

The “Save_Model_Callback” automatically saves a new best model during training and loads the top model at the end. One can specify the criterion that defines the best model; I’ve set it to ‘accuracy’.

The “EarlyStoppingCallback” automatically stops training when the validation loss fails to improve by at least 0.01 (min_delta) for five consecutive epochs (patience).

Defining Learning Rate

# This helps to find the best learning rate for the model
learn.lr_find()
# Plot of loss vs. learning rate
learn.recorder.plot()

We might want to pass a learning rate while fitting the model (next step) for better results. It defines how quickly the parameters get updated in the model. The plot looks something like this.

The range where the loss curve shows a steep decrease is taken into consideration. Here, the loss decreased steeply for learning rates in the range [1e-04, 1e-02].

Fitting the model

num_of_epochs = 20
learn.fit_one_cycle(num_of_epochs, max_lr=slice(1e-04, 1e-02), moms=(0.95, 0.85))

# Training and validation loss across iterations
learn.recorder.plot_losses()
# Iterations vs LR and iterations vs momentum
learn.recorder.plot_lr(show_moms=True)
# Plot metrics across iterations
learn.recorder.plot_metrics()

Here, the number of epochs is how many times the entire dataset is passed through the model to learn. Passing max_lr=slice(1e-04, 1e-02) applies smaller learning rates to the earlier layer groups and larger ones to the later groups (discriminative learning rates), while moms sets the momentum range cycled during the one-cycle schedule.

Interpreting the results

learn.load('av_cv')
interpret = ClassificationInterpretation.from_learner(learn)
interpret.plot_confusion_matrix(figsize=(5, 5), dpi=100)

Plotting the confusion matrix helps us understand the performance of the model.
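If you want the raw counts behind the plot, ClassificationInterpretation also exposes the matrix itself; a quick sketch:

cm = interpret.confusion_matrix()
# Rows are actual classes, columns are predictions,
# in the order data.classes = ['emergency', 'non_emergency']
tp, fn, fp, tn = cm.ravel()   # treating 'emergency' as the positive class
accuracy = (tp + tn) / cm.sum()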

interpret.plot_top_losses(6, heatmap=True)

plot_top_losses displays the images that were wrongly predicted and contributed most to the loss.

Inferencing

Export the model in .pkl format so it can be used to make predictions on unseen data.

learn.export('av_cv.pkl')

The next step is to load the test data into the learner and make predictions.

base_path = '/content/gdrive/My Drive/'  # same base path used for the DataBunch
test = ImageList.from_folder(base_path + 'test_images')
learn = load_learner(base_path, file='av_cv.pkl', test=test)

thresh = 0.5
preds, _ = learn.get_preds(ds_type=DatasetType.Test)
# Keep the class names whose predicted probability exceeds the threshold
labelled_preds = [' '.join([learn.data.classes[i] for i, p in enumerate(pred) if p > thresh])
                  for pred in preds]
fnames = [f.name[:-4] for f in learn.data.test_ds.items]
df = pd.DataFrame({'image_name': fnames, 'tags': labelled_preds},
                  columns=['image_name', 'tags'])
df.to_csv('fastai_adam.csv', index=False)

The predictions get saved as a CSV file.
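If the hackathon submission expects the same 0/1 encoding as train.csv, the string labels can be mapped back before submitting. A small sketch (the submission column names are my assumption, based on train.csv):

# Map class names back to the numeric encoding used in train.csv:
# 'emergency' -> 1, 'non_emergency' -> 0
df['emergency_or_not'] = (df['tags'] == 'emergency').astype(int)
df[['image_name', 'emergency_or_not']].to_csv('submission.csv', index=False)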

Further improvements that could be made to the model:

learn.freeze_to(-4)

freeze_to(-4) freezes all layer groups except the last four, so only those four are trained. This could be added after defining the model.
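A sketch of how that fits into the usual fastai fine-tuning loop (the epoch count and learning-rate range here are illustrative, and -4 is interpreted relative to the learner’s layer groups):

# Freeze everything except the last four layer groups,
# then continue training those with a lower learning-rate range
learn.freeze_to(-4)
learn.fit_one_cycle(5, max_lr=slice(1e-05, 1e-03))   # illustrative values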

Test-time augmentation could also be tried to improve the results.
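In fastai v1 this is available as learn.TTA, which averages predictions over augmented copies of each image; a minimal sketch on the test set loaded above:

# Average predictions over several augmented versions of each test image
preds_tta, _ = learn.TTA(ds_type=DatasetType.Test)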

References:

Fast.ai — https://www.fast.ai/
