Deep Learning for Everyone: Master the Powerful Art of Transfer Learning using PyTorch

Pulkit Sharma
Analytics Vidhya
Published in
15 min readOct 22, 2019

I was working on a computer vision project last year where we had to build a robust face detection model. The concept behind that is fairly straightforward — it’s the execution part that always sticks in my mind.

Given the size of the dataset we had, building a model from scratch was a real challenge. It was going to be potentially time-consuming and a strain on the computational resources we had. We had to figure out a solution quickly because we were working with a tight deadline.

This is when the powerful concept of transfer learning came to our rescue. It is a really helpful tool to have in your data scientist armoury, especially when you’re working with limited time and computational power.

So in this article, we will learn all about transfer learning and how to leverage it on a real-world project using Python. We’ll also discuss the role of pre-trained models in this space and how they’ll change the way you build machine learning pipelines.

This article is part of my PyTorch series for beginners. I strongly believe PyTorch is one of the best deep learning frameworks right now and will only go from strength to strength in the near future. This is a great time to learn how it works and get onboard. Make sure you check out the previous articles in this series:

Introduction to Transfer Learning

Let me illustrate the concept of transfer learning using an example. Picture this — you want to learn a topic from a domain you’re completely new to. Pick any domain and any topic — you can think of deep learning and neural networks as well.

What are the different approaches you would take to understand the topic? Off the top of my head:

  • Search online for resources
  • Read articles and blogs
  • Refer to books
  • Look out for video tutorials, and so on

All of these will help you get comfortable with the topic. In this situation, you are the only person who is putting in all the effort.

But there’s another approach, which might yield better results in a short amount of time.

You can consult a domain/topic expert who has a solid grasp on the topic you want to learn. This person will transfer his/her knowledge to you. thus expediting your learning process.

The first approach, where you are putting in all the effort alone, is an example of learning from scratch. The second approach is referred to as transfer learning. There is a knowledge transfer happening from an expert in that domain to a person who is new to it.

Yes, the idea behind transfer learning is that straightforward!

Neural Networks and Convolutional Neural Networks (CNNs) are examples of learning from scratch. Both these networks extract features from a given set of images (in case of an image related task) and then classify the images into their respective classes based on these extracted features.

This is where transfer learning and pre-trained models are so useful. Let’s understand a bit about the latter concept in the next section.

What are Pre-trained Models and how to Pick the Right Pre-trained Model?

Pre-trained models are super useful in any deep learning project that you’ll work on. Not all of us have the unlimited computational power of the top tech behemoths. We need to make do with our local machines so pre-trained models are a blessing there.

A pre-trained model, as you might have surmised already, is a model already designed and trained by a certain person or team to solve a specific problem.

Recall that we learn the weights and biases while training models like Neural Network and CNNs. These weights and biases, when multiplied with the image pixels, help to generate features.

Pre-trained models share their learning by passing their weights and biases matrix to a new model. So, whenever we do transfer learning, we will first select the right pre-trained model and then pass its weight and bias matrix to the new model.

There are n number of pre-trained models available out there. We need to decide which will be the best-suited model for our problem. For now, let’s consider that we have three pre-trained networks available — BERT, ULMFiT, and VGG16.

Our task is to classify the images (as we have been doing in the previous articles of this series). So, which of these pre-trained models will you pick? Let me first give you a quick overview of these pre-trained networks which will help us to decide the right pre-trained model.

BERT and ULMFiT are used for language modeling and VGG16 is used for image classification tasks. And if you look at the problem at hand, it is an image classification one. So it stands to reason that we will pick VGG16.

Now, VGG16 can have different weights, i.e. VGG16 trained on ImageNet or VGG16 trained on MNIST:

ImageNet vs. MNIST

Now, to decide the right pre-trained model for our problem, we should explore these ImageNet and MNIST datasets. The ImageNet dataset consists of 1000 classes and a total of 1.2 million images. Some of the classes in this data are animals, cars, shops, dogs, food, instruments, etc.:

MNIST, on the other hand, is trained on handwritten digits. It includes 10 classes from 0 to 9:

We will be working on a project where we need to classify images into emergency and non-emergency vehicles (we will discuss this in more detail in the next section). This dataset includes images of vehicles so a VGG16 model trained on the ImageNet dataset would be more useful for us as it has images of vehicles.

This, in a nutshell, is how we should decide the right pre-trained model based on our problem.

Case Study: Emergency vs Non-Emergency Vehicle Classification

Ideally, we would be using the Identify the Apparels problem for this article. We’ve worked on it in the previous two articles of this series and that would help in comparing our progress.

Unfortunately, this isn’t possible here because VGG16 requires that the images should be of the shape (224,224,3) (the images in the other problem are of shape (28,28)). One way to combat this could have been to resize these (28,28) images to (224,224,3) but this will not make sense intuitively.

Here’s the good part — we’ll be working on a brand new project! Here, our aim is to classify the vehicles as emergency or non-emergency.

This project is also a part of the Computer Vision using Deep Learning course by Analytics Vidhya. To work on more such interesting projects and learn the concepts of computer vision in much more detail, feel free to check out the course.

Let’s now start with understanding the problem and visualizing a few examples. You can download the images using this link. First, import the required libraries:

# importing the libraries
import pandas as pd
import numpy as np
from tqdm import tqdm
# for reading and displaying images
from skimage.io import imread
from skimage.transform import resize
import matplotlib.pyplot as plt
%matplotlib inline
# for creating validation set
from sklearn.model_selection import train_test_split
# for evaluating the model
from sklearn.metrics import accuracy_score
# PyTorch libraries and modules
import torch
from torch.autograd import Variable
from torch.nn import Linear, ReLU, CrossEntropyLoss, Sequential, Conv2d, MaxPool2d, Module, Softmax, BatchNorm2d, Dropout
from torch.optim import Adam, SGD
# torchvision for pre-trained models
from torchvision import models

Next, we will read the .csv file containing the image name and the corresponding label:

# loading dataset
train = pd.read_csv('emergency_train.csv')
train.head()

There are two columns in the .csv file:

  1. image_names: It represents the name of all the images in the dataset
  2. emergency_or_no: It specifies whether that particular image belongs to the emergency or non-emergency class. 0 means that the image is a non-emergency vehicle and 1 represents an emergency vehicle

Next, we will load all the images and store them in an array format:

# loading training images
train_img = []
for img_name in tqdm(train['image_names']):
# defining the image path
image_path = 'images/' + img_name
# reading the image
img = imread(image_path)
# normalizing the pixel values
img = img/255
# resizing the image to (224,224,3)
img = resize(img, output_shape=(224,224,3), mode='constant', anti_aliasing=True)
# converting the type of pixel to float 32
img = img.astype('float32')
# appending the image into the list
train_img.append(img)
# converting the list to numpy array
train_x = np.array(train_img)
train_x.shape

It took approximately 12 seconds to load these images. There are 1,646 images in our dataset and we have reshaped all of them to (224,224,3) since VGG16 requires all the images in this particular shape. Let’s now visualize a few images from the dataset:

# Exploring the data
index = 10
plt.imshow(train_x[index])
if (train['emergency_or_not'][index] == 1):
print('It is an Emergency vehicle')
else:
print('It is a Non-Emergency vehicle')

This is a police car and hence has a label of Emergency vehicle. Now we will store the target in a separate variable:

# defining the target
train_y = train['emergency_or_not'].values

Let’s create a validation set to evaluate our model:

# create validation set
train_x, val_x, train_y, val_y = train_test_split(train_x, train_y, test_size = 0.1, random_state = 13, stratify=train_y)
(train_x.shape, train_y.shape), (val_x.shape, val_y.shape)

We have 1,481 images in the training set and remaining 165 images in the validation set. We now have to convert the dataset into torch format:

# converting training images into torch format
train_x = train_x.reshape(1481, 3, 224, 224)
train_x = torch.from_numpy(train_x)
# converting the target into torch format
train_y = train_y.astype(int)
train_y = torch.from_numpy(train_y)
# shape of training data
train_x.shape, train_y.shape

Similarly, we will convert the validation set:

# converting validation images into torch format
val_x = val_x.reshape(165, 3, 224, 224)
val_x = torch.from_numpy(val_x)
# converting the target into torch format
val_y = val_y.astype(int)
val_y = torch.from_numpy(val_y)
# shape of validation data
val_x.shape, val_y.shape

Our data is ready! In the next section, we will build a Convolutional Neural Network (CNN) before we use the pre-trained model to solve this problem.

Solving the Challenge using Convolutional Neural Networks (CNNs)

We are finally at the model building part! Before using transfer learning to solve the problem, let’s use a CNN model and set a benchmark for ourselves.

We will build a very simple CNN architecture with two convolutional layers to extract features from images and a dense layer at the end to classify these features:

class Net(Module):   
def __init__(self):
super(Net, self).__init__()
self.cnn_layers = Sequential(
# Defining a 2D convolution layer
Conv2d(3, 4, kernel_size=3, stride=1, padding=1),
BatchNorm2d(4),
ReLU(inplace=True),
MaxPool2d(kernel_size=2, stride=2),
# Defining another 2D convolution layer
Conv2d(4, 8, kernel_size=3, stride=1, padding=1),
BatchNorm2d(8),
ReLU(inplace=True),
MaxPool2d(kernel_size=2, stride=2),
)
self.linear_layers = Sequential(
Linear(8 * 56 * 56, 2)
)
# Defining the forward pass
def forward(self, x):
x = self.cnn_layers(x)
x = x.view(x.size(0), -1)
x = self.linear_layers(x)
return x

Let’s now define the optimizer, learning rate and the loss function for our model and use a GPU to train the model:

# defining the model
model = Net()
# defining the optimizer
optimizer = Adam(model.parameters(), lr=0.0001)
# defining the loss function
criterion = CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
model = model.cuda()
criterion = criterion.cuda()
print(model)

This is how the architecture of the model looks like. Finally, we will train the model for 15 epochs. I am setting the batch_size of the model to 128 (you can play around with this):

# batch size of the model
batch_size = 128
# number of epochs to train the model
n_epochs = 15
for epoch in range(1, n_epochs+1):# keep track of training and validation loss
train_loss = 0.0

permutation = torch.randperm(train_x.size()[0])
training_loss = []
for i in tqdm(range(0,train_x.size()[0], batch_size)):
indices = permutation[i:i+batch_size]
batch_x, batch_y = train_x[indices], train_y[indices]

if torch.cuda.is_available():
batch_x, batch_y = batch_x.cuda(), batch_y.cuda()

optimizer.zero_grad()
# in case you wanted a semi-full example
outputs = model(batch_x)
loss = criterion(outputs,batch_y)
training_loss.append(loss.item())
loss.backward()
optimizer.step()

training_loss = np.average(training_loss)
print('epoch: \t', epoch, '\t training loss: \t', training_loss)

This will print a summary of the training as well. The training loss is decreasing after each epoch and that’s a good sign. Let’s check the training as well as the validation accuracy:

# prediction for training set
prediction = []
target = []
permutation = torch.randperm(train_x.size()[0])
for i in tqdm(range(0,train_x.size()[0], batch_size)):
indices = permutation[i:i+batch_size]
batch_x, batch_y = train_x[indices], train_y[indices]
if torch.cuda.is_available():
batch_x, batch_y = batch_x.cuda(), batch_y.cuda()
with torch.no_grad():
output = model(batch_x.cuda())
softmax = torch.exp(output).cpu()
prob = list(softmax.numpy())
predictions = np.argmax(prob, axis=1)
prediction.append(predictions)
target.append(batch_y)

# training accuracy
accuracy = []
for i in range(len(prediction)):
accuracy.append(accuracy_score(target[i],prediction[i]))

print('training accuracy: \t', np.average(accuracy))

We got a training accuracy of around 82% which is a good score. Let’s now check the validation accuracy:

# prediction for validation set
prediction_val = []
target_val = []
permutation = torch.randperm(val_x.size()[0])
for i in tqdm(range(0,val_x.size()[0], batch_size)):
indices = permutation[i:i+batch_size]
batch_x, batch_y = val_x[indices], val_y[indices]
if torch.cuda.is_available():
batch_x, batch_y = batch_x.cuda(), batch_y.cuda()
with torch.no_grad():
output = model(batch_x.cuda())
softmax = torch.exp(output).cpu()
prob = list(softmax.numpy())
predictions = np.argmax(prob, axis=1)
prediction_val.append(predictions)
target_val.append(batch_y)

# validation accuracy
accuracy_val = []
for i in range(len(prediction_val)):
accuracy_val.append(accuracy_score(target_val[i],prediction_val[i]))

print('validation accuracy: \t', np.average(accuracy_val))

The validation accuracy comes out to be 76%. Now that we have a benchmark with us, it’s time to use transfer learning to solve this emergency versus non-emergency vehicle classification problem. Let’s get rolling!

Solving the Challenge using Transfer Learning

I’ve touched on this above and I’ll reiterate it here — we will be using the VGG16 pre-trained model trained on the ImageNet dataset. Let’s look at the steps we will be following to train the model using transfer learning:

  1. First, we will load the weights of the pre-trained model — VGG16 in our case
  2. Then we will fine tune the model as per the problem at hand
  3. Next, we will use these pre-trained weights and extract features for our images
  4. Finally, we will train the fine tuned model using the extracted features

So, let’s start by loading the weights of the model:

# loading the pretrained model
model = models.vgg16_bn(pretrained=True)

We will now fine tune the model. We will not be training the layers of the VGG16 model and hence let’s freeze the weights of these layers:

# Freeze model weights
for param in model.parameters():
param.requires_grad = False

Since we only have 2 classes to predict and VGG16 is trained on ImageNet which has 1000 classes, we need to update the final layer as per our problem:

# Add on classifier
model.classifier[6] = Sequential(
Linear(4096, 2))
for param in model.classifier[6].parameters():
param.requires_grad = True

Since we will be training only the last layer, I have set the requires_grad as True for the last layer. Let’s set the training to GPU:

# checking if GPU is available
if torch.cuda.is_available():
model = model.cuda()

We’ll now use the model and extract features for both the training and validation images. I will set the batch_size as 128 (again, you can increase or decrease this batch_size per your requirement):

# batch_size
batch_size = 128
# extracting features for train data
data_x = []
label_x = []
inputs,labels = train_x, train_yfor i in tqdm(range(int(train_x.shape[0]/batch_size)+1)):
input_data = inputs[i*batch_size:(i+1)*batch_size]
label_data = labels[i*batch_size:(i+1)*batch_size]
input_data , label_data = Variable(input_data.cuda()),Variable(label_data.cuda())
x = model.features(input_data)
data_x.extend(x.data.cpu().numpy())
label_x.extend(label_data.data.cpu().numpy())

Similarly, let’s extract features for our validation images:

# extracting features for validation data
data_y = []
label_y = []
inputs,labels = val_x, val_yfor i in tqdm(range(int(val_x.shape[0]/batch_size)+1)):
input_data = inputs[i*batch_size:(i+1)*batch_size]
label_data = labels[i*batch_size:(i+1)*batch_size]
input_data , label_data = Variable(input_data.cuda()),Variable(label_data.cuda())
x = model.features(input_data)
data_y.extend(x.data.cpu().numpy())
label_y.extend(label_data.data.cpu().numpy())

Next, we will convert these data into torch format:

# converting the features into torch format
x_train = torch.from_numpy(np.array(data_x))
x_train = x_train.view(x_train.size(0), -1)
y_train = torch.from_numpy(np.array(label_x))
x_val = torch.from_numpy(np.array(data_y))
x_val = x_val.view(x_val.size(0), -1)
y_val = torch.from_numpy(np.array(label_y))

We also have to define the optimizer and the loss function for our model:

import torch.optim as optim# specify loss function (categorical cross-entropy)
criterion = CrossEntropyLoss()
# specify optimizer (stochastic gradient descent) and learning rate
optimizer = optim.Adam(model.classifier[6].parameters(), lr=0.0005)

It’s time to train the model. We will train it for 30 epochs with a batch_size set to 128:

# batch size
batch_size = 128
# number of epochs to train the model
n_epochs = 30
for epoch in tqdm(range(1, n_epochs+1)):# keep track of training and validation loss
train_loss = 0.0

permutation = torch.randperm(x_train.size()[0])
training_loss = []
for i in range(0,x_train.size()[0], batch_size):
indices = permutation[i:i+batch_size]
batch_x, batch_y = x_train[indices], y_train[indices]

if torch.cuda.is_available():
batch_x, batch_y = batch_x.cuda(), batch_y.cuda()

optimizer.zero_grad()
# in case you wanted a semi-full example
outputs = model.classifier(batch_x)
loss = criterion(outputs,batch_y)
training_loss.append(loss.item())
loss.backward()
optimizer.step()

training_loss = np.average(training_loss)
print('epoch: \t', epoch, '\t training loss: \t', training_loss)

Here is a summary of the model. You can see that the loss has decreased and hence we can say that the model is improving. Let’s validate this by looking at the training and validation accuracies:

# prediction for training set
prediction = []
target = []
permutation = torch.randperm(x_train.size()[0])
for i in tqdm(range(0,x_train.size()[0], batch_size)):
indices = permutation[i:i+batch_size]
batch_x, batch_y = x_train[indices], y_train[indices]
if torch.cuda.is_available():
batch_x, batch_y = batch_x.cuda(), batch_y.cuda()
with torch.no_grad():
output = model.classifier(batch_x.cuda())
softmax = torch.exp(output).cpu()
prob = list(softmax.numpy())
predictions = np.argmax(prob, axis=1)
prediction.append(predictions)
target.append(batch_y)

# training accuracy
accuracy = []
for i in range(len(prediction)):
accuracy.append(accuracy_score(target[i],prediction[i]))

print('training accuracy: \t', np.average(accuracy))

We got an accuracy of ~ 84% on the training set. Let’s now check the validation accuracy:

# prediction for validation set
prediction_val = []
target_val = []
permutation = torch.randperm(x_val.size()[0])
for i in tqdm(range(0,x_val.size()[0], batch_size)):
indices = permutation[i:i+batch_size]
batch_x, batch_y = x_val[indices], y_val[indices]
if torch.cuda.is_available():
batch_x, batch_y = batch_x.cuda(), batch_y.cuda()
with torch.no_grad():
output = model.classifier(batch_x.cuda())
softmax = torch.exp(output).cpu()
prob = list(softmax.numpy())
predictions = np.argmax(prob, axis=1)
prediction_val.append(predictions)
target_val.append(batch_y)

# validation accuracy
accuracy_val = []
for i in range(len(prediction_val)):
accuracy_val.append(accuracy_score(target_val[i],prediction_val[i]))

print('validation accuracy: \t', np.average(accuracy_val))

The validation accuracy of the model is also similar, i,e, 83%. The training and validation accuracies are almost in sync and hence we can say that the model is generalized. Here is the summary of our results:

We can infer that the accuracies have improved by using the VGG16 pre-trained model as compared to the CNN model. Got to love the art of transfer learning!

End Notes

In this article, we learned how to use pre-trained models and transfer learning to solve an image classification problem. We first understood what pre-trained models are and how to choose the right pre-trained model depending on the problem at hand. Then we took a case study of classifying images of vehicles as emergency or non-emergency. We solved this case study using a CNN model first and then we used the VGG16 pre-trained model to solve the same problem.

We found that using the VGG16 pre-trained model significantly improved the model performance and we got better results as compared to the CNN model. I hope you now have a clear understanding of how to use transfer learning and the right pre-trained model to solve problems using PyTorch.

I encourage you to take other image classification problems and try to apply transfer learning to solve them. This will help you to grasp the concept much more clearly.

As always, if you have some feedback or doubts related to this tutorial, feel free to post them in the comments section below and I will be happy to answer them.

Originally published at https://www.analyticsvidhya.com on October 22, 2019.

--

--