Introduction to Transfer Learning

Alan Choon
Udacity PyTorch Challengers
6 min read · Jan 5, 2019

This article is an introduction to transfer learning (TL) using PyTorch. I will illustrate the concept in simple terms and present the tools used to perform TL, applied to an image recognition problem. The prerequisite for following this article is a basic understanding of neural networks (how forward and backward propagation are performed, what a loss function is, and how an optimizer updates the weights of a neural network). By the end of this article, I hope to show how to effectively apply TL to image recognition problems.

But first, what is Transfer Learning?

Imagine you had a model that has been trained for a certain problem. For example, say we have been approached by a novice florist to classify images of preordered flowers and tag their names so the orders can be fulfilled. We have trained the model, it has given us excellent results, and we are happy.

Now, a different problem arises. What if we want to classify images of cats and dogs instead? Can we apply our previously trained model to our new problem? The answer is, we sure can! It is analogous to how humans learn: the knowledge we gain in one domain can sometimes be applied to another (think calculus applied to problems in physics).

image from: Ellagrin/Shutterstock.com

However, the model cannot be applied verbatim just like that; it needs to learn further to tweak the solution for the new problem. This is exactly the intuition behind Transfer Learning: we reuse a pre-trained model and let it learn and optimize further to solve a new problem.

Now that we have an intuition of what TL is, let’s apply our knowledge to an existing image classification problem. The problem that we will be working on is classifying 102 species of flowers; the data can be obtained here.

Notebook Set-up

Training a deep learning model to recognize images takes a long time on a CPU. We can use a GPU instead for superior processing power.

How can we use a GPU? Introducing Google Colab. Google Colab is essentially a Jupyter Notebook hosted in a cloud environment. This amazing tool provides a free Tesla K80 GPU. You can save your Google Colab notebooks in a Google Drive folder, and sync your data and model files between Drive and Colab, saving space on your hard drive.

To get started, you can refer to this link for examples of code in a Google Colab notebook. Note that you will need a Google account to gain access. Once you are ready, create a new Python 3 notebook. Click on Runtime in the notebook, choose Change runtime type, and set the hardware accelerator to GPU.

Next, let’s mount our Google Drive to access the images saved in a Drive folder (note that you have to save the images you want to classify in a Google Drive folder).
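In a Colab cell, mounting looks like this (it only works inside Colab, where the google.colab package is available, and prompts for an authorization code on first run):

```python
# Only works inside Google Colab, where google.colab is available
from google.colab import drive

# Prompts for an authorization code on first run
drive.mount('/content/drive')
```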

You should be able to see your Google Drive folders on the left panel:

Change the working directory to the folder containing the images using the command:
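A sketch of the cell, assuming Drive is mounted at /content/drive (the Colab default) and the images live in a folder named pytorch_challenge (your path may differ):

```python
import os

# Path assumes Drive is mounted at /content/drive (the Colab default)
os.chdir('/content/drive/My Drive/pytorch_challenge')
print(os.getcwd())
```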

Here, pytorch_challenge is the folder containing my images.

Importing required libraries

The code below imports the required libraries and checks if a GPU is available (pip install torch before importing).

torch refers to the PyTorch library, while PIL is the Python Imaging Library, which is used to represent an image.

We need to perform transformations on our data, namely augmenting our training data (random rotations and flips) and resizing both training and validation/test data to suit our model.

Data Preprocessing

We first define the transformations to the images we want to implement. train_transforms are performed on images in our training set while valid_transforms are performed on images in our validation set.

Essentially, we are randomly rotating, resizing and flipping our train images. All images are cropped to 224x224 to fit our model (to be trained).

datasets.ImageFolder looks into the folders in our train/valid/test directories and performs the transformations on the images in the folders, along with generating labels based on the subdirectories in the folders.

torch.utils.data.DataLoader takes the ImageFolder object, along with parameters such as batch_size and shuffle, and returns an iterable which yields a tuple containing a torch tensor representing the images in each batch and a tensor of their labels.

Preparing our pre-trained model

Next, we load our pre-trained model using models from torchvision

Our pre-trained model is ResNet-152. (You can read more about ResNet here).

This is not the only choice of pre-trained models. For a full list of pre-trained models and their comparisons, refer to this link.

To view the layers in the pre-trained model, simply run:

Snapshot of Resnet-152 Layers

We freeze all parameters except those in layer4, the second-to-last layer (the average pooling layer) and the fully connected layer. Freezing here means telling our model not to update the weights in those layers during the optimization step when we train the model later on.

The rationale for freezing the first few layers of our model is to reduce the number of parameters that need to be updated, for faster computing speed and to be more memory efficient.

Without freezing, there are ~58M updatable parameters; with freezing, this is reduced to ~15M.

We need to replace the fully connected layer in the pre-trained model with a new layer, ensuring our outputs match the number of labels that we want (in this case 102).

Retraining our Pre-trained model

Define our loss and optimizer functions:
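A sketch of this setup (the model here is a small stand-in head so the snippet runs on its own; in the notebook, the unfrozen parameters of the modified ResNet-152 are passed to the optimizer instead):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

# Small stand-in for the modified ResNet-152 so the sketch runs alone
model = nn.Linear(2048, 102)

# CrossEntropyLoss = LogSoftmax + negative log likelihood in one step
criterion = nn.CrossEntropyLoss()

# Adadelta adapts its effective learning rate over time
optimizer = optim.Adadelta(model.parameters())

# Halve the learning rate every 10 epochs
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
```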

nn.CrossEntropyLoss criterion combines nn.LogSoftmax() and nn.NLLLoss().

It applies LogSoftmax to the raw outputs (logits) of our last layer and calculates the loss by taking their negative log likelihood.

The criterion could also be set to nn.NLLLoss; however, we would then have to add an nn.LogSoftmax as our final layer, which is why we stick with nn.CrossEntropyLoss for simplicity.

Here, Adadelta is selected as the optimizer as it can adapt the learning rate over time.

A learning rate scheduler is also used to decrease our learning rate by half every 10 epochs, to ensure that the model can learn smoothly without too many oscillations around the same loss.

The next step will be to go ahead and train the model. As the code to do this is quite lengthy, I will not be sharing the code here. You can find it on my github repo.

The main steps for training are:

1. For each epoch, forward propagate to obtain the loss.
2. Using the loss, perform backpropagation.
3. Update the parameters using the optimizer.
4. At the end of each epoch, compare the epoch loss with the minimum loss so far. If the loss has decreased, save the new loss and the model.
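The four steps above can be sketched as follows (a stripped-down illustration using a tiny stand-in model and a single fake batch in place of the real DataLoader; the full code on the repo differs):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Tiny stand-ins so the sketch runs by itself: a linear model and one
# fake batch of 8 samples in place of the real DataLoader
model = nn.Linear(20, 102)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adadelta(model.parameters())
train_loader = [(torch.randn(8, 20), torch.randint(0, 102, (8,)))]

best_loss = float('inf')
for epoch in range(2):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)            # 1) forward propagate
        loss = criterion(outputs, labels)  #    ...to obtain the loss
        loss.backward()                    # 2) backpropagation
        optimizer.step()                   # 3) update parameters
        running_loss += loss.item()
    epoch_loss = running_loss / len(train_loader)
    if epoch_loss < best_loss:             # 4) checkpoint on improvement
        best_loss = epoch_loss
        torch.save(model.state_dict(), 'best_model.pt')
```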

And that’s essentially how Transfer Learning is done! We freeze the initial layers of a pre-trained model and unfreeze the last few layers to retrain their parameters. This can also be applied to other image recognition problems.

Visualizing our Results

Within 34 epochs, our model is able to obtain a validation accuracy of 97%. Not bad at all!

Model is able to correctly identify a pink primrose

Alright, that’s the end of this article. Please leave a comment if there are any concepts or tips that I may have missed! Special thanks to Udacity and Facebook for providing the resources that assisted in the completion of this project.

