The art of transfer learning (Part-I)
Hi friends! In this blog post, I will give you an overview of the idea of transfer learning. This blog is divided into two parts. In this part, I will explain the theoretical concepts behind the different types of transfer learning and how feature vectors can be extracted, stored, and used to build a pretty accurate image classifier.
After completing this blog, you will be able to:
- Describe the different types of transfer learning when applied to deep learning for computer vision.
- Explain how features are extracted from a pre-trained CNN.
- Understand the basics of the HDF5 data format.
1. What is transfer learning?
Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.
source: https://en.wikipedia.org/wiki/Transfer_learning#Definition
That is the standard definition, so let me simplify it a little further. We can think of transfer learning as the ability to use a pre-trained model as a shortcut for learning patterns from data it was not originally trained on. Still not clear? Let us take an example in which we are given two classification problems:
1. Our first task is to train a CNN (Convolutional Neural Network) that can distinguish motorcycles from bicycles.
2. The second task is to classify three car classes: sedan, SUV, and sports car.
Using standard practices in machine learning, neural networks, and deep learning, we would treat these as two separate problems. First, we would gather enough images of bicycles and motorcycles, label them, and train a model on that dataset. We would then follow a similar process for the second task, this time collecting images of the different car classes and training a separate model on the labeled car dataset.
Transfer learning proposes a different training paradigm: what if we could take an existing pre-trained classifier and use it as the starting point for a new classification task?
Applied to the two tasks above, we would first train a CNN to classify bicycles versus motorcycles. Then we would reuse that same CNN, trained only on bicycle and motorcycle images, to distinguish between the car classes, even though no car images were mixed into its training data.
1.1 Too good to be true?
The method above sounds very promising, but unfortunately it does not work in general: this type of transfer learning works well only with neural networks that are very deep and trained on large-scale datasets such as ImageNet. These networks are excellent at transfer learning because they learn a rich set of discriminative features in order to recognize 1,000 separate object classes. It makes sense that these filters can be reused for classification tasks other than the one the CNN was originally trained on.
In general, there are two types of transfer learning when applied to deep learning for computer vision:
1. Treating networks as arbitrary feature extractors.
2. Removing the fully-connected (FC) layers of an existing network, placing a new set of FC layers on top of the CNN, and fine-tuning these weights (and optionally those of earlier layers) to recognize new object classes.
In this blog, we’ll be focusing primarily on the first method of transfer learning, treating networks as feature extractors.
2. Extracting features from a pre-trained CNN:
So far, we have always treated a CNN as an end-to-end image classifier, as follows:
1. We input an image to the network.
2. The image forward propagates through the network.
3. We obtain the final classification probabilities from the end of the network.
However, there is no strict rule that we must follow these steps and always propagate the image through the entire network. Instead, we can stop the propagation at an arbitrary layer of the network, such as an activation or a pooling layer, extract the values from the network at that point, and use them as a feature vector. We can understand this better with the following example, which uses the standard VGG16 architecture pre-trained on the ImageNet dataset. The left side of the image shows the original VGG16 network architecture, which outputs probabilities for each of the 1,000 ImageNet class labels. The right side shows the same network with the fully connected layers removed, so that it returns the output of the final POOL layer instead. This output will serve as our extracted features. Along with the layers in the network, the image also includes the input and output shapes of the volumes for each layer. In this approach, where networks are treated as feature extractors, we essentially “cut” the network at an arbitrary point. This is usually done before the fully connected layers, but the best choice really depends on the dataset and the use case.
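To make the idea of “cutting” a network concrete, here is a minimal NumPy sketch. It is not VGG16: the tiny network, its hypothetical layer names (`dense1`, `relu1`, `fc_head`, `softmax`) and its random weights are stand-ins for a real pre-trained model. Calling `forward` with no stop point performs the three end-to-end steps listed earlier and returns class probabilities; passing `stop_after` halts propagation at that layer and returns its activations as a feature vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "pre-trained" network: random weights stand in for learned ones.
W_body = rng.standard_normal((12, 8))   # feature-extracting part of the network
W_head = rng.standard_normal((8, 3))    # fully connected classification head

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Ordered (name, function) pairs, mimicking a sequential model.
LAYERS = [
    ("dense1", lambda x: x @ W_body),
    ("relu1", relu),
    ("fc_head", lambda x: x @ W_head),
    ("softmax", softmax),
]

def forward(x, layers=LAYERS, stop_after=None):
    """Propagate x through the layers, optionally stopping after the layer named `stop_after`."""
    for name, layer in layers:
        x = layer(x)
        if name == stop_after:
            break  # "cut" the network here and return the intermediate activations
    return x

image = rng.standard_normal((1, 12))           # one flattened toy "image"
probs = forward(image)                          # end to end: class probabilities, shape (1, 3)
features = forward(image, stop_after="relu1")   # cut before the head: feature vector, shape (1, 8)
print(probs.shape, features.shape)
```

In Keras, passing `include_top=False` when loading `VGG16` makes a comparable cut: the fully connected layers are dropped and, for a 224 x 224 input, the model's output is the 7 x 7 x 512 volume from the final pooling layer.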
2.1 Understanding the process:
On the right side of the above image, the last layer is a max-pooling layer with an output shape of 7 x 7 x 512 (for a standard 224 x 224 input image): there are 512 feature maps, one per filter in the preceding convolutional layer, each 7 x 7 in spatial size. If an image is forward propagated through the network with the FC layers removed, we are left with 512 activation maps of size 7 x 7 whose values reflect how strongly each filter responded to the image contents. We can therefore take these 7 x 7 x 512 = 25,088 values, flatten them, and treat them as a feature vector that quantifies the contents of the image. If we repeat this process for an entire dataset of images (including datasets that VGG16 was not originally trained on), we obtain a design matrix of N rows, one per image, each with 25,088 columns (i.e., the feature vectors).
Using these feature vectors, we can train any machine learning model, such as a Linear SVM, a Logistic Regression classifier, or a Random Forest, to obtain a classifier that recognizes new image classes. The CNN alone cannot recognize any of these new classes; instead, we use it as an intermediary feature extractor, and the downstream ML model learns the patterns from the features extracted by the CNN. By applying transfer learning this way, we can build accurate image classifiers with relatively little effort. The trick is extracting these features and storing them efficiently. To accomplish this task, we’ll need HDF5.
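The arithmetic and the pipeline described above can be sketched in NumPy. This is a minimal sketch under loud assumptions: random non-negative numbers stand in for real VGG16 POOL activations, the labels are placeholders, and the classifier is a hand-written logistic regression trained by plain gradient descent (in practice a library model such as a Linear SVM would typically be used). It shows how N volumes of shape 7 x 7 x 512 become an N x 25,088 design matrix and how a linear model is fit on top of it.

```python
import numpy as np

rng = np.random.default_rng(42)

N = 20                                    # number of images (placeholder)
pool_out = rng.random((N, 7, 7, 512))     # stand-in for VGG16 final POOL outputs
labels = np.array([0, 1] * (N // 2))      # placeholder binary labels

# Flatten each 7 x 7 x 512 volume into one row: 7 * 7 * 512 = 25,088 features.
X = pool_out.reshape(N, -1)

# Standardize each column so a single learning rate behaves well.
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(p, y):
    eps = 1e-12  # avoid log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Logistic regression trained with full-batch gradient descent on the mean log loss.
w = np.zeros(X.shape[1])
b = 0.0
lr = 1e-3
losses = []
for _ in range(200):
    p = sigmoid(X @ w + b)
    losses.append(log_loss(p, labels))
    grad = p - labels                     # per-sample gradient of the log loss w.r.t. the logits
    w -= lr * X.T @ grad / N
    b -= lr * grad.mean()

preds = (sigmoid(X @ w + b) >= 0.5).astype(int)
print(X.shape, f"loss {losses[0]:.3f} -> {losses[-1]:.3f}", "train acc", (preds == labels).mean())
```

With far more features (25,088) than samples (20), a linear model can separate even random labels, so the training accuracy printed here says nothing about real performance; evaluating on held-out images is what matters in practice.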
2.1.1 What is HDF5?
The Hierarchical Data Format version 5 (HDF5) is an open-source file format that supports large, complex, heterogeneous data. HDF5 uses a “file directory”-like structure that lets you organize data within the file in many different structured ways, as you might do with files on your computer. What we call “directories” or “folders” on our computer are termed “groups” in HDF5, and what we call “files” are termed “datasets”. These are the two important terms in HDF5:
- Group: A folder-like element within an HDF5 file that may contain other groups and/or datasets.
- Dataset: The actual data contained within the HDF5 file. Datasets are often (but don’t have to be) stored within groups in the file.
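The group/dataset vocabulary maps directly onto h5py. The sketch below assumes h5py and NumPy are installed; the file name `features.hdf5` and the names `vgg16`, `features` and `labels` are hypothetical choices for illustration, not a required layout. It creates a group, stores two datasets inside it, and walks the file's hierarchy much like listing folders and files on disk.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "features.hdf5")  # hypothetical file name

with h5py.File(path, "w") as f:
    # A group behaves like a folder...
    grp = f.create_group("vgg16")
    # ...and datasets behave like files holding homogeneous, NumPy-like arrays.
    grp.create_dataset("features", data=np.zeros((4, 25088), dtype="float32"))
    grp.create_dataset("labels", data=np.array([0, 1, 0, 1], dtype="int64"))

# Re-open read-only and collect every object path, like walking a directory tree.
names = []
with h5py.File(path, "r") as f:
    f.visit(names.append)
    feat_shape = f["vgg16/features"].shape
    kinds = {name: type(f[name]).__name__ for name in names}  # "Group" or "Dataset"
print(names, feat_shape, kinds)
```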
A dataset can be thought of as a multi-dimensional array (like a NumPy array) of a homogeneous data type (integer, float, Unicode, etc.). HDF5 is written in C; however, by using the h5py module (h5py.org), we can access the underlying C API from Python. What makes h5py so awesome is how easy it is to interact with the data. We can store huge amounts of data in an HDF5 dataset and manipulate it in a NumPy-like fashion. For example, we can use standard Python syntax to access and slice rows from multi-terabyte datasets stored on disk as if they were simple NumPy arrays loaded into memory. Thanks to specialized data structures, these slices and row accesses are lightning quick. When using HDF5 with h5py, you can think of your data as a gigantic NumPy array that is too large to fit into main memory but can still be accessed and manipulated in the same way. Perhaps best of all, the HDF5 format is standardized, meaning that datasets stored in HDF5 format are inherently portable and can be accessed by other developers using different programming languages such as C, MATLAB, and Java.
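As a rough sketch of the “gigantic NumPy array on disk” idea, assuming h5py and NumPy are available: the snippet below appends hypothetical batches of 25,088-dimensional feature vectors to a resizable dataset (`maxshape=(None, 25088)`), then reads back a row slice with ordinary NumPy syntax. The batch size, row counts and dataset name are illustrative assumptions, and the random rows stand in for real CNN features.

```python
import os
import tempfile

import h5py
import numpy as np

rng = np.random.default_rng(7)
dim = 7 * 7 * 512                    # 25,088 values per flattened VGG16 POOL output
batch_size, n_batches = 8, 4         # illustrative sizes only
path = os.path.join(tempfile.mkdtemp(), "features.hdf5")

with h5py.File(path, "w") as f:
    # Start empty and let the first axis grow as batches of feature vectors arrive.
    dset = f.create_dataset("features", shape=(0, dim), maxshape=(None, dim),
                            dtype="float32", chunks=(batch_size, dim))
    written = []
    for _ in range(n_batches):
        batch = rng.random((batch_size, dim), dtype=np.float32)  # stand-in for CNN features
        start = dset.shape[0]
        dset.resize(start + batch_size, axis=0)                   # grow the first axis
        dset[start:start + batch_size] = batch                    # write the new rows
        written.append(batch)

with h5py.File(path, "r") as f:
    dset = f["features"]
    total_shape = dset.shape
    rows = dset[10:13]   # NumPy-style slice: only the chunks holding these rows are read
print(total_shape, rows.shape)
```

Because the first axis is unlimited, this pattern works even when the total number of images is not known in advance; when N is known, creating the dataset with a fixed `shape=(N, dim)` is the simpler alternative.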
Thank you very much for reading, friends! In the next blog post, which continues this one, I will walk you through custom Keras and Python code that accepts input data and applies transfer learning. We will use a pre-trained CNN such as VGG16, ResNet, or AlexNet as a feature extractor and store the resulting features in the HDF5 file format. Using those feature vectors, we will then build a kick-ass image classifier, so please stay tuned.
References:
https://en.wikipedia.org/wiki/Transfer_learning
https://www.pyimagesearch.com/2019/05/20/transfer-learning-with-keras-and-deep-learning/