TRANSFER LEARNING DEMYSTIFIED

What is Transfer Learning?

Transfer learning is a machine learning technique in which a model trained on one task is reused on a second, similar task. It is an optimization that usually yields faster progress and improved performance when modeling the second, related task.

Let us understand the concept of transfer learning through an analogy between an experienced data scientist and an aspiring data scientist.

An experienced data scientist has years of experience in the field. With all of that knowledge and experience behind it, what an aspiring data scientist receives is a concise, distilled overview of data science. It can therefore be seen as a "transfer" of information from the expert to the novice.

In the same way, we can compare this to learning a language. Knowledge gained while learning Spanish can be reused when learning Japanese. Instead of learning Japanese from scratch, we reuse the knowledge we already have about how languages work.

What is a Pre-trained Model?

In simple terms, a pre-trained model is a model created by someone else to solve a related problem. Instead of building a model from scratch to solve a similar problem, you can use the model trained on the other problem as a starting point.

For example, suppose you want to build a self-driving car. It can take years to build a well-optimized image recognition algorithm from scratch; alternatively, you can take a pre-trained model from Google, built on ImageNet data, and use it to identify images effectively.
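
To make this concrete, here is a minimal sketch using Keras (assuming TensorFlow is installed; "car.jpg" is a placeholder path to any local image):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import (
    ResNet50, decode_predictions, preprocess_input)
from tensorflow.keras.preprocessing import image

# Download a ResNet50 model with weights pre-trained on ImageNet.
model = ResNet50(weights="imagenet")

# Load and preprocess an image to the 224x224 input ResNet50 expects.
img = image.load_img("car.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# Out of the box, the model predicts ImageNet class probabilities.
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])
```

No training happens here at all; the effort that went into training ResNet50 is reused as-is.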

How to Use Transfer Learning?

There are two common approaches:

1. Develop Model Approach

2. Pre-trained Model Approach

Develop Model Approach

· You must select a related source problem with an abundance of data, where there is some relationship in the input data, the output data, or the concepts learned during the mapping from inputs to outputs.

· Next, develop a skillful model for this first task. The model must perform better than a naive model to ensure that some meaningful feature learning has been performed.

· The model fit on the source task can then be used as the starting point for a model on the second, related task.

· The model may then need to be adapted or refined on the input-output pair data available for the second task (see the sketch below).
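
Here is a minimal sketch of this workflow in Keras; the NumPy arrays standing in for the source and target datasets are purely illustrative assumptions:

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Hypothetical data: a large 5-class source task, a small 3-class target task.
X_source, y_source = np.random.rand(1000, 20), np.random.randint(0, 5, 1000)
X_target, y_target = np.random.rand(100, 20), np.random.randint(0, 3, 100)

# 1. Develop and fit a model on the data-rich source task.
source_model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),
    Dense(32, activation="relu"),
    Dense(5, activation="softmax"),
])
source_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
source_model.fit(X_source, y_source, epochs=5, verbose=0)

# 2. Reuse the learned hidden layers as the starting point for the
#    target task, attaching a fresh output layer for its 3 classes.
target_model = Sequential(
    source_model.layers[:-1] + [Dense(3, activation="softmax")])
target_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# 3. Refine the reused weights on the target task's input-output pairs.
target_model.fit(X_target, y_target, epochs=5, verbose=0)
```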

Pre-trained Model Approach

· A pre-trained source model is chosen from available models. Many research institutions release models trained on large and challenging datasets, and these make up the pool of candidate models to choose from.

· This pre-trained model can then be used as the starting point for a model on the second task of interest. This may involve using all or parts of the model, depending on the machine learning technique used.

· The model may need to be adapted or refined on the input-output pair data available for the task of interest.

This second type of transfer learning (the pre-trained model approach) is the form most commonly used in deep learning.
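
A minimal sketch of this approach in Keras, assuming a hypothetical 10-class target task:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# Use all of the pre-trained model except its ImageNet output layer.
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))

# Attach a new output layer sized for the second task of interest.
outputs = Dense(10, activation="softmax")(base.output)
model = Model(inputs=base.input, outputs=outputs)

model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit(...) would then adapt the model on the target task's data.
```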

How Can I Use Pre-trained Models?

When we train a neural network, the goal is to identify the correct weights for the network through many forward and backward iterations. By using pre-trained models that have already been trained on large datasets, we can directly reuse the weights and architecture obtained and apply that learning to our own problem. This is known as transfer learning.

We should be careful when choosing which pre-trained model to use for our case. If the problem statement at hand is very different from the one on which the pre-trained model was trained, the predictions we get will be very inaccurate. For example, a model trained for face recognition would perform horribly if we tried to use it to identify everyday objects.

Pre-trained models exhibit a strong ability to generalize to images outside the ImageNet dataset via transfer learning. We can adapt the pre-existing model to our problem by fine-tuning it. Since we assume the pre-trained network has already been trained quite well, we would not want to modify its weights too soon or too much.

Ways to Fine-tune the Model

Feature extraction: we can use a pre-trained model as a feature extraction mechanism. We remove the output layer (the one that gives the probabilities of belonging to each of the 1,000 ImageNet classes, for example) and then use the rest of the network as a fixed feature extractor for the new dataset.
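
A minimal sketch, assuming Keras with TensorFlow; the random batch stands in for real preprocessed images:

```python
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# Placeholder batch of 8 images in place of a real dataset.
X = preprocess_input(np.random.rand(8, 224, 224, 3) * 255.0)

# include_top=False removes the classification layer; global average
# pooling leaves one 2048-dimensional feature vector per image.
extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")
features = extractor.predict(X)
print(features.shape)  # (8, 2048)

# These features can now feed any simple classifier (e.g. logistic
# regression or a small dense network) trained on our own labels.
```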

Architecture of the pre-trained model: we reuse only the architecture of the model, initialize all the weights randomly, and train the model from scratch on our own dataset.
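
A minimal sketch in Keras; passing weights=None gives the ResNet50 architecture with randomly initialized weights (the 10-class output is a hypothetical target task):

```python
from tensorflow.keras.applications import ResNet50

# Same architecture, no pre-trained weights: everything is learned
# from scratch on our own data.
model = ResNet50(weights=None, classes=10, input_shape=(224, 224, 3))
model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit(...) then trains every weight from scratch on our dataset.
```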

Train some layers while freezing the rest: another way to use a pre-trained model is to train it partially. We keep the weights of the initial layers of the model frozen while retraining only the higher layers.
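
A minimal sketch in Keras; the choice to freeze the first 100 layers and use a 10-class head is an illustrative assumption, not a rule:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

base = ResNet50(weights="imagenet", include_top=False, pooling="avg")

# Freeze the initial layers, which capture generic features such as
# edges and textures; leave the higher layers trainable.
for layer in base.layers[:100]:
    layer.trainable = False

outputs = Dense(10, activation="softmax")(base.output)
model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")
# model.fit(...) then updates only the unfrozen higher layers.
```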
