
Everything you need to know about Transfer Learning

Feb 15, 2024

For deep learning models, the data needs to be labelled, and if the dataset is not publicly available, we have to label the data manually, which is time-consuming.

Deep learning models also take a lot of time to train.

Because of these two reasons, people often prefer not to train their own deep learning models from scratch; they use pre-trained models instead.

Now, what if the pre-trained model was not trained to classify the objects we care about? For example, suppose we want to use an ImageNet pre-trained model to distinguish between a mobile phone and a tablet, but the ImageNet dataset has no mobile phone or tablet class. The concept of Transfer Learning solves this problem.

Transfer Learning

Transfer Learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.


Transfer Learning is inspired by how we learn in real life. For example, it is often said that to learn how to ride a motorbike, you should first know how to ride a bicycle.

How transfer learning works

[Figure: the VGG16 architecture]

The VGG model architecture is trained on the ImageNet dataset, which consists of 1,000 classes and around 1.4 million images.

The above architecture is divided into two parts:

  1. the convolutional layers (the convolutional base)
  2. the fully connected layers + output layer

The convolutional base is responsible for extracting spatial information from the image (the relationships between its pixels).

The fully connected layers + output layer are responsible for the classification task.
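
As a quick illustration (a sketch assuming TensorFlow/Keras is available), listing the layers of Keras's VGG16 shows this split directly:

from tensorflow.keras.applications.vgg16 import VGG16

# include_top=True (the default) keeps both parts of the network
full_model = VGG16(weights='imagenet')
for layer in full_model.layers:
    print(layer.name)
# Everything up to 'block5_pool' is the convolutional base; 'flatten',
# 'fc1', 'fc2' and 'predictions' are the fully connected layers + output layer.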

Now let's imagine the mobile phone and tablet classes are not in the ImageNet dataset, so we apply transfer learning in the following way.

We break the CNN after the last layer of the convolutional base, i.e. before the first fully connected layer.

We keep the convolutional base and discard the fully connected layers + output layer.

We attach our own fully connected layers + output layer to this convolutional base and freeze the convolutional base, meaning only the new fully connected layers will be trained, not the convolutional base.

The convolutional base's job is to decode the image, i.e. to extract features from it, while the fully connected layers + output layer's job is to classify the image.

The earlier convolutional layers (conv_1, conv_2, etc.) extract primitive features such as edges and textures, while the later layers extract more complex features.
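
To see this hierarchy concretely, here is a rough sketch (the layer names are real VGG16 layer names in Keras) that compares the feature maps of an early and a late convolutional layer:

from tensorflow import keras
from tensorflow.keras.applications.vgg16 import VGG16

base = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

# Sub-models that stop at an early and a late convolutional layer
early = keras.Model(base.input, base.get_layer('block1_conv2').output)
late = keras.Model(base.input, base.get_layer('block5_conv3').output)

print(early.output.shape)  # (None, 150, 150, 64): high-resolution, primitive features
print(late.output.shape)   # (None, 9, 9, 512): low-resolution, complex features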

Ways of doing transfer learning

  1. Feature extraction (discussed above): used for image classification tasks where the target labels are similar to those of the dataset on which the model was already trained.
  2. Fine-tuning: we don't freeze the entire convolutional base. We train some of the last convolutional layers of the base along with our fully connected layers + output layer. Use fine-tuning when the target labels are very different from the labels of the pre-trained model's dataset.

Code

We are using the Dogs vs. Cats classification dataset from Kaggle. The following cells download and extract it (assuming the kaggle.json API key has already been uploaded).

After extraction, there are two folders named train and test, and each of them contains two folders: dogs and cats.
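
The expected layout looks like this (assuming the archive unzips into /content on Colab, as the code below expects):

/content
├── train
│   ├── cats
│   └── dogs
└── test
    ├── cats
    └── dogs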

# Set up the Kaggle API credentials (run in Colab)
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/

# Download the dataset from Kaggle
!kaggle datasets download -d salader/dogs-vs-cats

# Extract the downloaded archive into /content
import zipfile
zip_ref = zipfile.ZipFile('/content/dogs-vs-cats.zip', 'r')
zip_ref.extractall('/content')
zip_ref.close()

Feature Extraction technique

import tensorflow
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense, Flatten
from keras.applications.vgg16 import VGG16

# Load the VGG16 convolutional base pre-trained on ImageNet
conv_base = VGG16(
    weights='imagenet',
    include_top=False,  # exclude VGG16's own fully connected layers + output layer
    input_shape=(150, 150, 3)
)
conv_base.summary()

# Now, designing our own fully connected layer + output layer
model = Sequential()
model.add(conv_base)
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  # binary output: cat vs dog
model.summary()

# Freeze the convolutional base so that only our new layers are trained;
# this must be set before compile() to take effect
conv_base.trainable = False
# Generators: load the images from disk as labelled tf.data datasets
train_ds = keras.utils.image_dataset_from_directory(
    directory='/content/train',
    labels='inferred',
    label_mode='int',
    batch_size=32,
    image_size=(150, 150)
)

validation_ds = keras.utils.image_dataset_from_directory(
    directory='/content/test',
    labels='inferred',
    label_mode='int',
    batch_size=32,
    image_size=(150, 150)
)

'''
labels='inferred': the labels are inferred from the names of the
subdirectories (cats and dogs) inside the directory.

label_mode='int': the labels are represented as integers; since the
subdirectories are read alphabetically, cats = 0 and dogs = 1.
'''
# Normalize the images: scale pixel values from [0, 255] to [0, 1]
def process(image, label):
    image = tensorflow.cast(image / 255., tensorflow.float32)
    return image, label

train_ds = train_ds.map(process)
validation_ds = validation_ds.map(process)
# Train only the classifier head (the convolutional base is frozen)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(train_ds, epochs=10, validation_data=validation_ds)
import matplotlib.pyplot as plt

# Accuracy curves
plt.plot(history.history['accuracy'], color='red', label='train')
plt.plot(history.history['val_accuracy'], color='blue', label='validation')
plt.legend()
plt.show()

# Loss curves
plt.plot(history.history['loss'], color='red', label='train')
plt.plot(history.history['val_loss'], color='blue', label='validation')
plt.legend()
plt.show()
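
Once training finishes, the model can classify a new image. A minimal sketch (the image path below is a placeholder; recall that with alphabetical label inference, cats = 0 and dogs = 1):

import numpy as np
from tensorflow import keras

# Placeholder path: point this at any image you want to classify
img = keras.utils.load_img('/content/test/dogs/some_dog.jpg', target_size=(150, 150))
arr = keras.utils.img_to_array(img) / 255.  # same normalization as during training
arr = np.expand_dims(arr, axis=0)           # add the batch dimension
prob = model.predict(arr)[0][0]             # sigmoid output: probability of 'dog'
print('dog' if prob > 0.5 else 'cat', prob)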

Fine-tuning

The only change from the feature-extraction code above is in the block where we froze the convolutional base (conv_base.trainable = False). For fine-tuning, we unfreeze the last convolutional block instead:

conv_base.trainable = True

# Freeze every layer before block5_conv1; unfreeze block5_conv1 and everything after it
set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False

# Verify which layers will be trained
for layer in conv_base.layers:
    print(layer.name, layer.trainable)
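
One practical note: when fine-tuning, it is common to recompile with a much lower learning rate so that large gradient updates don't destroy the pre-trained weights. A sketch (the value 1e-5 is a conventional choice, not taken from the original code):

from tensorflow import keras

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),  # far below Adam's default of 1e-3
    loss='binary_crossentropy',
    metrics=['accuracy']
)
history = model.fit(train_ds, epochs=10, validation_data=validation_ds)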
