“Transfer Learning in Neural Networks” in a Nutshell

Sudeep Nellur · Published in Analytics Vidhya · 7 min read · Jul 27, 2020

Building the perfect neural network is an empirical process that takes a fair bit of time and experience. But nowadays, transfer learning has made it much faster and less hectic. So, what is transfer learning? In simple words, transfer learning is the process of borrowing part or all of an existing pre-trained model for your own use. And what is a pre-trained model? It is a model that has already been trained on a large dataset and made available to everyone.

[Image: the definition of transfer learning, in a picture.]

So you’ve got the idea of transfer learning. Now let’s discuss how to do it hands-on. There are two ways to go about it. First, plain transfer learning: taking the pre-trained model as it is, without any changes, as the basis of your model. Second, fine-tuning with transfer learning: the process is much the same, but with some tailoring on top. Many pre-trained models are available today, such as Xception, VGG16, VGG19, ResNet50, InceptionV3, MobileNet, MobileNetV2, and the list goes on. In this article I’m using VGG16, a 16-layer CNN trained on over 14 million images belonging to 1000 classes (ImageNet).
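All of these architectures ship with tf.keras, so swapping one for another is mostly a one-line change. As a quick sketch:

# All of these come bundled with tf.keras; each model's module also
# provides a matching preprocess_input for preparing images the way
# that model expects.
from tensorflow.keras.applications import (
    Xception, VGG16, VGG19, ResNet50, InceptionV3, MobileNet, MobileNetV2)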

So first let us wrap our heads around standard transfer learning; fine-tuning comes after. The VGG16 model is available in the Keras library, so we can simply import it and start working:

from tensorflow.keras.applications.vgg16 import VGG16

And that's it! You did it, easy right? Now let’s create an object for the model and you are ready for training.

vgg_model = VGG16(input_shape=(224, 224, 3),  # e.g. 224x224 RGB; use your own image size
                  weights="imagenet",
                  include_top=False)

Here the object vgg_model is a VGG16 model that takes parameters such as “input_shape”, which is the shape of the images in your dataset. “weights” specifies the weights the model was trained on; if you want to initialize the model from scratch yourself, simply replace “imagenet” with None. “include_top” controls whether the fully connected classifier head at the end of the network is kept; I have set it to False because that head produces 1000 outputs and our model’s output differs from that. If you do want it, just set it to True and you are good to go.
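For instance, here is a quick sketch of the two variants just mentioned (the variable names are only illustrative):

# Randomly initialized VGG16 architecture, no pre-trained weights:
random_vgg = VGG16(input_shape=(224, 224, 3), weights=None, include_top=False)

# Full ImageNet classifier with the 1000-way output head included
# (the input size then defaults to 224x224x3):
full_vgg = VGG16(weights="imagenet", include_top=True)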

With these two simple steps you have VGG16 in the vgg_model object. If you check the model summary, you will get this:

>>vgg_model.summary()
Model: "vgg16" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 224, 224, 3)] 0 _________________________________________________________________ block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 ================================================================= Total params: 14,714,688 Trainable params: 14,714,688 Non-trainable params: 0 _________________________________________________________________

Here you can see all the layers of the VGG16 model, with some description at the bottom. “Total params” is the total number of parameters the model has overall. “Trainable params” is the number of parameters you can train; since we loaded the ImageNet weights, these come pre-initialized (pass weights=None if you want only the bare architecture with random weights). Lastly, “Non-trainable params”, as the name says, are the parameters that are frozen and not updated during training. Note that you don’t see the classifier head here because we set include_top to False.
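If you would rather compute those numbers yourself than read them off the summary, a small sketch using Keras’ count_params helper does the job:

import tensorflow as tf

# Reproduce the summary's parameter counts programmatically.
trainable = sum(tf.keras.backend.count_params(w)
                for w in vgg_model.trainable_weights)
non_trainable = sum(tf.keras.backend.count_params(w)
                    for w in vgg_model.non_trainable_weights)
print(trainable, non_trainable)  # 14714688 0 for the freshly loaded model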

Now that you have an idea of transfer learning, let’s move on to fine-tuning. The process is the same, but we take only what we need, such as a specific number of layers from the model. As an example, say we need to build a model using only 10 layers from VGG16 and then add an extra layer of our own. This can be done by creating two objects: one for our model and another for the complete VGG16 model. We first declare an empty Sequential model, then keep adding layers from the vgg16 object to it. To do this we have to import some Keras components.

from tensorflow.keras.models import Sequential
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.layers import Dense, Conv2D, Flatten

As we are building a sequential model, we import Sequential, plus Dense, Conv2D and Flatten for adding the extra layers to the network. This can be done as follows:

# model to be built
model = Sequential()
# vgg16 model
vgg_model = VGG16(input_shape=(224, 224, 3),  # e.g. 224x224 RGB; use your own image size
                  weights="imagenet",
                  include_top=False)

So here model and vgg_model are our two objects: model is the empty sequential model and vgg_model holds the full 16-layer VGG16. Now we take only 10 layers from the 16-layer VGG16. This can be done by:

for layer in vgg_model.layers[:11]:
    model.add(layer)

Here “layer” iterates over the layers of vgg_model, accessed through the layers attribute, i.e. <model_name>.layers. Since we need only 10 layers, we limit the iteration with the slice “vgg_model.layers[:11]”, which takes the first 11 entries: the input layer plus the first 10 convolution and pooling layers. If we take the summary of this model we get:

>>model.summary()
Model: "sequential_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 ================================================================= Total params: 1,735,488 Trainable params: 1,735,488 Non-trainable params: 0 _________________________________________________________________

As the summary shows, we fetched only the 10 layers we wanted from VGG16. Easy, right? Now that we have the model object, let’s complete this CNN. We can add more layers to it the same way we do when building any CNN:

model.add(Conv2D(32, (3, 3), activation="relu"))  # extra convolution layer
model.add(Flatten())                              # flatten for the fully connected part
model.add(Dense(2, activation="softmax"))         # 2-node output layer

This adds another Conv2D layer, flattens its output for the fully connected part, and ends with a 2-node output layer. Now if we take the summary of the model, it looks like this:

>>model.summary()
Model: "sequential_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ conv2d_1 (Conv2D) (None, 26, 26, 32) 73760 _________________________________________________________________ flatten_1 (Flatten) (None, 21632) 0 _________________________________________________________________ dense_1 (Dense) (None, 2) 43266 ================================================================= Total params: 1,852,514 Trainable params: 1,852,514 Non-trainable params: 0 _________________________________________________________________

And we are done with fine-tuning too. That’s it! Simple, right? Now try using other pre-trained models in your projects to make them more effective.
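One last step before training: the model still has to be compiled and fit as usual. Here is a minimal sketch; train_ds is a placeholder for your own dataset of (image, label) batches, and the hyperparameters are illustrative, not tuned:

# Compile with a loss matching the 2-node softmax output, then train.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)  # train_ds is your own (hypothetical) dataset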

But… wait a minute,

There’s a trick that makes the actual difference: taking a completely empty model without any weights, or leaving every parameter trainable, can make your network lose some of its important values. These networks were trained on millions of images, so the model holds complex parameters that can’t easily be reproduced by the usual training. So my advice would be to keep some of the starting layers open for learning and freeze the rest. This way your model gets familiar with your dataset in the initial layers while staying smart at the end of the network. To do this you simply add a few lines of code after creating the model object:

for layer in model.layers[7:]:
    layer.trainable = False

These two lines leave your model half open for learning and freeze the rest. Here I allowed the model to learn up to the 7th layer (indices 0 to 6), and nothing after that gets any weight updates. If we take the summary, we get:

>> model.summary()
Model: "sequential_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ conv2d_1 (Conv2D) (None, 26, 26, 32) 73760 _________________________________________________________________ flatten_1 (Flatten) (None, 21632) 0 _________________________________________________________________ dense_1 (Dense) (None, 2) 43266 ================================================================= Total params: 1,852,514 Trainable params: 555,328 Non-trainable params: 1,297,186 _________________________________________________________________

From the summary you can see that 555,328 parameters can be trained, in other words updated by backpropagation, while 1,297,186 parameters stay exactly as they were in the VGG16 model. So that’s it! Now you know transfer learning🎉. Check out the link below for the complete code script, and don’t forget to try it yourself, and try it a LOT.

Keep Learning🤓💻….!
