Transfer Learning

Cristian
5 min read · Sep 24, 2020


Abstract

Most machine learning algorithms are designed to address a single task; the development of algorithms that facilitate transfer learning is a topic of ongoing interest in the machine learning community.

In this experiment I will explain the procedure, the methods I used, and, at the end, the results I obtained.

Introduction

When we want to start a new image classification project with a CNN, we often face the problem of not having enough data. This is one of the main difficulties when building a model; in addition, training a new network from scratch is time consuming and requires a lot of GPU resources. One of the most widely used solutions today is transfer learning.

Transfer learning gives us the ability to leverage the power of a large dataset without having to train a new model from scratch. Some training is still needed, but it is mainly limited to adapting the network to our new dataset. The idea behind transfer learning is to use a pre-trained network that has been trained on a large enough image dataset to act as a generic model of the visual world. We can then use this trained network on the images we want to classify, tweak the model, and run our new architecture to get the classification results we are looking for.

There are different architectures such as VGG16, VGG19, MobileNet, etc., and each one has its advantages and disadvantages. For the purpose of this experiment, the chosen architecture is ResNet50, since its residual connections preserve the characteristics of earlier layers, and it can serve as a pre-trained base model whose learned knowledge we then apply to a new network.

Dataset

For training we will use the CIFAR-10 dataset provided by the keras.datasets module. CIFAR-10 was chosen for this purpose because it contains a large number of images spanning 10 classes (10 possible outcomes).

The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images, which will be leveraged when we train and test our model.
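As a quick sketch, the dataset can be loaded directly from keras.datasets; the shapes in the comments are those documented for CIFAR-10.

```python
# Minimal sketch: loading CIFAR-10 from keras.datasets
from tensorflow import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

print(x_train.shape)  # (50000, 32, 32, 3) training images
print(x_test.shape)   # (10000, 32, 32, 3) test images
print(y_train.shape)  # (50000, 1) integer labels from 0 to 9
```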

Pre-processing

Because the dataset contains 32 x 32-pixel images, we need to upscale them so that they work well with the ResNet50 architecture, whose pre-trained weights were learned on much larger inputs. For my training I scaled them to 224 x 224 pixels; in Keras you can use an UpSampling2D layer or any other resizing method.
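The exact preprocessing code is not shown here, so the following is only a minimal sketch of one way to do it: preprocess_data is a hypothetical helper name, and the actual upscaling to 224 x 224 is left to an UpSampling2D layer inside the model (shown later).

```python
# Minimal preprocessing sketch: scale pixels the way ResNet50 expects
# and one-hot encode the integer labels into 10-class vectors.
from tensorflow import keras
from tensorflow.keras.applications.resnet50 import preprocess_input

def preprocess_data(X, Y):
    """Apply ResNet50 preprocessing and one-hot encode the labels."""
    X_p = preprocess_input(X.astype("float32"))
    Y_p = keras.utils.to_categorical(Y, 10)
    return X_p, Y_p

x_train_p, y_train_p = preprocess_data(x_train, y_train)
x_test_p, y_test_p = preprocess_data(x_test, y_test)
```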

Model Architecture

ResNet50

Once the images are ready for the ResNet50 layers, we pass them through the network, take the output, flatten it, and feed it to a fully connected network consisting of three hidden layers with 256, 128, and 64 neurons respectively. Each of these layers is preceded by a BatchNormalization layer and followed by a Dropout layer with a rate of 0.5. Lastly, there is a final dense output layer with 10 neurons and a softmax activation for the 10 classes in CIFAR-10. Softmax essentially gives us the probability of each class, in this case the 10 outcomes, which all sum to 1.

BatchNormalization is used to stabilize the activations and speed up learning, while Dropout randomly deactivates some neurons in each layer, helping the model avoid overfitting and generalize better.
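To make the architecture above concrete, here is a minimal sketch in Keras of how such a model could be assembled. The optimizer, activation functions, and other hyperparameters are illustrative assumptions, not necessarily those used in the original experiment.

```python
# Sketch of the architecture described above: UpSampling2D to reach
# 224 x 224, a pre-trained ResNet50 base, then the fully connected head.
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import ResNet50

inputs = keras.Input(shape=(32, 32, 3))
# 32 * 7 = 224: upscale CIFAR-10 images to the size ResNet50 was trained on.
x = layers.UpSampling2D(size=(7, 7))(inputs)

base = ResNet50(weights="imagenet", include_top=False,
                input_shape=(224, 224, 3))
x = base(x)
x = layers.Flatten()(x)

# Three hidden blocks: BatchNormalization -> Dense -> Dropout(0.5)
for units in (256, 128, 64):
    x = layers.BatchNormalization()(x)
    x = layers.Dense(units, activation="relu")(x)
    x = layers.Dropout(0.5)(x)

# Final 10-way softmax output for the CIFAR-10 classes.
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```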

Results

Image 1: training results (loss and accuracy per epoch)

The training ran for 10 epochs and took about 1 hour on average, which is expected given how many layers this architecture has. A possible improvement would be to freeze the pre-trained layers, which would reduce the training time considerably, although one disadvantage is that the model can end up overfitting. Looking at the results in Image 1, the training loss and the validation loss did not differ much, which indicates that the model generalized well, reaching a maximum accuracy of 95% and a minimum loss of 0.12. However, toward the final epochs the loss started to increase and the accuracy to decrease, which means the model was no longer learning and had started to stagnate.

After training, the model was evaluated on the test set and obtained a result of 95%, meaning it classifies roughly 95% of the test images correctly.

Conclusion

There are also many techniques to improve the convergence of pre-trained models. One is layer freezing, which keeps the weights of the already-trained model from being modified so that its learned features can be reused by the new model. Another is fine-tuning, which consists of unfreezing some layers of the pre-trained model and updating their weights together with the new layers.
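Continuing from the model sketch above, a minimal illustration of freezing and later fine-tuning the ResNet50 base could look like this; which layers to unfreeze and the learning rate are assumptions made for the example.

```python
# Illustrative sketch of layer freezing and fine-tuning, reusing the
# `base` and `model` objects from the architecture sketch above.
from tensorflow import keras

# Layer freezing: keep the pre-trained ResNet50 weights fixed while
# only the new fully connected head is trained.
base.trainable = False
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# ... train the head here ...

# Fine-tuning: unfreeze the last few layers of the base and continue
# training with a much lower learning rate so the pre-trained weights
# are adjusted slowly.
for layer in base.layers[-10:]:
    layer.trainable = True
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```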

Transfer learning is a very helpful tool when implementing a new network that lacks data. Many models have already been pre-trained, which above all saves time and allows developers to focus on their new use cases.
