A transfer learning experience with VGG16 and the CIFAR-10 dataset

Paulo Morillo · Published in Analytics Vidhya · Jul 3, 2020
Header image: Nick Youngson, http://www.thebluediamondgallery.com/wooden-tile/t/transfer.html

Abstract

In this blog, I'm going to talk about how I reached an accuracy greater than 88% (92% at epoch 22) on CIFAR-10 using transfer learning. I used VGG16, applied a very low constant learning rate, and used upsampling so the network gets more data points per image to process.

Introduction

Nowadays is a very good time for machine learning: we have many well-known models with great results that make predictions quickly and with high accuracy. Consequently, we should use those tools in our daily predictions, focusing on the goals of our models and not only on their footprint. For this reason, we need to understand our dataset, apply the right model, do the necessary preprocessing, and make corrections to those well-known models when necessary.

Materials and Methods

In this practice I used:

Keras (tf.keras, bundled with TensorFlow 1.12)

Colab with a GPU: for me, it is the most cost-effective option I have seen for building and training a model. It's a Jupyter environment that saves to Drive or uploads to GitHub.

VGG16 model: I chose this model considering the time I would have spent with a deeper model like DenseNet121 or ResNet50. Its accuracy is not bad, the results in this practice were very good, and when I compared it with DenseNet121 the accuracy difference between them was only 0.08%.

Model comparison table taken from https://keras.io/api/applications/

CIFAR-10 dataset: it “consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.”

UpSampling2D: a layer applied to enlarge each image so the network gets more data points per image (see the sketch after this list).

Freezing the whole VGG16 model: I tried to get more accuracy by fine-tuning some layers, but the training time increased a lot and the results were almost the same.

Constant learning rate: I tried using learning rate decay, but the results were not as good; I'll come back to this later.
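To make these pieces concrete, here is a minimal sketch in tf.keras (the full training script is in the appendix) of how a frozen VGG16 base, the UpSampling2D layer, and a constant learning rate callback fit together; the 256-unit head and the name constant_lr are only illustrative.

import tensorflow.keras as K

# Pre-trained VGG16 base without its classifier; average pooling returns a flat feature vector
base_model = K.applications.vgg16.VGG16(include_top=False,
                                        weights='imagenet',
                                        pooling='avg')
base_model.trainable = False  # freeze every VGG16 layer

model = K.Sequential([
    K.layers.UpSampling2D(),                   # 32x32 CIFAR-10 images become 64x64
    base_model,                                # frozen feature extractor
    K.layers.Dense(256, activation='relu'),    # illustrative head size
    K.layers.Dense(10, activation='softmax')   # one output per CIFAR-10 class
])

# The "schedule" returns the same very small value for every epoch
def constant_lr(epoch):
    return 0.001 / (1 + 1 * 30)  # ~3.2e-5

lr_callback = K.callbacks.LearningRateScheduler(constant_lr, verbose=1)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])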

Results

Below are the results of applying the VGG16 model with two added layers and a constant learning rate.

Model Summary

With learning rate decay:

With a constant learning rate:

We can see that I got 92.05% with a constant learning rate versus 80.9% using learning rate decay. I added two layers with ReLU activation and one softmax output layer.
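Concretely, those added layers correspond to the following head from the appendix script (wrapped here in a small helper function, add_classifier_head, purely for illustration):

import tensorflow.keras as K

def add_classifier_head(model):
    """Attach the dense head used in the appendix to an existing Sequential model."""
    model.add(K.layers.Flatten())
    model.add(K.layers.Dense(512, activation='relu'))    # first ReLU layer
    model.add(K.layers.Dropout(0.2))
    model.add(K.layers.Dense(256, activation='relu'))    # second ReLU layer
    model.add(K.layers.Dropout(0.2))
    model.add(K.layers.Dense(10, activation='softmax'))  # one unit per CIFAR-10 class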

Discussion

The most important point for me is the use of a very low constant learning rate. This is probably because the model was pre-trained on ImageNet, so the gradient descent steps shouldn't be big; otherwise we may end up in a zone that is not the real minimum (see the image: the model should be heading toward the minimum value, but in some cases it can get stuck at a low point that is not the minimum, and we can see that only one point keeps going down). Another important point is the preprocessing: CIFAR-10 images have low resolution, so we cannot take many points from them, and for this reason upsampling helps a lot to improve the accuracy.
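As a quick sanity check on how small that learning rate actually is, the decay function from the appendix ignores the epoch, so every epoch trains with roughly 3.2e-5; the time-based decay shown commented out is only a hypothetical illustration of the kind of schedule I compared against.

# The scheduler from the appendix: the epoch argument never changes the result
def decay(epoch):
    return 0.001 / (1 + 1 * 30)

print(decay(0))   # ~3.2258e-05
print(decay(29))  # same value on the last epoch

# A hypothetical time-based decay, for contrast, would shrink every epoch:
# def time_decay(epoch):
#     return 0.001 / (1 + 1 * epoch)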

Bibliography

Andrew Ng's video: https://www.youtube.com/watch?v=FQM13HkEfBk&index=20&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF

Santiago VG: https://medium.com/@svelez.velezgarcia/transfer-learning-ride-fa9f2a5d69eb

Keras Applications: https://keras.io/api/applications/

Appendices

"""This script has the methodpreprocess_data(X, Y): and decayand use transfer learning with VGG16 model"""import tensorflow.keras as Kimport datetimedef preprocess_data(X, Y):
""" This method has the preprocess to train a model """
# applying astype to change float64 to float32 for version 1.12# X = X.astype('float32')#using preprocess VGG16 method by default to scale images and their valuesX_p = K.applications.vgg16.preprocess_input(X)# changind labels to one-hot representationY_p = K.utils.to_categorical(Y, 10)return (X_p, Y_p)def decay(epoch):""" This method create the alpha"""# returning a very small constant learning ratereturn 0.001 / (1 + 1 * 30)if __name__ == "__main__":# loading data and using preprocess for training and validation dataset(Xt, Yt), (X, Y) = K.datasets.cifar10.load_data()X_p, Y_p = preprocess_data(Xt, Yt)Xv_p, Yv_p = preprocess_data(X, Y)# Getting the model without the last layers, trained with imagenet and with average poolingbase_model = K.applications.vgg16.VGG16(include_top=False,weights='imagenet',pooling='avg',input_shape=(32,32,3))# create the new model applying the base_model (VGG16)model= K.Sequential()# using upsamplign to get more data points and improve the predictionsmodel.add(K.layers.UpSampling2D())model.add(base_model)model.add(K.layers.Flatten())model.add(K.layers.Dense(512, activation=('relu')))model.add(K.layers.Dropout(0.2))model.add(K.layers.Dense(256, activation=('relu')))model.add(K.layers.Dropout(0.2))model.add(K.layers.Dense(10, activation=('softmax')))# adding callbackscallback = []callback += [K.callbacks.LearningRateScheduler(decay, verbose=1)]#callback += [K.callbacks.ModelCheckpoint('cifar10.h5',# save_best_only=True,# mode='min'# )]# tensorboard callback# log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")# callback += [K.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)]# Compiling model with adam optimizer and looking the accuracymodel.compile(optimizer='adam', loss='categorical_crossentropy',metrics=['accuracy'])# training model with mini batch using shuffle datamodel.fit(x=X_p, y=Y_p,batch_size=128,validation_data=(Xv_p, Yv_p),epochs=30, shuffle=True,callbacks=callback,verbose=1)

https://github.com/PauloMorillo/holbertonschool-machine_learning/blob/master/supervised_learning/0x09-transfer_learning/0-transfer.py
