CNN, Transfer Learning with VGG-16 and ResNet-50, Feature Extraction for Image Retrieval with Keras

Ferhat Taş
Published in Analytics Vidhya
5 min read · Jan 9, 2021

In this article, we first implement a simple Convolutional Neural Network (CNN) model. Then we build Transfer Learning models with VGG-16 and ResNet-50. Finally, we extract features from those Transfer Learning models for Image Retrieval. We train the models on the CINIC-10 dataset and use the Keras library to implement and train each model.

Import Necessary Libraries and Get Datasets with ImageDataGenerator

First, we import the necessary libraries and load the datasets with ImageDataGenerator, as in the code below. We pass the directories of the train, validation and test datasets as parameters when using ImageDataGenerator. Each of these three datasets contains 90,000 images, and CINIC-10 has 10 classes, so we choose categorical as class_mode. Our batch_size is 64 and the number of epochs is 10 for training the models. The input shape for all images is (224, 224, 3), and we normalize all images by passing rescale=1./255 to ImageDataGenerator.

Get Datasets with ImageDataGenerator
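
The setup described above can be sketched roughly as follows. This is a minimal sketch, not the original code of the article: the helper name make_generator and the directory paths are hypothetical, while the image size, batch size, rescaling and class_mode are the values given in the text.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Values taken from the text above.
IMG_SIZE = (224, 224)  # every image is resized to 224x224 (3 channels)
BATCH_SIZE = 64
EPOCHS = 10

# Scale pixel values from [0, 255] to [0, 1].
datagen = ImageDataGenerator(rescale=1.0 / 255)


def make_generator(directory, shuffle=True):
    """Return a batch iterator over a folder that has one sub-folder per class.

    Hypothetical helper, introduced here only for illustration.
    """
    return datagen.flow_from_directory(
        directory,
        target_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        class_mode="categorical",  # one-hot labels for the 10 classes
        shuffle=shuffle,
    )


# Hypothetical paths; point them at your local copy of CINIC-10.
# train_generator = make_generator("cinic10/train")
# valid_generator = make_generator("cinic10/valid")
# test_generator = make_generator("cinic10/test", shuffle=False)
```

Keeping shuffle=False for the test generator keeps the predictions in the same order as the file list, which helps when building a confusion matrix later.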

Implement a CNN Model with Keras

We can implement a simple Convolutional Neural Network model with the function below. The function takes the following parameters:

  • num_of_layers: number of Conv2D layers in the model
  • num_of_filters: number of filters in each Conv2D layer
  • filter_size: filter size of the Conv2D layers
  • initializer: kernel_initializer of the layers
  • activation_function: activation function of the layers
  • dropout: dropout rate of the Dropout layers
  • opt: optimizer used to train the model

Implement a Simple CNN Model
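
A function with these parameters could look roughly like the sketch below. The function name create_cnn_model, the default values, the pooling layers and the two extra arguments input_shape and num_classes are assumptions made for illustration; the original layer layout is not shown in the text.

```python
from tensorflow.keras import layers, models


def create_cnn_model(num_of_layers=3, num_of_filters=32, filter_size=(3, 3),
                     initializer="glorot_uniform", activation_function="relu",
                     dropout=0.25, opt="adam",
                     input_shape=(224, 224, 3), num_classes=10):
    """Build and compile a small CNN (hypothetical layout, see lead-in)."""
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))

    # One Conv2D + pooling + dropout block per requested layer.
    for _ in range(num_of_layers):
        model.add(layers.Conv2D(num_of_filters, filter_size, padding="same",
                                kernel_initializer=initializer,
                                activation=activation_function))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Dropout(dropout))

    # Classification head: one softmax output per class.
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))

    model.compile(optimizer=opt, loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A model built this way can then be trained with model.fit(train_generator, validation_data=valid_generator, epochs=10).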

Using the function above, we trained CNN models with several combinations of hyper-parameters and recorded their Test Accuracy. The bar plot below shows the results after 10 epochs.

Test Accuracies of Different CNN Models

From the bar plot above, we can draw some general conclusions about the hyper-parameters:

  • A (3,3) filter size gives better results than a (5,5) filter size in these experiments. One possible explanation is that a smaller filter captures finer local patterns. Note that, with the same number of filters, a (3,3) layer needs fewer multiplications than a (5,5) layer, so the gain does not come from extra computation.
  • The effect of dropout depends on its value, so we should try several dropout values for our models and choose the best one.
  • Models with 3 convolution layers give better results than models with 2 convolution layers.
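
To put numbers on the filter-size comparison, the sketch below counts the output size, the weights and the multiply-accumulate operations of a single Conv2D layer. It assumes stride 1, no padding, a 224x224x3 input and 32 filters; the filter count and the function name conv2d_cost are assumptions for illustration.

```python
def conv2d_cost(input_size, kernel_size, in_channels, out_channels):
    """Return (output size, weight count, multiply-accumulates) for one
    Conv2D layer with stride 1 and no padding on a square input."""
    out_size = input_size - kernel_size + 1
    # Each filter has kernel_size^2 * in_channels weights plus one bias.
    weights = kernel_size * kernel_size * in_channels * out_channels + out_channels
    # Every output position of every filter needs kernel_size^2 * in_channels products.
    macs = out_size * out_size * kernel_size * kernel_size * in_channels * out_channels
    return out_size, weights, macs


for k in (3, 5):
    out_size, weights, macs = conv2d_cost(224, k, in_channels=3, out_channels=32)
    print(f"{k}x{k}: output {out_size}x{out_size}, {weights} weights, {macs} MACs")
# 3x3: output 222x222, 896 weights, 42581376 MACs
# 5x5: output 220x220, 2432 weights, 116160000 MACs
```

Under these assumptions the (3,3) filter is applied at slightly more positions, but each application is cheaper, so the layer as a whole has fewer weights and fewer operations.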

Transfer Learning with VGG-16 and ResNet-50

For transfer learning with VGG-16 and ResNet-50 we can use the functions below. In these functions we create the models without their final classification layer and add our own fully connected layer with 1024 neurons.

These functions have one parameter, lastFourTrainable. If it is False, only the final fully connected layer of the model is trainable. If it is True, the last four layers that have parameters are trainable.

Transfer Learning with VGG-16 and ResNet-50
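
The idea can be sketched as follows. This is one possible reading of the description, not the original code: the function name create_transfer_model, the base_name and weights arguments, and the global average pooling before the 1024-unit layer are assumptions. Here lastFourTrainable is applied to the pretrained base, and the newly added layers are always trainable.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16, ResNet50


def create_transfer_model(base_name="vgg16", lastFourTrainable=False,
                          weights="imagenet", input_shape=(224, 224, 3),
                          num_classes=10):
    """Pretrained base without its classifier, plus a new 1024-unit head."""
    base_cls = {"vgg16": VGG16, "resnet50": ResNet50}[base_name]
    base = base_cls(weights=weights, include_top=False, input_shape=input_shape)

    # Freeze the whole pretrained base first.
    for layer in base.layers:
        layer.trainable = False

    if lastFourTrainable:
        # Unfreeze the last four base layers that actually hold parameters
        # (assumption: pooling and activation layers are skipped).
        with_params = [l for l in base.layers if l.count_params() > 0]
        for layer in with_params[-4:]:
            layer.trainable = True

    # New head: pooling (assumption), 1024-unit dense layer, softmax classifier.
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(1024, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling create_transfer_model("resnet50", lastFourTrainable=True) gives the ResNet-50 variant. Passing weights=None builds the same architecture without downloading the ImageNet weights, which is useful for a quick structural check.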

Using the functions above, we trained four Transfer Learning models (VGG-16 and ResNet-50, each with lastFourTrainable set to False and to True) and recorded their Test Accuracy. The bar plot below shows the results after 10 epochs.

Test Accuracies of Different Transfer Learning Models

We can see that the VGG-16 Transfer Learning model with lastFourTrainable=True gives the best results among the Transfer Learning models. We can also see that increasing the number of trainable layers improves the results for both architectures.

We can also see the Confusion Matrix of our best Transfer Learning model below.

Confusion Matrix for VGG-16 Transfer Learning Model (Last Four Layers are Trainable)
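
A confusion matrix like this one can be computed from the true and predicted class indices. The NumPy sketch below uses the hypothetical function name confusion_matrix and made-up toy labels; it is not the code that produced the figure.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 to cell (true, predicted) for every sample, including repeats.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


# Toy example with 3 classes (made-up labels, for illustration only).
cm = confusion_matrix([0, 0, 1, 2, 2], [0, 1, 1, 2, 0], num_classes=3)
print(cm)

# With a Keras generator created with shuffle=False, the inputs would be:
# y_true = test_generator.classes
# y_pred = model.predict(test_generator).argmax(axis=1)
```

The diagonal holds the correctly classified samples, so cm.trace() / cm.sum() is the overall accuracy.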

Image Retrieval with Feature Extraction using Transfer Learning Models

Feature extraction from deep learning models can be used for image retrieval. We are going to extract features from the VGG-16 and ResNet-50 Transfer Learning models that we trained in the previous section. We now have four sets of model weights, and we are going to use them for feature extraction.

To extract features, we use the output of the layer just before the classification layer. For example, for the VGG-16 model:

  • First, we load the weights of the model from the saved file.
  • Then we take the output of the layer just before the classification layer of this model.
  • Once the model is ready, we compute the feature vectors for the train and validation datasets, store them in a DataFrame and save it as a pickle file.
  • Now we are ready for image retrieval. We give an image as a query, compare its feature vector with all stored feature vectors, and return the 5 most similar images.
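
The first two steps can be sketched as below, assuming a recent TensorFlow release where load_img and img_to_array are available in tensorflow.keras.utils. The helper make_feature_extractor and its weights_path argument are hypothetical names; getFeatureVector follows the signature described in this article, but its body is only an illustration.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.utils import img_to_array, load_img


def make_feature_extractor(model, weights_path=None):
    """Return a model that outputs the activations of the layer just
    before the final classification layer."""
    if weights_path is not None:
        model.load_weights(weights_path)  # weights saved after training
    return models.Model(inputs=model.input, outputs=model.layers[-2].output)


def getFeatureVector(model, img_path, target_size=(224, 224)):
    """Load one image, rescale it like the training data, and return
    its feature vector as a 1-D array."""
    img = load_img(img_path, target_size=target_size)
    x = img_to_array(img) / 255.0
    return model.predict(x[np.newaxis, ...], verbose=0)[0]


# Tiny stand-in classifier so the sketch runs without pretrained weights.
inputs = layers.Input(shape=(8, 8, 3))
hidden = layers.Dense(16, activation="relu")(layers.Flatten()(inputs))
outputs = layers.Dense(10, activation="softmax")(hidden)
tiny_model = models.Model(inputs, outputs)

extractor = make_feature_extractor(tiny_model)
features = extractor.predict(np.zeros((2, 8, 8, 3)), verbose=0)
print(features.shape)  # one 16-dimensional feature vector per image
```

For the transfer learning models in this article, the layer before the classifier is the 1024-neuron fully connected layer, so each feature vector would have 1024 values.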

The code for feature extraction is shown below.

Image Retrieval with Feature Extraction

The code above contains the following functions:

getFeatureVector(model, img_path): Computes the feature vector of the image at img_path using the given model and returns it.

getCosineSimilarity(A, B): Computes the Cosine Similarity between the given feature vectors A and B.

getFeatureDataFrame(model): Creates a Pandas DataFrame with two columns, ‘file’ and ‘features’, computes the feature vectors for all images in the train and validation datasets, and returns them in that DataFrame.

getSimilarImages(img_file, features_df, model, model_name): Computes the feature vector of the given image, compares it with all feature vectors in the DataFrame, and plots the 5 most similar images.
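
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal NumPy sketch, assuming the function takes the two feature vectors A and B mentioned in the description, could be:

```python
import numpy as np


def getCosineSimilarity(A, B):
    """Cosine of the angle between feature vectors A and B."""
    A = np.asarray(A, dtype=float).ravel()
    B = np.asarray(B, dtype=float).ravel()
    denom = np.linalg.norm(A) * np.linalg.norm(B)
    if denom == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return float(np.dot(A, B) / denom)
```

The value is 1 for vectors pointing in the same direction, 0 for orthogonal vectors and -1 for opposite vectors, so a higher value means more similar images.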
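
The ranking step inside such a function can be sketched with NumPy alone. The helper name rank_similar and its arguments are hypothetical; with the DataFrame described above, files would be the ‘file’ column as a list and feature_matrix the stacked ‘features’ column. Plotting is left out.

```python
import numpy as np


def rank_similar(query_vec, files, feature_matrix, top_k=5):
    """Return the top_k (file, cosine similarity) pairs, most similar first."""
    feats = np.asarray(feature_matrix, dtype=float)
    query = np.asarray(query_vec, dtype=float).ravel()

    # Cosine similarity between the query and every stored vector at once.
    norms = np.linalg.norm(feats, axis=1) * np.linalg.norm(query)
    sims = feats @ query / np.where(norms == 0, 1.0, norms)

    # Sort by similarity, highest first, and keep the best top_k.
    order = np.argsort(-sims)[:top_k]
    return [(files[i], float(sims[i])) for i in order]


# Toy example with made-up 2-D feature vectors.
result = rank_similar([1, 0], ["a.png", "b.png", "c.png"],
                      [[1, 0], [0, 1], [1, 1]], top_k=2)
print(result)
```

Computing all similarities with one matrix product is much faster than looping over the rows when the train and validation sets hold many thousands of images.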

Now, it is time to see results of Image Retrieval with Feature Extraction!

Image Retrieval Results for Given Airplane Image with 4 Different Models

In the table above, we can see that the VGG-16 models are better than the ResNet-50 models for Image Retrieval. We can also say that increasing the number of trainable layers gives better results for each model.

Summary

In this article, we first learned how to implement a simple CNN model and how hyper-parameters can change the accuracy of a CNN model.

In the second section, we obtained Test Accuracy results for Transfer Learning models based on VGG-16 and ResNet-50. According to those results, the VGG-16 models can outperform the ResNet-50 models.

Lastly, we saw that the VGG-16 models can also outperform the ResNet-50 models for Image Retrieval.

You can find all the code in the GitHub repository.
