Image Similarity Comparison using VGG16 Deep Learning Model

Roman
Feb 20, 2023


Figure 1: The architecture of VGG16. Source: Researchgate.net

VGG16 is a powerful pretrained model that can be used for identifying similarities between images. By using this model, we can extract high-level features from different images and compare them to identify similarities. This technique has a wide range of applications, from image search and recommendation systems to security and surveillance.

In this article, I will use this model to find the similarity between two images.

Setting up the Environment for the Image Similarity Project

For this project, we will be leveraging popular machine learning libraries such as Keras and scikit-learn to build our image similarity pipeline. Since we rely on a pre-trained model, no training of our own is required.

In addition to the libraries mentioned, we will also use the NumPy and Matplotlib libraries for data manipulation and visualization, respectively. These will be useful for preparing the image data and visualizing the results of our image similarity model.

import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing import image

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# import VGG16 from tensorflow.keras so all Keras imports share one namespace
from tensorflow.keras.applications.vgg16 import VGG16
from sklearn.metrics.pairwise import cosine_similarity

Configuring the VGG16 Model for Image Embedding Extraction and Understanding Its Parameters

vgg16 = VGG16(weights='imagenet', include_top=False,
              pooling='max', input_shape=(224, 224, 3))

# print the summary of the model's architecture.
vgg16.summary()

The weights='imagenet' parameter specifies that the model should be initialized with pre-trained weights from the ImageNet dataset, a large dataset of labeled images used for training computer vision models.

The include_top=False parameter indicates that the top dense layers of the model, which are responsible for classification, should not be included. This is done when we want to use the pre-trained model as a feature extractor, and then add our own custom classification layers on top of it.

The pooling='max' parameter applies global max pooling to the output of the final convolutional block, collapsing each feature map to a single value. This reduces the model's output to a fixed-length vector while retaining the most salient information.

The input_shape=(224, 224, 3) parameter specifies the expected shape of the input images, which in this case are 224x224 color images.
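As a quick sanity check (a minimal sketch, assuming the vgg16 object created above), we can confirm that these settings produce a single fixed-length vector per image:

# with include_top=False and pooling='max', the model outputs one
# 512-dimensional feature vector per input image
print(vgg16.output_shape)  # (None, 512)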

Freezing the VGG16 Model Layers for Transfer Learning

for model_layer in vgg16.layers:
    model_layer.trainable = False

For each layer in the model, we specify that no additional training is needed: we will instead rely on the parameters the VGG16 model already learned from the ImageNet dataset.
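To confirm the freeze took effect, here is a small check using standard Keras attributes:

# no weights should remain trainable after the loop above
print(len(vgg16.trainable_weights))                        # 0
print(all(not layer.trainable for layer in vgg16.layers))  # True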

Defining the Functions for Preprocessing the Image Data for Model Input

def load_image(image_path):
    """
    -----------------------------------------------------
    Process the image provided.
    - Convert the image to RGB
    - Resize the image
    -----------------------------------------------------
    return resized image
    """
    # convert to RGB so grayscale/RGBA files also yield 3 channels
    input_image = Image.open(image_path).convert('RGB')
    resized_image = input_image.resize((224, 224))

    return resized_image

This function takes an image file path as input, represented by the image_path parameter, and loads the image from disk using the Image.open() method from PIL (the Python Imaging Library), converting it to RGB so the model always receives three color channels.

The function then resizes the image to a fixed size of (224, 224) pixels using the resize() method. This is a common preprocessing step used in deep learning image models, where images are typically resized to a fixed input size before being fed to the model.

Finally, the function returns the resized image as a PIL image object, which can be further processed or used as input to a machine learning model.
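As a quick usage example (the file path here is just illustrative; any local image works):

# hypothetical path, for illustration only
img = load_image('/content/sunflower.jpeg')
print(img.size)  # (224, 224) -- PIL reports (width, height)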

def get_image_embeddings(object_image: image):
    """
    -----------------------------------------------------
    convert image into 3d array and add additional dimension for model input
    -----------------------------------------------------
    return embeddings of the given image
    """
    # add a leading batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
    image_array = np.expand_dims(image.img_to_array(object_image), axis=0)
    image_embedding = vgg16.predict(image_array)

    return image_embedding

This function takes an image object as input and converts the image to a 3D array with the img_to_array method from the Keras image module.

The resulting array is then expanded with an additional leading dimension using the np.expand_dims() function, which is required because the VGG16 model expects a batch of images. The expanded array represents a batch containing a single image, with shape (1, height, width, channels), where height, width, and channels correspond to the dimensions of the image.

The function then calls the predict() method on the VGG16 model defined earlier in the code. This method takes the expanded NumPy array as input and generates an embedding for the image using the pre-trained weights of the VGG16 model.

Finally, the function returns the image embedding as a NumPy array. This embedding serves as a feature representation of the input image and can be used for tasks such as image retrieval, similarity search, or classification.
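Chaining the two helpers together (again with an illustrative path) shows the embedding's shape:

# one row vector per image; 512 features with pooling='max'
embedding = get_image_embeddings(load_image('/content/sunflower.jpeg'))
print(embedding.shape)  # (1, 512)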

def get_similarity_score(first_image: str, second_image: str):
    """
    -----------------------------------------------------
    Takes two image paths, computes their embeddings using the
    VGG16 model, and compares them with cosine similarity.
    -----------------------------------------------------
    return similarity score of the two images
    """
    first_image = load_image(first_image)
    second_image = load_image(second_image)

    first_image_vector = get_image_embeddings(first_image)
    second_image_vector = get_image_embeddings(second_image)

    similarity_score = cosine_similarity(first_image_vector, second_image_vector).reshape(1,)

    return similarity_score

This function takes the file paths of two images as input, represented by the first_image and second_image parameters. The function first calls the load_image() function, which loads and resizes each image.

The function then calls the get_image_embeddings() function on each image, which generates a feature embedding for each image using the VGG16 model. These embeddings are represented as NumPy arrays.

Finally, the function computes the cosine similarity score between the two embeddings using the cosine_similarity() function from the sklearn.metrics.pairwise module. Cosine similarity ranges from -1 to 1 in general; because VGG16's features come from ReLU activations and are therefore non-negative, the scores here fall between 0 and 1, with a score of 1 indicating perfect similarity.
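To build intuition for the metric itself, here is a self-contained toy example with made-up vectors, independent of the VGG16 pipeline:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([[1.0, 2.0, 3.0]])
b = np.array([[2.0, 4.0, 6.0]])  # same direction as a, larger magnitude
c = np.array([[3.0, 0.0, 0.0]])  # points in a different direction

print(cosine_similarity(a, b))  # [[1.]] -- parallel vectors score 1
print(cosine_similarity(a, c))  # [[0.2673...]] -- diverging vectors score lower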

def show_image(image_path):
    image = mpimg.imread(image_path)
    plt.imshow(image)
    plt.show()

This function takes an image path as input, reads the file with mpimg.imread(), and uses the plt.imshow() method from the matplotlib.pyplot module to display the image array as a 2D plot.

Implementing the VGG16 Model

Finally, we will use the functions defined above along with the VGG16 model initialized to find the similarity between images.

# define the path of the images
sunflower = '/content/sunflower.jpeg'
helianthus = '/content/helianthus.jpeg'
tulip = '/content/Tulip.jpeg'

# use the show_image function to plot the images
show_image(sunflower), show_image(helianthus)

[Images: Sunflower, Helianthus]

similarity_score = get_similarity_score(sunflower, helianthus)
similarity_score

[Output: similarity score for the sunflower/helianthus pair]

show_image(sunflower), show_image(tulip)

[Images: Sunflower, Tulip]

similarity_score = get_similarity_score(sunflower, tulip)
similarity_score

[Output: similarity score for the sunflower/tulip pair]

In this project, we explored the use of the VGG16 model to extract features from images and compute similarity scores between them. We began by importing the necessary libraries, initializing the VGG16 model, and defining functions for loading and processing images, computing their embeddings, and calculating similarity scores.

We then applied these functions to a set of images and displayed both the images and their similarity scores with Matplotlib. Overall, the project demonstrated how pre-trained deep learning models like VGG16 can be leveraged to perform complex image analysis tasks and generate insights from visual data.
