Image similarity model

Chaitanyanarava
Analytics Vidhya
Published in
6 min read · Dec 17, 2020

Finding the top N most similar images to a given query image from a dataset, using clustering and convolutional autoencoders!

http://lear.inrialpes.fr/people/nowak/similarity/sameordifferent.png

Can you spot the difference between the two images given below?

https://i.ytimg.com/vi/3Bm5gINwIFc/maxresdefault.jpg

Business Problem:

Have you ever played the childhood game where you are given two images and asked to spot the differences between them? Our problem statement is somewhat similar: given a query image, we need to find the top N most similar images from a given dataset. Something like this…

when N = 7 (N can be any number)

Reverse Image search / Image Similarity Model:

What is Image search?

A similar-image search is a kind of search in which we upload, or point to, an image from a dataset, and the system outputs the top N most similar images from that dataset.


How does it help in the real world?

  • To find similar images.
  • Finding Plagiarized Photos.
  • Creating Backlink opportunities.
  • To discover people, places, and products.
  • Finding more versions.
  • Detecting Fake accounts.

Table of Contents:

Chapter-1: Data Extraction / Importing the Data

Chapter-2: Convolutional Auto Encoders

Chapter-3: K-Means Clustering

Chapter-4: Similarity model through K-Nearest Neighbors

Chapter-5: Conclusions

Chapter-1: Data Extraction

You can download the dataset here.

OR

Or follow this blog here to create your own dataset from Google Images.

For this case study, I am using:

  • Google Colaboratory, to get access to a GPU and high RAM.
  • TensorFlow.Keras (version 2.0)
  • Python 3

I am using CurlWget for a faster download of the data. To know more about this, follow the given link:

https://www.thegeekstuff.com/2012/07/wget-curl/

Splitting the Data:

Splitting up the data is mainly useful for the hyperparameter-tuning part of machine learning. For the model to perform fairly well on test data, it is important to tune the model's hyperparameters.

For that task, we need data that is usually carved out of the training data in a small portion, around 1–2% depending on the size of the training set. It is referred to as cross-validation data, or simply validation data.

Splitting the data into 85:15 ratio

We take the file paths of all the images and split that list into train and test sets. Storing the resulting splits as .csv files in the drive means the data never has to be split again: the same sets can be accessed directly in the future, and there is no risk of data leakage between them.
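A minimal sketch of this split-and-save step, assuming a list of image file paths (the paths below are placeholders, not the article's actual dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical list of image file paths gathered from the dataset folder.
image_paths = [f"images/img_{i:04d}.jpg" for i in range(100)]

# 85:15 split, with a fixed seed so the same split can be reproduced.
train_paths, test_paths = train_test_split(
    image_paths, test_size=0.15, random_state=42
)

# Persist the split so it never has to be redone (avoids data leakage).
pd.DataFrame({"path": train_paths}).to_csv("train_split.csv", index=False)
pd.DataFrame({"path": test_paths}).to_csv("test_split.csv", index=False)

print(len(train_paths), len(test_paths))  # 85 15
```

Fixing `random_state` is what guarantees the split is identical if the cell is ever re-run.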

Reading Images (Image to Array Conversion) :

Performing the following series of actions by reading the images from both train and test datasets:

  1. Reading images using the OpenCV (cv2) module.
  2. Converting images from BGR (Blue, Green, Red) to RGB (Red, Green, Blue).
  3. Resizing images from (512, 512, 3) to (224, 224, 3).
  4. Normalizing the pixel values.
image to array conversion

We have nearly 5K images at 512×512 resolution, which comes to roughly 1.3 billion pixels in total. Loading all of this into RAM and comparing each image against every other image would be computationally very expensive and could crash the system (GPU or TPU).

So, as a solution, we can combine the ideas of convolutional neural networks and autoencoders to reduce the amount of information carried by the image data. This acts as a pre-processing step before clustering.

Chapter-2 : Convolutional AutoEncoders:

convolutional autoencoders

Convolutional Autoencoders (CAEs) are a type of convolutional neural network. The main difference is that a standard CNN is trained end-to-end to learn filters and combine features with the aim of classifying its input, whereas a CAE is an unsupervised model trained to reconstruct its input.

It tries to preserve the spatial information of the input image while gently compressing it into a smaller representation.

  • Encoder: converts the input image into a latent-space representation through a series of convolution operations (left to center).
  • Decoder: tries to restore the original image from the latent space through a series of upsampling/transposed-convolution operations, also known as deconvolution (center to right).

You can read more about convolutional Autoencoders here.

Convolutional AutoEncoder model Architecture
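A minimal Keras sketch of such an encoder–decoder pair (the layer sizes here are illustrative, not the exact architecture trained in the article):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_cae(input_shape=(224, 224, 3)):
    """Minimal convolutional autoencoder: conv + pooling compress the image,
    conv + upsampling restore it."""
    inp = layers.Input(shape=input_shape)
    # Encoder: each pooling step halves the spatial resolution.
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    encoded = layers.MaxPooling2D(2)(x)          # (56, 56, 16) latent volume
    # Decoder: upsampling steps restore the original resolution.
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(encoded)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mse")  # reconstruction loss
    return model
```

The sigmoid output matches the [0, 1] normalization done during preprocessing, and the MSE loss directly measures reconstruction quality.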

Model parameters:

Training the model with the optimal parameters

Model Performance:

Plotting loss on each epoch
Model Testing
  • The restorations look really satisfactory. Images on the left side are the originals, whereas images on the right side are restored from the compressed representation.
  • The compressed representation is much more flexible and efficient to work with than the original image, since it takes about 8 times less space.

Feature Extraction:

Feature extraction from the intermediate(9th) layer
  1. High activation on the mane of a lion.
  2. Nose and lines on tigers.
  3. Dots and lines on the nose for cheetah.
  4. The nose on foxes.

Bright (yellow) pixels indicate high activation, which helps in differentiating one image from the others. These activations help in label classification.
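Extracting features from an intermediate layer amounts to building a second Keras model that shares the trained weights but stops at that layer. A sketch, using a tiny stand-in network in place of the article's trained autoencoder (layer names and sizes are illustrative):

```python
import numpy as np
from tensorflow.keras import layers, Model

# Toy stand-in for the trained autoencoder; in practice this is the model
# trained in Chapter-2.
inp = layers.Input(shape=(224, 224, 3))
x = layers.Conv2D(8, 3, padding="same", activation="relu", name="enc1")(inp)
x = layers.MaxPooling2D(4)(x)
x = layers.Conv2D(4, 3, padding="same", activation="relu", name="enc2")(x)
autoencoder_stub = Model(inp, x)

# Feature extractor: reuse the weights up to the chosen intermediate layer.
feature_model = Model(
    inputs=autoencoder_stub.input,
    outputs=autoencoder_stub.get_layer("enc2").output,
)

features = feature_model.predict(np.zeros((1, 224, 224, 3), dtype=np.float32))
print(features.shape)  # (1, 56, 56, 4)
```

Flattening this feature volume per image gives the compressed vectors that the clustering step below operates on.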

Chapter-3: K-Means Clustering:

After obtaining the compressed representation of all images, we can apply the K-Means clustering algorithm to group the images into clusters. This lets us assign labels to the unlabeled data.

But to visualize the clusters, we need to reduce the dimensionality of the features using t-SNE.

K-Means for clustering and T-SNE for visualization
Clustering data with optimal K value = 6
  1. cheetahs 2. lions 3. snow dogs 4. tigers 5. leopards 6. foxes
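The clustering-plus-visualization step can be sketched with scikit-learn, using random vectors as a stand-in for the real autoencoder features (K = 6 matches the six animal classes above):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for flattened autoencoder features (real shape depends on the model).
features = rng.normal(size=(120, 64))

# K-Means with K = 6, one cluster per animal class.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

# t-SNE down to 2-D, purely for visualizing the clusters.
embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
print(labels.shape, embedded.shape)  # (120,) (120, 2)
```

Note that t-SNE is used only for the plot; K-Means itself runs on the full-dimensional features, since t-SNE distances are not reliable for clustering.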

Chapter-4 : Similarity model through K-Nearest Neighbors

After clustering, we have labeled data. Now we can apply the K-NN algorithm to find similar images (nearest neighbors).

Decision boundary of KNN with optimal K value = 9
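A sketch of this step with scikit-learn, again using stand-in features and the cluster labels from K-Means (k = 9 follows the tuned value above; N = 7 matches the earlier example):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
features = rng.normal(size=(120, 64))   # stand-in compressed image features
labels = rng.integers(0, 6, size=120)   # stand-in cluster labels from K-Means

# Fit KNN with k = 9 on the labeled feature vectors.
knn = KNeighborsClassifier(n_neighbors=9)
knn.fit(features, labels)

# Retrieve the 7 nearest images to a query feature vector.
query = features[0:1]
dist, idx = knn.kneighbors(query, n_neighbors=7)
print(idx.shape)  # (1, 7)
```

Because the query here is itself a dataset image, its own index comes back first with distance 0; for an unseen query, all 7 neighbors are genuine matches.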

Making Predictions:

Model predictions

Model testing:

Chapter-5 : Conclusions

Plan of Action

Prediction Algorithm:

Step-1: Take either a filename or a URL and convert that image into an image array.

Step-2: Using that array, extract features from the intermediate layers of the trained autoencoder model.

Step-3: From the extracted features, find the label (cluster) the image belongs to using K-Means clustering.

Step-4: Using the KNN model, find the N most similar images and finally plot the result.
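The ranking core of Steps 2–4 can be condensed into one small function: given the query's extracted features, rank all dataset images by distance and keep the N closest. A minimal NumPy sketch (the function name and toy vectors are illustrative; real inputs come from the trained autoencoder):

```python
import numpy as np

def find_similar(query_features, dataset_features, n=7):
    """Rank dataset images by Euclidean distance to the query features
    and return the indices of the N closest ones."""
    dists = np.linalg.norm(dataset_features - query_features, axis=1)
    return np.argsort(dists)[:n]

# Toy 2-D feature vectors standing in for autoencoder features.
db = np.arange(20, dtype=float).reshape(10, 2)
order = find_similar(db[3], db, n=3)
print(order)  # the query's own index (3) comes first, then rows 2 and 4
```

In the full pipeline, these indices are mapped back to file paths from the stored .csv split so the matching images can be loaded and plotted.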

References:

You can check the .ipynb notebook with the full code for this case study in my GitHub repository.

Follow me for more such articles and implementations of different real-world case studies in data science! You can also connect with me on LinkedIn and GitHub.

I hope you have learned something from this. There is no end to learning in this area, so happy learning! Signing off, bye :)


Upcoming Data Scientist. Areas of interest: Machine Learning and Data Science. GitHub profile: https://github.com/ChaitanyaNarva/ChaitanyaNarva