Which Celebrity Do You Look Like? An Image Similarity Search Model

Nikhar Khandelwal · Published in The Startup · Aug 8, 2020

[Image source: one1note | The Associated Press]

Many people try to look like their favorite celebrity, and they always seem to love getting compliments like…

“Hey, you look just like that guy from the Interstellar movie, OMG, what’s his name…??? Yes, MATTHEW MCCONAUGHEY!!”

Meanwhile…

[Image source: nydailynews]

Maybe your friends were just being polite to make you feel good, but is there a way to objectively identify which celebrity you resemble the most?

In this post, I’ll use a Siamese Neural Network to calculate a similarity percentage (0–100%) between two images by comparing facial features.

INTRODUCTION

Comparing images for similarity has wide-reaching applications. In healthcare, for example, radiology images could be compared with reference images of known medical conditions to assist doctors in diagnosing diseases.

In retail, an image of a product might be searched on an e-commerce website for price and availability details; in general search applications, an image can be looked up in a database of images. Such a task is far from trivial: images of similar objects can appear different due to differences in camera equipment, lighting conditions, orientation, color, and resolution.

Many algorithms have been proposed for matching images, such as SIFT and Pyramid Match. Recent advancements in Convolutional Neural Networks (CNNs) have made it possible to compare images directly, without the handcrafted features that earlier methods relied on.

Siamese networks, a special type of neural network architecture, are typically used in this domain. Instead of learning to classify its inputs, a Siamese network learns to differentiate between two inputs: it learns the similarity between them.

DATASET

The dataset used for this exercise is FaceScrub, a face dataset built by detecting faces in images returned from Internet searches for public figures, then automatically discarding those not belonging to each queried person. It comprises over 100,000 face images of 530 male and female celebrities, with about 200 images per person. The dataset also includes name and gender annotations, along with the locations of the faces inside the images.

DATA PREPROCESSING

  • Discarded all black-and-white images, keeping only the color images for consistency.
  • For the sake of simplicity (and because of the limited computation power I had), used only 20 images per celebrity for training.
  • Cropped the celebrities’ faces out of the images.
  • Resized all images to 150×150 (RGB) for a consistent input size.
  • Randomly applied contrast-limited adaptive histogram equalization (CLAHE) to improve contrast.
  • Generated additional augmented images on the fly while training to reduce overfitting: images were randomly rotated between -30 and 30 degrees and randomly flipped.

MODEL — SIAMESE NEURAL NETWORK

A Siamese neural network (sometimes called a twin neural network) is an artificial neural network that uses the same weights while working in tandem on two different input vectors to compute comparable output vectors (Wiki).

This diagram illustrates how a Siamese neural network works. The two sister networks are identical and share their hyperparameters and weights. Each sister network is fed a different image, and the network is trained using a triplet loss or a contrastive loss. The loss is calculated against the ground truth, i.e. whether or not the two images show the same person.

The sister network used here is a SqueezeNet, an 18-layer neural network consisting of convolution, normalization, pooling, and ReLU activation layers. It also contains 8 Fire modules, as illustrated below.

SqueezeNet
Fire Node

Finally, since the objective of this solution is not to classify but to differentiate between images, a Contrastive Loss Function is used that distinguishes between the input pair of feature vectors:

L(W, Y, X1, X2) = (1 − Y) · ½ · Dw² + Y · ½ · [max(0, m − Dw)]²

where X1 and X2 are the two input images, Gw is the transformation of an image into its feature vector, Dw = ‖Gw(X1) − Gw(X2)‖ is the Euclidean distance between the two feature vectors, Y is the ground-truth label (0 for a pair of the same person, 1 for a pair of different people), and m is a margin beyond which dissimilar pairs no longer contribute to the loss.
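The loss is straightforward to implement once the distance is computed. This sketch follows the standard Hadsell et al. formulation with the label convention Y = 0 for a same-person pair and Y = 1 for a different-person pair (an assumption; some write-ups flip the labels).

```python
import numpy as np

def contrastive_loss(d, y, margin=1.0):
    """Contrastive loss for one pair.

    d: Euclidean distance Dw between the two feature vectors
    y: ground-truth label, 0 = same person, 1 = different people
    margin: how far apart dissimilar pairs are pushed
    """
    similar_term = (1 - y) * 0.5 * d ** 2
    dissimilar_term = y * 0.5 * np.maximum(0.0, margin - d) ** 2
    return similar_term + dissimilar_term

print(contrastive_loss(0.0, 0))  # identical similar pair     -> 0.0
print(contrastive_loss(2.0, 1))  # well-separated dissimilar  -> 0.0
print(contrastive_loss(0.5, 1))  # dissimilar pair too close  -> 0.125
```

Note how the loss is zero both for a perfectly matched similar pair and for a dissimilar pair already separated by more than the margin; only the violating cases produce a gradient.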

Here is what the final Siamese model training looks like:

In the training example above, two celebrity images, one of Bruce Lee and one of Jackie Chan, are fed individually to the sister networks. The two output vectors from the sister networks are then used to compute the contrastive loss, based on the ground truth that these two images are not of the same person. The loss is then backpropagated through both sister networks, updating their shared weights equally. Pairs of images of the same actor are also fed to the model, so that it learns the similarities as well as the dissimilarities.
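One training update as described above can be sketched as follows, assuming PyTorch (the article doesn’t state its framework). Any module that returns a pair of embeddings for a pair of input batches works as `model` here.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(d, y, margin=1.0):
    # y = 0 for a same-person pair, 1 for a different-person pair
    return ((1 - y) * 0.5 * d.pow(2)
            + y * 0.5 * F.relu(margin - d).pow(2)).mean()

def train_step(model, optimizer, x1, x2, y):
    """One update: both sister outputs feed a single loss, so the
    backward pass updates the one shared set of weights."""
    optimizer.zero_grad()
    e1, e2 = model(x1, x2)           # embeddings from the two sister passes
    d = F.pairwise_distance(e1, e2)  # Euclidean distance per pair
    loss = contrastive_loss(d, y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the two sister passes share one parameter set, there is nothing special to synchronize: gradients from both branches simply accumulate on the same weights.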

OUTPUT

After a new image is uploaded, the model returns a list of similarity scores (0–100%) computed against the celebrity images, indicating how closely the two images match. Here is a sample output of celebrity look-alikes for Matt Damon.
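Turning embedding distances into a ranked look-alike list can be sketched as below. The article doesn’t give its exact distance-to-percentage mapping, so the inverse-distance form here is one simple choice, not the author’s formula, and the names are toy placeholder data.

```python
import numpy as np

def similarity_percent(d):
    # Map embedding distance to a 0-100% score (an assumption: one simple
    # monotone mapping; distance 0 -> 100%, larger distances -> lower scores).
    return 100.0 / (1.0 + d)

def top_lookalikes(query_emb, celeb_embs, names, k=3):
    """Rank stored celebrity embeddings by distance to the query embedding."""
    dists = np.linalg.norm(celeb_embs - query_emb, axis=1)
    order = np.argsort(dists)[:k]
    return [(names[i], similarity_percent(dists[i])) for i in order]

# Toy 2-D embeddings standing in for real model outputs.
celebs = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
names = ["Matt Damon", "Mark Wahlberg", "Ben Affleck"]
print(top_lookalikes(np.array([0.1, 0.0]), celebs, names, k=2))
```

In a real deployment, the celebrity embeddings would be computed once with the trained sister network and cached, so each query only costs one forward pass plus a nearest-neighbor search.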

The model can also be used to compare the similarity between two images as shown below…

That’s the best similarity score I got with any celebrity…:)
Johnny Depp(old) vs Johnny Depp(Young) — High Similarity
Emma Watson vs Anne Hathaway — Not that similar
Katy Perry vs Zooey Deschanel — Basically Twins

GOT QUESTIONS?

If you have any questions or feedback about this post, please reach out to me on LinkedIn.
