Solving the problem of Hollywood Doppelgängers using deep learning and the FastAI library

Khushi Pathak
6 min read · Jul 3, 2020

When you are an established actor in Hollywood, your face is recognized almost everywhere. People approach you at supermarkets and restaurants with ecstatic and breathless “Hey, aren’t you [insert famous person name]? I’m a HUGE fan of your work!” But what if they mistake you for another famous person? Some may take it in good fun, some might not!

Despite the millions and millions of possible permutations of facial features, those millions are still dwarfed by a population in the billions. Hence the famous problem of doppelgängers, which is exacerbated when both you and your lookalike are Hollywood stars. I found it incredibly frustrating to stare at pictures of Jaime Pressly and Margot Robbie and not be able to tell them apart at all. Take a look for yourself if you don’t believe me:

Margot Robbie (R) and Jaime Pressly (L). Credits: boredpanda.com

Don’t tell me “Their eyebrows are different!” Pressly is 12 years older than Robbie, and the picture of Pressly above is from the first decade of this century, which was a dark time for eyebrows in general. So I decided to solve this problem once and for all and make a computer learn to differentiate between two similar-looking actors, using the ultimate weapon of deep learning.

Level 1

Perhaps I should go a little easier on the technology by giving it a much simpler problem: differentiating between the 4 Holy[wood] Chrises (Hemsworth, Evans, Pratt, Pine).

(L-R) Chris Hemsworth, Chris Evans, Chris Pratt, Chris Pine. Credits: parade.com

Since there aren’t many datasets dedicated to unraveling the mysteries of famous lookalikes, I had to prepare one on my own, which is quite easy thanks to Google Images and a little JavaScript magic.

  1. Go to Google Images and search for relevant images. For example, I used "chris hemsworth" -evans -pratt -pine to ensure I was getting only one of the four in any image. Scroll down until you have seen enough images.
  2. Open the JavaScript console in your browser using Cmd + Opt + C on Mac or Ctrl + Shift + J on Windows. Enter the following lines in the console:
urls=Array.from(document.querySelectorAll('.rg_i')).map(el=> el.hasAttribute('data-src')?el.getAttribute('data-src'):el.getAttribute('data-iurl'));
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
  3. A CSV file will be downloaded containing all the URLs for the images you just saw. Save this on your computer with a relevant file name.
  4. Use download_images(path/file, dest, max_pics = 200) to download a maximum of 200 images from your CSV file (pointed to by path/file) and save them in the dest location.
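The search query in step 1 follows a simple pattern: quote the full name of the actor you want and exclude the surnames of the others. A minimal sketch of that pattern in plain Python, where the helper name build_query is hypothetical and not part of FastAI or any Google API:

```python
def build_query(target, others):
    """Quote the target's full name and exclude every other actor's surname."""
    exclusions = " ".join(f"-{name.split()[-1].lower()}" for name in others)
    return f'"{target.lower()}" {exclusions}'.strip()


actors = ["Chris Hemsworth", "Chris Evans", "Chris Pratt", "Chris Pine"]

# One query per actor, each excluding the other three surnames.
queries = {
    actor: build_query(actor, [other for other in actors if other != actor])
    for actor in actors
}

for query in queries.values():
    print(query)
```

The first line printed is the same query used in step 1; the other three follow the same shape for the remaining Chrises.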

Now that the data is ready to use, we use the FastAI library to load it into an ImageDataBunch object:

The dataset has been split 80:20 into training and development sets, with all images resized to 224 x 224, which is good enough. Let’s now glance at our images:
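For anyone curious what an 80:20 split amounts to, here is a minimal plain-Python sketch. It assumes a random 20% hold-out with a fixed seed; the file names and the helper split_train_valid are hypothetical, and this is an illustration of the idea rather than FastAI’s own implementation.

```python
import random


def split_train_valid(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of items and hold out valid_pct of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical file names standing in for the downloaded images.
files = [f"chris_{i:03d}.jpg" for i in range(100)]
train, valid = split_train_valid(files)
print(len(train), len(valid))
```

Fixing the seed matters: without it, every run would validate on a different set of images and the accuracy numbers would not be comparable between runs.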

Fair enough. Let’s proceed with training the model and see how successful it is in differentiating between the four Chrises. I will be using a ResNet50 model, pre-trained on the ImageNet dataset, to transfer learn on my image dataset.

Running the model for 4 epochs with the learning rate restricted to between 1e-4 and 1e-3 gives us an accuracy of about 82%. That is pretty impressive, considering that most of the images we had of the actors also had other people in them, which may have confused the model a little.
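The training call itself is not shown here. If the range was passed the usual FastAI v1 way, as slice(1e-4, 1e-3) (an assumption), the library does not train every layer at one rate: it spreads the rates geometrically across the model’s layer groups, with the earliest layers getting the smallest rate. A plain-Python sketch of that spacing, where the helper name spread_learning_rates and the choice of three layer groups are illustrative:

```python
def spread_learning_rates(lo, hi, n_groups):
    """Return n_groups rates from lo to hi, each a constant multiple of the last."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]


# Assumed: three layer groups, range 1e-4 to 1e-3 as in the text.
lrs = spread_learning_rates(1e-4, 1e-3, 3)
print([f"{lr:.2e}" for lr in lrs])
```

The idea is that the early, pre-trained layers already detect generic features and need only small nudges, while the freshly added classification layers can move faster.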

The confusion matrix for the Chris classifier.

I believe the model performed pretty well considering the limited dataset it trained on. It made most of its mistakes on pictures of Chris Pratt, which were misclassified as Chris Hemsworth. Understandable: post-Marvel Pratt looks very similar to post-workout Hemsworth, and worlds apart from pre-Marvel Pratt.
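Reading off the most-confused pair from a confusion matrix boils down to counting (actual, predicted) pairs and ranking the ones that disagree. A minimal sketch in plain Python; the labels below are made up for illustration and are not the results of this experiment, and the helper names are hypothetical:

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


def most_confused(actual, predicted):
    """Return misclassified (actual, predicted) pairs, most frequent first."""
    counts = confusion_counts(actual, predicted)
    errors = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
    return sorted(errors.items(), key=lambda item: item[1], reverse=True)


# Made-up toy labels, NOT the article's data.
actual = ["pratt", "pratt", "pratt", "evans", "pine", "hemsworth", "pratt", "evans"]
predicted = ["hemsworth", "hemsworth", "pratt", "evans", "pine", "hemsworth", "hemsworth", "pine"]

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(f"accuracy: {accuracy:.2f}")
for (true_label, guess), n in most_confused(actual, predicted):
    print(f"{true_label} predicted as {guess}: {n}")
```

In this toy data the Pratt-as-Hemsworth pair tops the list, which is the kind of pattern described above.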

Now that we’ve conquered Level 1, it’s time to move on to the real stuff.

Level 2

Our model is ready to handle the tough stuff. Let’s go beyond human abilities, and try to differentiate between Isla Fisher and Amy Adams.

Amy Adams (L) and Isla Fisher (R). Credits: Vulture.com

To be honest, my expectations are at rock bottom for this case. Isla Fisher famously trolled all her relatives by putting Amy Adams’ picture instead of hers on holiday cards, and no one noticed the difference. If her relatives cannot tell who’s who, then how can our innocent model? Let’s find out.

I created a dataset of images using the same steps as before and trained the model all over again. Keep in mind that the image dataset also contains several shots of characters that the two actresses played in movies, which should help the model learn facial features from different angles and under different lighting conditions. The model may have a tough time learning on our data, but let’s keep up the optimism. A glance at the dataset gives us this:

Let’s train our ResNet50 model again from the start on the new dataset, using 4 epochs and a learning rate in the range of 1e-4 to 1e-3:

A whopping 89% accuracy when classifying between Amy Adams and Isla Fisher! Perhaps their relatives should use deep learning next time the holiday cards are being circulated?

I’d say that the model performs quite well when learning to differentiate between the two. Let’s have a look at the instances where the model made its biggest mistakes, i.e., made the wrong guesses with high probabilities.

Prediction/Actual/Loss/Probability

I’m quite sure that the first two images are of Nicole Kidman and Emma Stone, so the faulty prediction is not the model’s fault. For the rest, the losses are all quite low, meaning the model was not very confident in its wrong guesses. These are near misses that a larger and cleaner dataset could probably fix.
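To see why a confident wrong guess lands at the top of such a list, here is a minimal sketch of ranking predictions by cross-entropy loss. It assumes the last column is the probability the model gave to the actual class; the helper names and the three records are hypothetical, not the values from the figure above.

```python
import math


def cross_entropy(prob_actual):
    """Loss for one image: the less probability on the true class, the higher."""
    return -math.log(prob_actual)


def top_losses(records, k=3):
    """Rank (prediction, actual, prob_actual) records by loss, highest first."""
    scored = [(pred, actual, cross_entropy(p), p) for pred, actual, p in records]
    return sorted(scored, key=lambda row: row[2], reverse=True)[:k]


# Hypothetical records: (prediction, actual, probability given to the actual class).
records = [
    ("Amy Adams", "Amy Adams", 0.95),     # correct and confident
    ("Amy Adams", "Isla Fisher", 0.40),   # wrong, but only narrowly
    ("Isla Fisher", "Amy Adams", 0.02),   # wrong and confident
]

for pred, actual, loss, prob in top_losses(records):
    print(f"{pred} / {actual} / {loss:.2f} / {prob:.2f}")
```

The confident mistake gets a loss several times larger than the narrow one, so mislabeled images in the dataset tend to float straight to the top.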

Conclusion

This experimental project worked very well, and very quickly, using FastAI’s well-developed library. With only a few lines of code, we managed to get a pretty accurate deep learning model. Perhaps with a much more expansive dataset, this model could be trained to produce state-of-the-art results.

Thank you for reading this. I wholeheartedly welcome any feedback or constructive criticism. If you liked this little article, do give it a clap!

Khushi Pathak

Machine learning enthusiast | Math & Computing student @ IIT Delhi | khushipathak.github.io