Sitemap
TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Creating the Snapchat Filter System using Deep Learning

7 min read · Dec 25, 2018


If y’all don’t like to read

Welcome, all the millennial programmers who may have opened this article after seeing the words ‘Snapchat’ and ‘Deep Learning’. I swear, these two words attract you guys like moths to a flame. Who am I kidding, I fell prey to it too; that’s why I spent hours making this project.

In this article, if you’d like to call it that, I will be going over the process and a little of the theory behind the project in the title. Full disclosure: even my use of the term ‘Snapchat’ in the title might be a little clickbaity, because although this project works on the same principle (using facial key points to map objects onto a face), it’s nowhere close to Snapchat’s implementation in terms of complexity and accuracy. With that out of the way, let me introduce the dataset I used.

The Dataset

The dataset I used is the Facial Keypoints Detection dataset from Kaggle (https://www.kaggle.com/c/facial-keypoints-detection), provided by Dr. Yoshua Bengio of the University of Montreal.

Each predicted keypoint is specified by an (x,y) real-valued pair in the space of pixel indices. There are 15 key points, which represent the different elements of the face. The input image is given in the last field of the data files, and consists of a list of pixels (ordered by row), as integers in (0,255). The images are 96x96 pixels.

Now that we have a good idea about the kind of data we are dealing with, we need to preprocess it so that we can use it as inputs to our model.

Step 1: Data Preprocessing and other shenanigans

The above dataset has two files that we need to concern ourselves with — training.csv and test.csv. The training file has 31 columns: 30 columns for the key point coordinates, and a last column containing the image data as a string. It contains 7049 samples; however, many of these have ‘NaN’ values for some key points, which makes things tough for us. So we shall only consider the samples without any NaN values. Here’s the code that does exactly that (it also normalizes the image and keypoint data, which is a very common preprocessing step):
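
Since the embedded gist isn’t reproduced here, here’s a minimal sketch of that preprocessing step, assuming the standard column layout of training.csv (30 keypoint columns plus an ‘Image’ column of space-separated pixel values):

```python
import numpy as np
import pandas as pd

def load_clean_data(csv_path="training.csv"):
    """Load the facial-keypoints CSV, drop samples with missing
    keypoints, and normalize both images and coordinates."""
    df = pd.read_csv(csv_path).dropna()   # keep only complete samples

    # 'Image' holds 96*96 space-separated pixel values as one string
    X = np.stack([np.array(s.split(), dtype=np.float32)
                  for s in df["Image"]])
    X = X.reshape(-1, 96, 96, 1) / 255.0  # pixels scaled to [0, 1]

    # The remaining 30 columns are 15 (x, y) keypoint pairs in [0, 96]
    y = df.drop(columns=["Image"]).to_numpy(dtype=np.float32)
    y = (y - 48.0) / 48.0                 # coords scaled to [-1, 1]
    return X, y
```

Scaling the coordinates to [-1, 1] (rather than [0, 1]) is one common convention for this dataset; either works, as long as the predictions are mapped back consistently.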

Everything well and good? Not really, no. It turns out only 2140 samples contained no NaN values, which is far too few to train a generalized and accurate model. So to create more data, we need to augment our current data.

Data Augmentation is a technique used to generate more data from existing data, by using techniques like scaling, translation, rotation, etc. In this case, I mirrored each image and its corresponding key points, because techniques like scaling and rotation might have distorted the face images and would have thus screwed up the model. Finally, I combined the original data with the new augmented data to get a total of 4280 samples.
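
The mirroring step can be sketched as follows. One subtlety (simplified here): after a horizontal flip, left and right keypoints swap roles, so a full version would also swap the paired columns (e.g. left-eye columns with right-eye columns).

```python
import numpy as np

def mirror_samples(X, y):
    """Horizontally flip each 96x96 image and its keypoints.

    X: (N, 96, 96, 1) images; y: (N, 30) coords normalized to [-1, 1],
    stored as 15 alternating (x, y) pairs. Flipping the image negates
    every x coordinate. Note: a complete version would also swap the
    left/right keypoint column pairs after flipping.
    """
    X_flip = X[:, :, ::-1, :]          # flip along the width axis
    y_flip = y.copy()
    y_flip[:, ::2] *= -1               # negate the x coordinates
    # Combine original and mirrored data (2140 -> 4280 samples)
    return np.concatenate([X, X_flip]), np.concatenate([y, y_flip])
```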

Step 2: Model architecture and Training

Now let’s dive into the Deep Learning section of the project. We aim to predict coordinate values for each key point on an unseen face, so this is a regression problem. Since we are working with images, a Convolutional Neural Network is a pretty obvious choice for feature extraction. The extracted features are then passed to a fully connected neural network which outputs the coordinates. The final Dense layer needs 30 neurons because we need 30 values (15 pairs of (x, y) coordinates).

  • ‘ReLU’ activations are used after each Convolutional and Dense layer, except for the last Dense layer, since its outputs are the raw coordinate values we require
  • Dropout Regularization is used to prevent overfitting
  • Max Pooling is added for Dimensionality Reduction
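
The architecture described above can be sketched in Keras roughly as follows. The layer counts and sizes here are illustrative (the original gist isn’t reproduced in this excerpt), but the structure matches the bullet points: Conv + ReLU blocks with max pooling, a Dense head with dropout, and a linear 30-unit output.

```python
from tensorflow.keras import layers, models

def build_model():
    """CNN regressor for 15 (x, y) keypoints on 96x96 grayscale faces.
    Layer sizes are illustrative, not the exact original ones."""
    model = models.Sequential([
        layers.Input(shape=(96, 96, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),                  # dimensionality reduction
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),                     # overfitting regularization
        layers.Dense(30),                        # 15 (x, y) pairs, no activation
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
    return model
```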

The model was able to reach a minimum loss of ~0.0113 and an accuracy of ~80%, which I thought was decent enough. Here are a few results of the model’s performance on the test set:

[Image: Model performance on the Test set]

I also needed to check the model’s performance on an image from my webcam, because that is what the model would receive during the filter implementation. Here’s how the model performed on this image of my beautiful face:

Don’t be intimidated by this scary face. I don’t bite.

Step 3: Put the model into action

We got our model working, so all we gotta do now is use OpenCV to do the following:

  1. Get image frames from the webcam
  2. Detect the region of the face in each frame, because the other sections of the image are useless to the model (I used the Frontal Face Haar Cascade to crop out the face region)
  3. Preprocess this cropped region by — converting to grayscale, normalizing, and reshaping
  4. Pass the preprocessed image as input to the model
  5. Get predictions for the key points and use them to position different filters on the face

I did not have any particular filters in mind when I began testing. I came up with the idea for the project around 22 December 2018, and being a huge Christmas fanboy like any other normal human being, I decided to go with the following filters:

[Image: Filters]

I used particular key points for the scaling and positioning of each of the above filters:

  • Glasses Filter: The distance between the left-eye-left-keypoint and the right-eye-right-keypoint is used for the scaling. The brow-keypoint and left-eye-left-keypoint are used for the positioning of the glasses
  • Beard Filter: The distance between the left-lip-keypoint and the right-lip-keypoint is used for the scaling. The top-lip-keypoint and left-lip-keypoint are used for the positioning of the beard
  • Hat Filter: The width of the face is used for the scaling. The brow-keypoint and left-eye-left-keypoint are used for the positioning of the hat

The code which does all the above is as follows:

Result

[Image: Final output of the project]

Above, you can see the final output of the project, which contains a real-time video with filters on my face and another real-time video with key points plotted.

Limitations of the project

Although the project works pretty well, I did discover a few shortcomings which make it a little shy of perfect:

  • Not the most accurate model. Although 80% is pretty decent in my opinion, it still has a lot of room for improvement.
  • This current implementation works only for the selected set of filters because I had to do some manual tweaking for more accurate positioning and scaling.
  • The process of applying a filter to the image is computationally inefficient: to overlay the .png filter onto the webcam image based on its alpha channel, I apply the filter pixel by pixel wherever the alpha is non-zero. This sometimes causes the program to crash when it detects more than one face in the image.

The complete code for the project is on my GitHub: https://github.com/agrawal-rohit/Santa-filter-facial-keypoint-regression

If you’d like to improve upon the project, or if you have any suggestions for solving the above issues, be sure to leave a response below or open a pull request on the GitHub repo. Thanks for stopping by; hope you enjoyed the read.

Ciao!


Written by Rohit Agrawal
