Machine Learning is Fun! Part 4: Modern Face Recognition with Deep Learning

Facebook automatically tags people in your photos that you have tagged before. I’m not sure if this is helpful or creepy!
One of these people is Will Ferrell. The other is Chad Smith. I swear they are different people!

How to use Machine Learning on a Very Complicated Problem

So far in Parts 1, 2 and 3, we’ve used machine learning to solve isolated problems that have only one step — estimating the price of a house, generating new data based on existing data and telling whether an image contains a certain object. All of those problems can be solved by choosing one machine learning algorithm, feeding in data, and getting the result.

But face recognition is really a series of several related problems:

  1. First, look at a picture and find all the faces in it.
  2. Second, focus on each face and understand that even if a face is turned in a weird direction or photographed in bad lighting, it is still the same person.
  3. Third, pick out unique features of the face that you can use to tell it apart from other people — like how big the eyes are, how long the face is, etc.
  4. Finally, compare the unique features of that face to all the people you already know to determine the person’s name.
How a basic pipeline for detecting faces might work

Face Recognition — Step by Step

Let’s tackle this problem one step at a time. For each step, we’ll learn about a different machine learning algorithm. I’m not going to explain every single algorithm completely to keep this from turning into a book, but you’ll learn the main ideas behind each one and you’ll learn how you can build your own facial recognition system in Python using OpenFace and dlib.

Step 1: Finding all the Faces

The first step in our pipeline is face detection. Obviously we need to locate the faces in a photograph before we can try to tell them apart!

Looking at just this one pixel and the pixels touching it, the image is getting darker towards the upper right.
The original image is turned into a HOG representation that captures the major features of the image regardless of image brightness.
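dlib’s built-in frontal face detector is exactly this kind of HOG-based detector. Here’s a minimal sketch of running it, assuming dlib is installed and `photo.jpg` is a hypothetical input image:

```python
import dlib

# dlib's frontal face detector is a HOG feature descriptor combined with
# a linear classifier and a sliding-window detection scheme.
detector = dlib.get_frontal_face_detector()

image = dlib.load_rgb_image("photo.jpg")  # hypothetical input file

# The second argument upsamples the image once, which helps find smaller faces.
face_rects = detector(image, 1)

for i, rect in enumerate(face_rects):
    print(f"Face {i}: left={rect.left()}, top={rect.top()}, "
          f"right={rect.right()}, bottom={rect.bottom()}")
```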

Step 2: Posing and Projecting Faces

Whew, we isolated the faces in our image. But now we have to deal with the problem that faces turned in different directions look totally different to a computer:

Humans can easily recognize that both images are of Will Ferrell, but computers would see these pictures as two completely different people.
The 68 landmarks we will locate on every face. This image was created by Brandon Amos of CMU who works on OpenFace.
PROTIP: You can also use this same technique to implement your own version of Snapchat’s real-time 3D face filters!
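dlib can find those 68 landmarks for us too. A minimal sketch, assuming you’ve downloaded dlib’s pre-trained `shape_predictor_68_face_landmarks.dat` model and that `photo.jpg` is a hypothetical input image:

```python
import dlib

detector = dlib.get_frontal_face_detector()
# Pre-trained 68-point landmark model, downloadable from dlib.net
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = dlib.load_rgb_image("photo.jpg")  # hypothetical input file

for rect in detector(image, 1):
    # Estimate the 68 landmark positions inside the detected face box
    landmarks = predictor(image, rect)
    points = [(landmarks.part(i).x, landmarks.part(i).y)
              for i in range(landmarks.num_parts)]
    print(f"Found {len(points)} landmarks; chin tip is at {points[8]}")
```

Once you know where the eyes and mouth are, an affine transformation (rotation, scaling and shearing) can warp the image so those features land in roughly the same standard positions for every face.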

Step 3: Encoding Faces

Now we are to the meat of the problem — actually telling faces apart. This is where things get really interesting!

Just like TV! So real! #science

The most reliable way to measure a face

Ok, so which measurements should we collect from each face to build our known face database? Ear size? Nose length? Eye color? Something else?

It turns out the most reliable approach is to let the computer figure out the measurements to collect itself. We can train a deep convolutional neural network to generate 128 measurements for each face. The training process works by looking at three face images at a time:

  1. Load a training face image of a known person
  2. Load another picture of the same known person
  3. Load a picture of a totally different person

The algorithm then looks at the measurements it is currently generating for each of those three images and tweaks the neural network slightly so that the measurements for #1 and #2 move slightly closer together while the measurements for #2 and #3 move slightly further apart.
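This is the idea behind triplet-based training: nudge the network so that two embeddings of the same person move closer together and embeddings of different people move further apart. A minimal sketch of the loss computation in NumPy, using hypothetical 128-number embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss over three face embeddings.

    anchor and positive come from two photos of the same person;
    negative comes from a photo of a different person.
    """
    # Squared distance between two photos of the same person
    pos_dist = np.sum((anchor - positive) ** 2)
    # Squared distance between photos of different people
    neg_dist = np.sum((anchor - negative) ** 2)
    # Loss is zero once same-person embeddings are closer by at least the margin
    return max(pos_dist - neg_dist + margin, 0.0)

# Hypothetical random embeddings, just to show the call
rng = np.random.default_rng(0)
a, p, n = rng.normal(size=(3, 128))
print(triplet_loss(a, p, n))
```

Repeat this step millions of times across images of thousands of different people, and the network learns to generate 128 measurements that stay stable for the same person and differ between different people.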

Encoding our face image

This process of training a convolutional neural network to output face embeddings requires a lot of data and computer power. Even with an expensive NVIDIA Tesla video card, it takes about 24 hours of continuous training to get good accuracy. But once the network has been trained, it can generate measurements for any face it has never seen before, so this expensive training step only has to be done once.
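Better yet, we can skip the training entirely by using a network someone else has already trained. For example, dlib ships a pre-trained face encoder; here’s a minimal sketch, assuming you’ve downloaded dlib’s published `shape_predictor_68_face_landmarks.dat` and `dlib_face_recognition_resnet_model_v1.dat` model files:

```python
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
# Pre-trained network that maps an aligned face image to 128 measurements
encoder = dlib.face_recognition_model_v1(
    "dlib_face_recognition_resnet_model_v1.dat")

image = dlib.load_rgb_image("photo.jpg")  # hypothetical input file

for rect in detector(image, 1):
    landmarks = predictor(image, rect)
    # The landmarks tell the encoder how to align the face before measuring it
    embedding = encoder.compute_face_descriptor(image, landmarks)
    print(len(embedding))  # 128
```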

Step 4: Finding the person’s name from the encoding

This last step is actually the easiest step in the whole process. All we have to do is find the person in our database of known people who has the closest measurements to our test image.
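Any basic classification algorithm will do the job; a simple linear SVM works well. Here’s a minimal sketch with scikit-learn, using hypothetical random vectors in place of real face measurements:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical training data: one 128-measurement vector per known photo
rng = np.random.default_rng(0)
known_encodings = rng.normal(size=(6, 128))
known_names = ["will_ferrell", "will_ferrell", "will_ferrell",
               "chad_smith", "chad_smith", "chad_smith"]

# Train a simple linear SVM that maps measurements -> person name
classifier = LinearSVC()
classifier.fit(known_encodings, known_names)

# Classify a new face's measurements (here: a noisy copy of a training vector)
test_encoding = known_encodings[0] + rng.normal(scale=0.01, size=128)
print(classifier.predict([test_encoding])[0])  # -> "will_ferrell"
```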

Sweet, sweet training data!

Running this Yourself

Let’s review the steps we followed:

  1. Encode a picture using the HOG algorithm to create a simplified version of the image. Using this simplified image, find the part of the image that most looks like a generic HOG encoding of a face.
  2. Figure out the pose of the face by finding the main landmarks in the face. Once we find those landmarks, use them to warp the image so that the eyes and mouth are centered.
  3. Pass the centered face image through a neural network that knows how to measure features of the face. Save those 128 measurements.
  4. Looking at all the faces we’ve measured in the past, see which person has the closest measurements to our face’s measurements. That’s our match!
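If you’d rather not wire these steps together by hand, the face_recognition Python library (built on top of dlib) wraps the whole pipeline. A minimal end-to-end sketch, with hypothetical image filenames:

```python
import face_recognition

# Steps 1-3: detect, align, and encode the known face into 128 measurements
known_image = face_recognition.load_image_file("will_ferrell.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# Encode the face we want to identify the same way
unknown_image = face_recognition.load_image_file("unknown.jpg")
unknown_encoding = face_recognition.face_encodings(unknown_image)[0]

# Step 4: compare the measurements (True if the distance is small enough)
match = face_recognition.compare_faces([known_encoding], unknown_encoding)[0]
print("Same person!" if match else "Different people.")
```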
