Face Recognition in Real Time Using Machine Learning.

7 min read · Jan 18, 2021

What did it mean when the teacher said, “But first I’m going to take the attendance”?

For a student, this sentence could mean many things: more time to talk to friends, a chance to “fix up” yesterday’s homework, or simply a moment to get settled into class.

Every day, the routine is the same. The teacher calls out each student’s name, checks the box beside the names of the absent students, sends the completed attendance to the school office and then proceeds with the lesson.

What’s the problem with this process? It’s time-consuming. How much time are we wasting when it comes to taking the attendance? It may only seem like five minutes each class, but that quickly adds up to 25 minutes per week. And over one month, that number becomes 100 minutes. 100 minutes that could have been spent teaching, learning or even doing homework.

Besides that, it’s simply inefficient. We live in a digital world where more and more is automated, from self-driving cars to automated kitchens. Almost everything we use in our day-to-day lives is automated, yet our attendance system has stayed the same for centuries.

So, why can’t our attendance systems be automated?

Here’s the exciting part: they can! I know, I know, it’s a little weird to be excited about taking attendance, but once you finish reading this article, you will be too!

Here’s how I made an automatic attendance system that works in real-time:

Machine Learning

Before I get too far, let’s cover some basics. Machine learning is a form of AI, and it’s all about getting the machine to learn and improve from data on its own. The goal of machine learning is to create a model that is as accurate as possible, and the cool thing is that we don’t have to hand-code a rule for every situation: the model learns the rules from examples.

Machine learning powers many of the services that have become an essential part of our lives. It powers the recommendation systems on platforms like YouTube and Netflix that keep you watching for hours. It powers the social media feeds on Instagram and Facebook. It powers our voice assistants, like Google Assistant and Alexa. The list is endless. If you’d like to learn more about how ML is used in your voice assistants, check out my article here!

I’m sure you’ve heard that big companies like Google and Facebook like to collect your information. But what do they do with all that data?

One thing they use that data for is to train the ML algorithms to make an accurate guess about what you would like next.

You can think of it like this:

More data = More content to train the machine with = More accuracy

Really, it’s quite simple. Find the pattern. Use the pattern. But this process has become the very basis of most AI.

Facial Recognition

Facial recognition is a technique for identifying a person from a video or image. It powers the algorithms that can recognize your friends’ faces after they have been tagged only a few times on Facebook. You may have also noticed a yellow box around a face when you go to take a picture of someone with your phone. That is face detection (finding where a face is), which is the first step of facial recognition.

How Does it Work?

Now that we have a basic definition of facial recognition, we can get into the theory and understand how it works.

1. Converting the Image to a HOG Image:

Many algorithms today will do this on their own. So, we do not have to convert an image to a HOG image manually, in most cases.

Making the picture black and white.

We don’t need colour to find a face. In fact, colour can make the job harder for the algorithm. So, the first step is to drain the colour from the image.

Bill Gates’ image with colour vs. black and white.
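To make the colour-draining step concrete, here is a minimal sketch in plain Python. It assumes the image is given as rows of (red, green, blue) tuples and uses the common luma weights 0.299, 0.587 and 0.114. Both the function name `to_grayscale` and that input format are my own choices for illustration, not something a particular library requires.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of brightness values (0-255)."""
    gray = []
    for row in rgb_image:
        # Weighted sum: green contributes most to perceived brightness, blue least.
        gray.append([round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row])
    return gray


# A tiny 2x2 "image": red, green, blue and white pixels.
tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)]]
print(to_grayscale(tiny))
```

Each pixel is now a single number instead of three, which is all the later steps need.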

The image is broken into tiny pixels.

By doing so, we can look at individual pixels and all others surrounding them.

An image that has been broken down into many pixels.

Arrows are drawn in the direction of dark pixels.

The arrows, also known as gradients, show how brightness flows across the entire image. The reason for using gradients is to make sure the same face ends up with the same representation, no matter how bright or dark the photo is.

Think about it this way: a dark version and a light version of the same image have completely different pixel values. But the direction in which the brightness changes is the same in both, so by working with gradients, we can solve that problem!

Following the dark pixels to create a gradient.
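Here is a small sketch of that idea, assuming a grayscale image stored as a list of rows. The helper `gradient_at` is a hypothetical name, and it uses simple central differences, which is one common way to estimate a gradient. Note that this version points toward brighter pixels; flip the sign if you want the arrows to point toward the dark side as in the picture above.

```python
import math


def gradient_at(gray, y, x):
    """Return (magnitude, direction in degrees) of the brightness change at (y, x)."""
    # Central differences: compare the neighbours on either side of the pixel.
    dx = gray[y][x + 1] - gray[y][x - 1]
    dy = gray[y + 1][x] - gray[y - 1][x]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


dark = [[10, 20, 30],
        [10, 20, 30],
        [10, 20, 30]]
# The same picture, but every pixel is 100 units brighter.
light = [[value + 100 for value in row] for row in dark]

print(gradient_at(dark, 1, 1))
print(gradient_at(light, 1, 1))
```

Both calls print the same magnitude and direction, because adding a constant brightness to every pixel cancels out when neighbours are subtracted.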

However, everything has its downsides. Taking the gradient of each pixel will be very detailed, which will cause our model to miss the big picture.

So, to help the model recognize the image as a whole, we can break the image up into small squares of 16x16 pixels. Then, instead of keeping the gradient of every single pixel, we summarize each square with one gradient: whichever direction most of the arrows in that square point becomes the gradient of the entire square.
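The square-by-square vote can be sketched like this. It is a simplification under my own assumptions: directions are grouped into 8 bins of 45 degrees, stronger edges get a bigger vote, and the function name `dominant_directions` is hypothetical. Full HOG implementations keep a whole histogram per square rather than just the winner.

```python
import math


def dominant_directions(gray, cell=16, bins=8):
    """For each cell x cell square, return the index of the strongest direction bin."""
    height, width = len(gray), len(gray[0])
    bin_width = 360 / bins
    result = []
    for top in range(0, height - cell + 1, cell):
        row_of_cells = []
        for left in range(0, width - cell + 1, cell):
            votes = [0.0] * bins
            for y in range(top, top + cell):
                for x in range(left, left + cell):
                    # Skip the image border, where a neighbour is missing.
                    if not (0 < y < height - 1 and 0 < x < width - 1):
                        continue
                    dx = gray[y][x + 1] - gray[y][x - 1]
                    dy = gray[y + 1][x] - gray[y - 1][x]
                    angle = math.degrees(math.atan2(dy, dx)) % 360
                    # Vote for the nearest bin centre, weighted by edge strength.
                    votes[int(round(angle / bin_width)) % bins] += math.hypot(dx, dy)
            row_of_cells.append(votes.index(max(votes)))
        result.append(row_of_cells)
    return result


# A 16x32 test image: the left half brightens to the right,
# the right half brightens downward.
image = [[x * 4 if x < 16 else y * 4 for x in range(32)] for y in range(16)]
print(dominant_directions(image))
```

With 8 bins, bin 0 means "brightening to the right" and bin 2 means "brightening downward" (rows are counted from the top of the image), so the two squares in the test image get different labels.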

In the end, we get a simplified representation that still captures the basic structure of the original image. This type of image is known as a HOG (histogram of oriented gradients) image.

Our model can detect the faces in our image!

If the faces in the image are similar to the known HOG pattern of a face, our model will be able to detect the faces in the image!
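One simple way to picture "similar to the known HOG pattern" is a similarity score between two lists of numbers. The sketch below uses cosine similarity with a made-up threshold of 0.9; the template values, the threshold and the function names are all hypothetical. Real detectors usually train a classifier on many face patterns instead of comparing against a single template.

```python
import math


def cosine_similarity(a, b):
    """Return 1.0 for vectors pointing the same way, 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def looks_like_face(window_hog, face_template, threshold=0.9):
    """Assumed rule: a window counts as a face if it is close to the template."""
    return cosine_similarity(window_hog, face_template) >= threshold


# Hypothetical 4-number patterns; a real HOG descriptor is far longer.
face_template = [0.9, 0.1, 0.4, 0.8]
candidate = [0.8, 0.2, 0.5, 0.7]
background = [0.1, 0.9, 0.8, 0.0]

print(looks_like_face(candidate, face_template))
print(looks_like_face(background, face_template))
```

Sliding this check over every region of the HOG image is the basic idea behind finding where the faces are.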

2. Using Facial Landmarks to Allow the Model to Recognize a Face from Any Angle:

When we turn our faces in another direction (turning our heads to the right or left), it looks like a completely new image for the model and it may not always recognize that it’s a human face.

This is where face landmark estimation (or faceLoc) comes into play. The idea is to warp the image so that all the important and distinct features face the front.

To warp the image, the model needs to map out 68 points on each person’s face. These points include the eyes, nose, the shape of the eyebrows, you get the point.

By using facial landmarks, no matter which angle a person’s face is turned, the model can tell that it’s a person and a face.
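To give a feel for the warping step, here is a sketch that handles only the simplest case: a head tilted sideways. It assumes we already know the two eye landmarks and rotates every landmark point so the eyes end up on a horizontal line. The function name `align_landmarks` and the coordinates are hypothetical, and a real pipeline would apply a transform like this to the whole image, not just to the points.

```python
import math


def align_landmarks(points, left_eye, right_eye):
    """Rotate points about the eye midpoint so both eyes sit at the same height."""
    # Angle of the line joining the eyes, relative to horizontal.
    angle = math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])
    cx = (left_eye[0] + right_eye[0]) / 2
    cy = (left_eye[1] + right_eye[1]) / 2
    # Rotate by the opposite angle to undo the tilt.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    aligned = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        aligned.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return aligned


# Hypothetical landmarks for a tilted face: left eye, right eye, nose tip.
left_eye, right_eye, nose = (30, 40), (70, 60), (45, 75)
for point in align_landmarks([left_eye, right_eye, nose], left_eye, right_eye):
    print(point)
```

After the rotation, both eyes share the same height, so the next step always sees the face in roughly the same pose.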

3. Encoding Faces:

This is the part where our model is able to tell person x from person y.

Every person has different facial measurements. For instance, the distance between your eyes or the length of your nose or the size of your lips. While it’s likely that someone may have the same nose length as you, all your measurements probably won’t be the same.

This is why the model is going to generate 128 measurements for every face it recognizes.

The training process uses a ‘triplet’ approach:

Take a facial image of person x
Take another facial image of person x
Take a facial image of person y

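The algorithm analyzes the three pictures and slightly tweaks the model, so that the measurements for the two pictures of person x move closer together while person y’s measurements move further away. Repeating this over many triplets teaches the model to produce measurements that separate different people.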

These are the measurements for the three pictures I used. Notice how the values for both of Elon Musk’s pictures are very similar to each other compared to Bill Gates’ measurements.
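The triplet idea is usually written as a loss: a number that is zero when the model is already doing well and positive when it needs tweaking. Here is a minimal sketch, assuming plain Euclidean distance and a margin of 0.2. The margin, the function names and the tiny 2-number "measurements" are my own assumptions for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two lists of measurements."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the same-person pair is closer than the different-person pair by the margin."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


person_x_1 = [0.1, 0.2]   # first picture of person x
person_x_2 = [0.1, 0.3]   # second picture of person x
person_y = [0.9, 0.8]     # picture of person y

# Well separated: nothing left to fix.
print(triplet_loss(person_x_1, person_x_2, person_y))
# Person y sits too close to person x: the loss becomes positive.
print(triplet_loss(person_x_1, person_x_2, [0.1, 0.25]))
```

Training nudges the model in whichever direction makes this number smaller.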

Now, it can be easy to confuse the two different sets of measurements. Remember, the model maps out 68 basic landmark points on a person’s face, kind of like an outline, in case the face is turned at an angle and needs to be warped to the front for the model to recognize it.

The 128 measurements are much more detailed, and they are what the model actually uses to tell one face from another.
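Once every known face has its list of measurements, recognizing someone comes down to finding the closest list. The sketch below assumes Euclidean distance and a cut-off of 0.6 (the default tolerance in the face_recognition library, used here as an assumption). The function name `identify` and the 3-number encodings are made up to keep the example readable; real encodings have 128 numbers.

```python
import math


def identify(unknown, known_encodings, tolerance=0.6):
    """Return the name with the closest encoding, or "Unknown" if none is close enough."""
    best_name, best_distance = "Unknown", tolerance
    for name, encoding in known_encodings.items():
        d = math.dist(unknown, encoding)
        if d <= best_distance:
            best_name, best_distance = name, d
    return best_name


# Hypothetical, shortened encodings for illustration only.
known = {
    "Elon Musk": [0.10, 0.80, 0.30],
    "Bill Gates": [0.90, 0.20, 0.50],
}

print(identify([0.12, 0.79, 0.33], known))
print(identify([5.0, 5.0, 5.0], known))
```

A lower tolerance makes the match stricter, which means fewer mix-ups but more faces labelled as unknown.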

4. Attaching a Name to the Encoding

This step is easy because all we have to do is save each image under the name of the person in it.

I have labelled each image with the name of that person. Whatever you name the image file, that’s the name that will appear when running the program.
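Turning file names into labels takes only a few lines. This sketch assumes the pictures sit in one folder and that the file name without its extension is the person's name. The function name `names_from_folder`, the folder name in the usage note and the list of extensions are my own choices.

```python
from pathlib import Path


def names_from_folder(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Map each person's name (the file name without extension) to its image path."""
    names = {}
    for path in sorted(Path(folder).iterdir()):
        # Ignore anything that is not an image, such as notes or hidden files.
        if path.suffix.lower() in extensions:
            names[path.stem] = path
    return names
```

For example, calling `names_from_folder("images")` on a folder holding `Bill Gates.jpg` and `Elon Musk.png` would give the labels "Bill Gates" and "Elon Musk".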

After connecting my webcam to the program, here’s what it looked like!

The output given when shown a picture of Bill Gates.

The output given when shown a picture of Elon Musk.

To add more people, you simply add a labelled picture of them into the images folder and let the program do its magic!
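Since the goal is an attendance system, the last piece is writing down who showed up. Here is one possible sketch, assuming attendance is kept in a CSV file with one row per person (name and time) and that a person should only be recorded once. The function name `mark_attendance` and the file name `attendance.csv` are hypothetical.

```python
import csv
from datetime import datetime
from pathlib import Path


def mark_attendance(name, csv_path="attendance.csv"):
    """Append the name and current time to the CSV unless the name is already there."""
    path = Path(csv_path)
    already_present = set()
    if path.exists():
        with path.open(newline="") as f:
            already_present = {row[0] for row in csv.reader(f) if row}
    if name in already_present:
        return False  # Already marked, so the webcam seeing them again changes nothing.
    with path.open("a", newline="") as f:
        csv.writer(f).writerow([name, datetime.now().strftime("%H:%M:%S")])
    return True
```

Calling `mark_attendance("Bill Gates")` every time the program recognizes a face keeps the list free of duplicates, even though the webcam sees the same person in many frames.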

Thanks for reading! If you have any questions or want to chat further, contact me at:

mansikatarey@gmail.com or contact me through LinkedIn. If you would like to subscribe to my newsletter, you can check it out over here!