Teaching my computer to recognize faces.

Yajat Mittal · Geek Culture · May 27, 2021 · 10 min read
Example of Face Recognition

Face recognition is a technology which is all around the world 🌎 right now! I am sure all of you have used it many times, whether when unlocking your phone 📱 or when checking the identity of a person (the police use it in some cases). I know face recognition is cool and all, but what exactly is it? How does it work?

What is Face Recognition and how does it work?

Face recognition is a way of identifying a specific individual through a series of steps. Each of these steps plays a big role in identifying a face, and I will list them below:

Step 1: Face Detection - The first step in facial recognition is to detect and locate a face. This is a really important step, because if the computer/machine doesn't even know what a face is, how will it find the exact coordinates 📈 and position of one? That is really all there is to step 1 (I will be discussing it a lot more later).

Step 2: Capture Face Print - The second step is to begin training the computer/machine to recognize a specific individual. More specifically, this step is about understanding the facial features of a specific person: the shape of their face, the distance between their eyes, and so on… These features are unique to each individual, and are what separate you from others. You can refer to these unique facial features as a face print. That is what step 2 is mainly about.

Step 3: Finding a Match - Step 3 is the last step; this is where your face print is compared against millions of other people's photos. If your face print matches the face print of a person in a photo, the computer declares a match.

Note: All three of these steps are important for face recognition, but this article will mainly focus on step 1, Face Detection, because that is what my project was based on.

Face Detection

As I said above, my project was mainly about detecting faces. Since I have already talked about face detection, I think it's time to show the code. Before that, I wanted to let all of you know that for this project I used a quite interesting object detection algorithm; a lot of the later portion of this article will be about how that algorithm works. So now let's dive into the code!

Let's show the code!

I will try to explain each step of the code, to make it easy for you all to understand!

Full Code

import cv2

# Load the pre-trained frontal face cascade that ships with OpenCV
face_detection = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Open the laptop's default camera and set the frame width/height
cap = cv2.VideoCapture(0)
cap.set(3, 640)
cap.set(4, 480)

while True:
    success, img = cap.read()
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_detection.detectMultiScale(img_gray, 1.3, 6)
    for (x, y, w, h) in faces:
        rect = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 4)
        cv2.putText(rect, 'Face', (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (36, 255, 12), 2)
    cv2.imshow("Video", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the camera and close the window once the loop ends
cap.release()
cv2.destroyAllWindows()

Importing Libraries

import cv2

The first step in any coding project is to import all the necessary libraries. If you don't know, a library is basically a set of pre-written resources/functions which you can use in your project once you import it. The library I used for my project was OpenCV (imported as cv2), a computer vision library used when dealing with live video and images.

Cascade Classifier

face_detection = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

In this part of my code, I am using something known as a cascade classifier. This is a really important part of my code (I will be talking a lot about it soon), since it plays a big role in the actual face detection. It is a classifier which takes the "haarcascade frontal face" file as input, and helps the computer tell the difference between a face and any other object.
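One small thing worth checking (this is my own addition, not part of the original code): CascadeClassifier fails silently if the XML path is wrong, so you can use its empty() method to make sure the file actually loaded.

# Optional sanity check: empty() returns True if the XML file failed to load
if face_detection.empty():
    raise IOError('Could not load haarcascade_frontalface_default.xml')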

Capturing Video

cap = cv2.VideoCapture(0)
cap.set(3,640)
cap.set(4,480)

Here I am capturing my live video using OpenCV. I passed in a 0 to let my computer know that I want to capture video from my laptop's built-in camera. The two "cap.set" lines define the dimensions (640x480) of the video frames which will be captured when I later run the code.
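If the bare 3 and 4 look mysterious, they are just OpenCV's numeric property IDs for frame width and height. A slightly more readable version of the same two lines uses the named constants:

# Same as cap.set(3, 640) and cap.set(4, 480), but with named constants
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)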

While Loop

while True:
    success, img = cap.read()
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_detection.detectMultiScale(img_gray, 1.3, 6)
    for (x, y, w, h) in faces:
        rect = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 4)
        cv2.putText(rect, 'Face', (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (36, 255, 12), 2)
    cv2.imshow("Video", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

This is a really important part of my code, and I will explain it piece by piece. First of all, I chose a while loop here because I wanted to display live video. A video can't be displayed all at once in OpenCV, so I am reading the live feed one frame (image) at a time. The while loop keeps grabbing and showing frame after frame, which is what makes it look like a continuous video. Now that you hopefully understand the purpose of the while loop, I will begin explaining the code inside it.

success, img = cap.read()

Beginning with "success, img = cap.read()", this part of the code tells the computer to read the next frame from the video capture and store it as img. The success here is a boolean (True or False), letting you know whether the computer successfully read a frame.
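As a small defensive addition of my own (not in the original code), you can use that boolean to bail out cleanly if the camera ever stops delivering frames:

success, img = cap.read()
if not success:  # the camera returned no frame, so stop the loop
    break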

img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

The next line creates a variable named img_gray, which converts each frame of the live video to grayscale. The reason for this is to make it easier for the computer to detect faces in my live video. The Haar features we will meet later only compare light and dark regions, so colour just adds extra data without helping the detection. That is why converting the live video frames/images to grayscale is a good idea.
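Just to make the difference concrete (these prints are my own illustration, not part of the original code), the conversion drops the three colour channels down to one:

print(img.shape)       # e.g. (480, 640, 3) - a BGR frame has 3 colour channels
print(img_gray.shape)  # e.g. (480, 640)   - grayscale is a single channel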

faces = face_detection.detectMultiScale(img_gray, 1.3, 6)

The line I am going to talk about now is really important. It does two jobs: it gives you the coordinates of each face in the live video, and it detects faces of different sizes (hence "multiscale"). The img_gray parameter just tells the computer to run the detection on the grayscale image, so I will move on and explain what the 1.3 and the 6 are doing here.

The 1.3 here is known as the scale factor. The detector searches the image at several sizes, and the scale factor tells it how much to shrink the image between each pass (here, by 30% each time). A smaller scale factor is more accurate but slower; a larger one is faster but can miss faces.

The 6 here is minNeighbors. All you need to know about it is that it plays a role in reducing false face detections: a candidate region is only kept if at least that many overlapping detections agree on it. So you can say it throws out most of the random face detections, and leaves the ones which are accurate.
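If the bare numbers bother you, the same call can be written with the keyword names from OpenCV's Python API, which makes the parameters self-documenting:

# Identical to detectMultiScale(img_gray, 1.3, 6), just with named parameters
faces = face_detection.detectMultiScale(img_gray, scaleFactor=1.3, minNeighbors=6)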

for (x, y, w, h) in faces:
    rect = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 4)
    cv2.putText(rect, 'Face', (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (36, 255, 12), 2)

Now I will start talking about the easier and more fun parts! The code here is really simple: we have a for loop which takes the coordinates of each face from the faces variable. Then we draw a bounding box around the face using cv2.rectangle. The parameters are also pretty simple: (x, y) and (x + w, y + h) are the top-left and bottom-right corners of the box, (255, 0, 0) is the colour (blue, since OpenCV uses BGR order), and 4 is the line thickness.

After drawing the bounding box, I wanted to put a label on it saying "Face". That is what the next line does: cv2.putText writes the label just above my bounding box (the y - 10 shifts the text 10 pixels up).
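Since faces is just an array of (x, y, w, h) boxes, you could also log the detections; this print is my own addition, purely for illustration:

for (x, y, w, h) in faces:
    print(f"Face at x={x}, y={y}, size {w}x{h}")  # one line per detected face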

cv2.imshow("Video", img)

Well, now the only thing left to do is to actually show all of this happening, and that is what the imshow function does! It displays the live video with the bounding box drawn over each detected face.

if cv2.waitKey(1) & 0xFF == ord('q'):
    break

Finally, we have come to the last part. It is an if statement telling the computer to break out of the loop when the q key is pressed (cv2.waitKey(1) waits 1 millisecond for a key press on each frame).
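Once the loop ends, it is good practice to hand the camera back to the operating system and close the window; that is what the two lines at the end of the full code above do:

cap.release()            # free the webcam for other programs
cv2.destroyAllWindows()  # close the "Video" window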

Haar Cascade Classifiers

Wow, you are still here? Well, congratulations 🎉! Now I am on the last part of this article, and I am telling you it will be really interesting! Remember how I said above that I would talk about the cascade classifier? Well, now the time has come. So sit back, and I will go through it!

Diagram of a haar cascade classifier

Well, first let's begin talking about Haar cascades. If you don't know, a Haar cascade is an object detection algorithm which is well known for face detection. It is one of the original face detection techniques, introduced by Paul Viola and Michael Jones in 2001.

The Haar cascade algorithm requires a set of positive images (images that contain faces) and a set of negative images (images that don't) to train the cascade classifier. Training the cascade classifier with the Haar algorithm involves a whole process.

This process needs 4 main things to work:

  1. Haar Features
  2. Integral Images
  3. AdaBoost
  4. Stages

When this process is done, you finally need to implement the cascade classifier. I will go back to this after I explain the process more thoroughly.

Note: I just wanted to say that in this explanation I won't be including the math. This is because I want the article to be more of an introduction, and also because I wasn't able to fully understand all the math.

Haar Features

Haar Features

A really important part of the Haar cascade algorithm is Haar features. These are adjacent black and white rectangles, which play a role in detecting edges. These features are similar to a kernel in a Convolutional Neural Network (ignore this if you don't know what I am talking about): they slide around the image, left to right. When a Haar feature lands on a part of the image that is dark on one side and light on the other, it can tell that it is an edge.

Example of haar features being implemented

As you can see in the image above, this is an example of how Haar features are applied to a face. Before I move on to integral images, here is a quick code sketch of the idea.
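This is a tiny sketch of my own (not OpenCV's internal code) of a two-rectangle "edge" feature: its value is just the pixel sum under one half of a window minus the pixel sum under the other half, so a big value means a strong edge.

import numpy as np

def edge_feature(gray, x, y, w, h):
    # Split the window into a left (light) and right (dark) rectangle
    left = int(gray[y:y + h, x:x + w // 2].sum())
    right = int(gray[y:y + h, x + w // 2:x + w].sum())
    return left - right  # a large magnitude suggests a vertical edge here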

Integral Images

Integral Image

An integral image is a concept which lets you add up pixel values, and therefore calculate Haar features, in a much easier and more efficient way. I didn't mention this earlier, but to detect edges, Haar features have to add up a lot of pixel values, and doing that for every feature at every position takes a lot of time. An integral image stores, at each position, the sum of all pixels above and to the left of it, so the sum of any rectangle can be computed with just four lookups. That is why integral images are a big part of the process of training a cascade classifier.
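Here is a short demonstration of that four-lookup trick; cv2.integral is a real OpenCV function, while rect_sum is just a helper name I made up for this sketch:

import cv2
import numpy as np

gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
ii = cv2.integral(gray)  # shape (481, 641): ii[y, x] = sum of all pixels in gray[:y, :x]

def rect_sum(ii, x, y, w, h):
    # Sum of any w*h rectangle from just four lookups
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

assert rect_sum(ii, 10, 20, 50, 30) == int(gray[20:50, 10:60].sum())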

AdaBoost

Now I will be talking about AdaBoost, which is the third and one of the final steps in training the cascade classifier. If you didn't know, there are more than 160,000 possible Haar features for a single detection window, and most of them aren't very useful. This is why we use AdaBoost, which is a technique to pick out the most effective Haar features, so we can make the detection process quicker and more efficient.

In this technique, each Haar feature is tried out on the training images, and the features which produce the lowest error rate are chosen to be used when detecting faces.
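As a toy illustration of that selection step (my own sketch, nothing like the real training code), imagine each feature's value on every training window is already computed; AdaBoost then keeps the feature whose simple threshold test makes the fewest weighted mistakes:

import numpy as np

def best_feature(values, labels, weights):
    # values: (samples, features) matrix of Haar feature values
    # labels: +1 for face windows, -1 for non-face windows
    # weights: AdaBoost's per-sample weights
    preds = np.where(values > 0, 1, -1)            # one crude weak classifier per feature
    mistakes = (preds != labels[:, None]).astype(float)
    weighted_error = weights @ mistakes            # weighted error of each feature
    return int(np.argmin(weighted_error))          # keep the best feature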

Well, yeah! That is AdaBoost, so now I will move on to stages.

Stages

Stages Example

To detect a face, as I said, we have to use Haar features, and even after AdaBoost there are about 6,000 features left. To make the process even simpler and more efficient, we use something known as stages.

We have stages, where each stage applies a set of features to the window (a portion of the image). As the stages pass, more complicated Haar features are applied, meaning each stage uses a more complex (and more expensive) set of features. Wait, what? How does this make the process more efficient?

It makes the process more efficient because if a window fails a stage, it is rejected right away and the detector moves straight on to the next window, so almost no time is wasted on background regions; the expensive later stages only run on windows that actually look face-like. If a window passes a stage, it is declared "passed" and moves on to the next, harder stage, and a window that passes every stage is declared a face (look at the gif above for a better understanding).
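In code form, the cascade idea boils down to something like this conceptual sketch (my own illustration; the real cascade lives inside OpenCV's detectMultiScale):

def window_is_face(window, stages):
    # stages: list of tests, cheapest first; each returns True if the
    # window could still be a face
    for stage in stages:
        if not stage(window):
            return False  # fail any stage -> reject immediately, no more work
    return True           # passed every stage -> declare a face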

Implementation of Cascade Classifier

Well, now all that is left is to actually implement the cascade classifier! The trained model is stored in an XML file, which can then be loaded to classify faces in the live video, exactly like my code does at the top.
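OpenCV actually ships a whole folder of these pre-trained XML files, and loading any of them follows the same pattern. For example, haarcascade_eye.xml is another file bundled with OpenCV:

import cv2

# Same loading pattern as the face cascade, just a different bundled file
eye_detection = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')
print(eye_detection.empty())  # False means the file loaded successfully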

Training your own haar cascade

A Haar cascade file is not available for everything. So if you want to use a cascade classifier for something there is no pre-trained file for, you can train your own Haar cascade! I was actually going to do this originally, but due to some issues I wasn't able to.

I am not going to go too deep into this, but if you are interested, OpenCV has its own tooling (opencv_traincascade) and documentation for training custom cascades.

Socials + Conclusion 👋

Well, that was all for this article! I hope you enjoyed it and learnt something new! If you liked the article, then consider leaving a clap, and comment any feedback you may have :)


Socials

Email: mittalyajat@gmail.com
