A Face Detection Guide Using Google's Coral Development Board

Pietra F T Madio · The Startup · Feb 19, 2020
Source: https://cloud.google.com/vision/docs/detecting-faces

This post is the second in a series about my experience as an A Level student implementing a functional face recognition system on an embedded device, specifically the Google Coral development board. You can find the previous post here: A FaceNet Approach To Facial Recognition

In this blog post, I’ll walk through the main methods of face detection, focusing on the ones most relevant to face recognition, which is my final goal.

For this purpose, I will be using OpenCV (Open Source Computer Vision Library), an open-source computer vision and machine learning library that is easy to use from Python. It was quite a challenge to set up OpenCV on the development board, but I can confirm it is possible.

Detection

Face detection only went mainstream in the early 2000s, when Viola and Jones invented a way of detecting faces fast enough to run on low-power devices. Since then, several other methods have been developed. For my project specifically, I decided to focus on appearance-based methods.

What are appearance-based methods? These are solutions that identify objects (in this case faces) on the basis of visual similarity: the model is trained on raw images rather than on hand-crafted feature vectors.

Haar Cascades Classifier

This was the first machine learning-based cascading classifier, and it is light enough to run on low-power devices such as mobile phones and cameras. A cascade function is trained from many images, both positive and negative; based on that training, it can then be used to detect objects (or faces) in other images.

The approach

Haar Cascade

The detection algorithm was proposed by Paul Viola and Michael Jones in their famous 2001 paper Rapid Object Detection using a Boosted Cascade of Simple Features. It makes use of a cascade function trained on a large number of positive and negative images (positives are samples where a face is present, negatives where it isn’t). OpenCV ships pre-trained classifiers organised by category (eyes, faces, etc.), saving you from having to train your own classifier from scratch.

How it works

The process is quite similar to applying a convolution kernel: the Haar cascade extracts features from images using simple rectangular filters. These filters are called Haar features and look similar to this:

As you can see, A and B are used to detect edge features, while C detects line features and D detects four-rectangle features. Each feature produces a value that characterises a particular area of the image and can indicate the existence (or absence) of certain characteristics, such as edges or changes in texture. For example, a feature can indicate where the border lies between a dark region and a light region.

The idea is to inspect each area of the image in turn. For each area, a single value is obtained by subtracting the sum of pixels under the white rectangle from the sum of pixels under the black rectangle. A feature whose value has a large magnitude is likely to be relevant to that region. Below you can see a very simple example of how filters can be applied:

Source: OpenCV Cascade Classifier
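To make this concrete, here is a minimal NumPy sketch of a two-rectangle feature. The function name and the toy pixel values are mine, invented for illustration; they are not from OpenCV.

import numpy as np

def two_rectangle_feature(img, x, y, w, h):
    # Value = sum under the "black" (left) half minus sum under the "white"
    # (right) half of a w x h patch anchored at (x, y)
    half = w // 2
    black = int(img[y:y + h, x:x + half].sum())
    white = int(img[y:y + h, x + half:x + w].sum())
    return black - white

# Toy 4x4 patch: a dark left half against a bright right half (a vertical edge)
patch = np.array([[10, 10, 200, 200],
                  [10, 10, 200, 200],
                  [10, 10, 200, 200],
                  [10, 10, 200, 200]], dtype=np.uint8)

print(two_rectangle_feature(patch, 0, 0, 4, 4))  # large magnitude -> strong edge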

While it might appear simple, scaling the problem up, for example to a 24x24 window, yields over 160,000 possible features. In the early days it was quite a challenge to make the process efficient with this number of features. The solution is a concept known as the Integral Image, which I’ll explain more about later on.

After that, it is important to make the algorithm efficient and optimised. How can we decide which features are relevant out of possibly over 160,000? The answer is AdaBoost, which selects the best features and trains the classifier to use them. Below you can see the intuition: a feature detecting a vertical edge is useful for identifying a nose but irrelevant for detecting lips.

Source: Haar Cascade Facial Identification

Lastly, we need to reduce detection time as much as possible. Even after the number of features is reduced to a more manageable number, applying all of them to every window would still take a very long time. This is where the concept of cascading plays a part: instead of applying all the features to a window at once, the features are grouped into stages of classifiers that are applied one by one. If a window fails a stage, it is discarded immediately; if it passes, the next stage of features is applied, and so on.
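Here is a toy sketch of that early-rejection logic. The stage tests below are made-up stand-ins (a brightness check and a contrast check), not OpenCV’s real trained Haar stages.

import numpy as np

def passes_cascade(window, stages):
    # Apply the cheap stage classifiers one by one; reject on the first failure
    for stage in stages:
        if not stage(window):
            return False  # window discarded early; later stages never run
    return True  # the window survived every stage

# Made-up stages, ordered cheapest first; a real cascade uses trained stages
stages = [
    lambda w: w.mean() > 40,  # stage 1: is the region bright enough at all?
    lambda w: w.std() > 15,   # stage 2: is there enough contrast for a face?
]

window = np.random.randint(0, 256, (24, 24))
print(passes_cascade(window, stages))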

AdaBoost

AdaBoost is a popular boosting technique that combines multiple “weak classifiers” into a single “strong classifier”. For example, a weak classifier might label a person as male or female based only on their height: assume that anyone over 5'9" is male and anyone under is female. You would misclassify a lot of people, but the accuracy would still be greater than 50%. There are two main things that AdaBoost does:

  • It chooses the training set for each new classifier you train based on the results of the previous classifier.
  • It determines how much weight should be given to each classifier’s proposed answer when combining the results.

A more abstract way to picture AdaBoost is as human specialisation. Get person A (a weak learner) to learn problem X. Whatever part of X A is not good at, get person B to learn that subset. Whatever A and B are still not good at, get C to learn, and so on. Each learner specialises in the weakest area that needs the most improvement.
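As a small illustration, here is a sketch of the height example using scikit-learn’s AdaBoostClassifier, whose default weak learner is a depth-one decision stump. The heights and labels are invented for illustration.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Toy data: heights in cm, with 0 = female and 1 = male (made up, and noisy
# on purpose so no single stump can classify everyone correctly)
heights_cm = np.array([[160], [165], [170], [172], [175], [178], [180], [185]])
labels = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Each boosting round trains a new stump on the examples the previous ones
# got wrong, then weights its vote by how well it performed
clf = AdaBoostClassifier(n_estimators=10)
clf.fit(heights_cm, labels)

print(clf.predict([[168], [182]]))       # combined prediction of all stumps
print(clf.estimator_weights_[:3])        # the per-classifier vote weights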

Integral Image

The integral image, also known as a summed-area table, is a technique for quick and efficient computation of the sum of values in a rectangular subset of a pixel grid. An image can be represented as a matrix of pixels with values ranging from 0 to 255. With that in mind, let’s consider the image below as the input:

Source: Detecting Faces (Viola Jones Algorithm) - Computerphile

We are trying to find a face, so we want to compute the sum over one rectangular area minus the sum over another to evaluate features. As you can see in the example below, you subtract the sum of pixels under the white rectangle from the sum of pixels under the shaded rectangle.

Considering the source image, if we apply this feature extractor to the top-left corner we get 1 + 7 + 7 - (1 + 3 + 2 + 8) = 15 - 14 = 1. This is a very simple example; doing it over large sections of the image, thousands and thousands of times, is not efficient. This is where the integral image helps. It is built in a single pass over the image: each entry is the sum of all the pixels above and to the left of it, including itself. You can see it intuitively below:

Source: Detecting Faces (Viola Jones Algorithm) - Computerphile
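Here is a short NumPy sketch of the trick, using toy pixel values rather than the ones from the video. Once the table is built, any rectangle sum costs only four lookups, no matter how big the rectangle is.

import numpy as np

# Toy 3x3 "image"; real images would just be bigger matrices
img = np.array([[1, 7, 3],
                [2, 4, 6],
                [5, 1, 2]], dtype=np.int64)

# Build the integral image: cumulative sums down the rows, then across columns
ii = img.cumsum(axis=0).cumsum(axis=1)

# Zero-pad the top and left so rectangle sums need no boundary checks
ii = np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, top, left, bottom, right):
    # Sum of img[top:bottom+1, left:right+1] from four integral-image lookups
    return (ii[bottom + 1, right + 1] - ii[top, right + 1]
            - ii[bottom + 1, left] + ii[top, left])

print(rect_sum(ii, 0, 0, 1, 1))  # 1 + 7 + 2 + 4 = 14, computed in O(1)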

Implementation

Now that the concept is (hopefully) clear, let’s see how it would be implemented in Python using OpenCV. This is the image I’ll be using to detect a face:

import numpy as np
import cv2

# Load the pre-trained Haar cascade models that ship with OpenCV
face_cascade_classifier = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
eye_cascade_classifier = cv2.CascadeClassifier("haarcascade_eye.xml")

img = cv2.imread("exampleImage.jpg")
# Convert the image to greyscale, since the cascades work on intensity values
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor=1.3 and minNeighbors=5 trade off speed against false positives
faces = face_cascade_classifier.detectMultiScale(img_gray, 1.3, 5)
for (x, y, w, h) in faces:
    # Draw a blue rectangle around each detected face
    img = cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    # Search for eyes only inside the detected face region
    roi_gray = img_gray[y:y + h, x:x + w]
    roi_color = img[y:y + h, x:x + w]
    eyes = eye_cascade_classifier.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
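One practical note from my setup: if you run this on the Coral board without a display attached, cv2.imshow won’t be able to open a window; swapping it for cv2.imwrite("result.jpg", img) is a simple way to save the output to disk instead.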

Below you can see the result image:

Further reading

OpenCV’s Documentation

Deep Learning Haar Cascade Explained — Will Berger

Detecting Faces (Viola Jones Algorithm) — Computerphile

StatQuest (very good explanation of Machine Learning concepts and statistics)
