Node.js + face-recognition.js : Simple and Robust Face Recognition using Deep Learning

The npm package for face recognition

Trained to recognize faces of Raj, Sheldon, Leonard and Howard

In this article I am going to show you how to perform robust face detection and face recognition using face-recognition.js. I was looking for a promising Node.js library that produces accurate face recognition results, found none, thus decided to create one!

This npm package uses dlib under the hood and exposes Node.js bindings to dlib's face recognition tools, as I found them to produce highly accurate results. The dlib library uses deep learning methods and comes with some pretrained models, which have been shown to reach a prediction accuracy of 99.38% on the LFW face recognition benchmark.

Why?

Lately I have been trying to build a face recognition app with Node.js to extract and recognize the faces of characters from The Big Bang Theory. Initially, I wanted to build this with OpenCV’s face recognizers, similar to how I did it in my tutorial Node.js + OpenCV for Face Recognition.

However, while these face recognizers deliver fast prediction results, I found them to be not robust enough. More precisely, they seem to work well with frontal face images, but as soon as the face pose differs slightly, their predictions become much less reliable.

Thus I was looking for alternatives, came across the dlib C++ library, fiddled around with the Python API, was impressed by the results and finally decided: I want to use this with Node.js! Thus I created this npm package providing a simplified Node.js API for face recognition.

So, what is face-recognition.js?

With face-recognition.js I wanted to provide an npm package which

  • exposes a simple API to get started quickly
  • still allows for more fine-grained control if desired
  • is easy to set up (optimally as simple as typing npm install)

While this package is still a work in progress, right now you can do the following with it:

Face Detection

You can either use a deep neural net for robust face detection or a simple frontal face detector for faster but less robust detection:

Face Recognition

The face recognizer is a deep neural net, which uses the model I mentioned to compute a unique face descriptor. This face recognizer can be trained with labeled face images and can afterwards predict the label of an input face image:

Face Landmarks

You can also use this package to detect 5-point and 68-point face landmarks:

Cool story, let’s finally see it in action!

Okay, as I said, I initially failed to solve this task with OpenCV. Now I have a bunch of 150x150 sized faces of Sheldon, Raj, Leonard, Howard and Stuart sitting here. I will now show you how simple it is to use this data to train a face recognizer and to recognize new faces. The code of this example can be found on the repo.

Preparing the data

I have collected roughly 20 faces per character in different poses:

We will use 10 faces each to train the recognizer and the rest to evaluate the accuracy of our recognizer:

The file name of each face image contains the person's name, so we can simply map our class names:

['sheldon', 'lennard', 'raj', 'howard', 'stuart'] 

to an array of images per class. You can read an image given the file path with fr.loadImage(fp).
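The grouping and the train/test split described above can be sketched roughly as follows. This is a minimal sketch rather than the exact code from the repo: the helper names groupByClass and splitTrainTest and the file names are made up for illustration, and it works on file names only so that it runs without any images on disk.

```javascript
const classNames = ['sheldon', 'lennard', 'raj', 'howard', 'stuart']

// Collect, for each class name, the file names that contain it.
// fileNames would typically come from fs.readdirSync(dataPath).
function groupByClass(fileNames, classNames) {
  return classNames.map(name => fileNames.filter(f => f.includes(name)))
}

// Use the first numTrain entries of each class for training and the rest for testing.
function splitTrainTest(itemsByClass, numTrain) {
  return {
    train: itemsByClass.map(items => items.slice(0, numTrain)),
    test: itemsByClass.map(items => items.slice(numTrain))
  }
}

// Hypothetical file names, for illustration only
const files = ['sheldon_1.png', 'raj_1.png', 'sheldon_2.png', 'howard_1.png', 'sheldon_3.png']
const filesByClass = groupByClass(files, classNames)
const { train, test } = splitTrainTest(filesByClass, 2)

console.log(train[0]) // file names used to train 'sheldon'
console.log(test[0]) // file names held back to test 'sheldon'
```

To go from file names to images, each entry can be joined with the data directory and passed to fr.loadImage(fp) before splitting.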

Detecting the faces

As I said the faces are already extracted with a size of 150x150 each, which I have done with opencv4nodejs beforehand. But you can also detect and extract the faces, save and label them as follows:
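Here is a minimal sketch of such an extraction step. The fr module is passed in as an argument so the snippet stays independent of how the package is installed. I am assuming that fr.FaceDetector() provides detectFaces(image, targetSize) and that fr.saveImage takes the path first and the image second; please double check both against the package README. The extractFaces helper and the label_index.png naming scheme are hypothetical.

```javascript
// Detect all faces in an image, crop them to targetSize x targetSize and save them.
// `fr` is expected to be the module returned by require('face-recognition').
function extractFaces(fr, imagePath, label, targetSize = 150) {
  const image = fr.loadImage(imagePath)
  const detector = fr.FaceDetector()
  // assumption: returns one cropped face image per detected face
  const faceImages = detector.detectFaces(image, targetSize)

  // hypothetical naming scheme: <label>_<index>.png
  return faceImages.map((faceImage, i) => {
    const outPath = `${label}_${i}.png`
    fr.saveImage(outPath, faceImage) // assumed argument order: (path, image)
    return outPath
  })
}

// Usage (needs the package to be installed):
// const fr = require('face-recognition')
// extractFaces(fr, 'image.png', 'sheldon')
```

Note that this labels every face found in the image with the same name, so it only makes sense for images showing a single person.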

Training the Recognizer

Now that we have our data in place we can train the recognizer:

Basically, this feeds each face image into the neural net, which outputs a descriptor for the face, and stores all the descriptors for the given class. You can also jitter the training data by specifying numJitters as a third argument, which will apply transformations such as rotation, scaling and mirroring to create different versions of each input face. Increasing the number of jittered versions may increase prediction accuracy but also increases training time.
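As a rough sketch, the training step could look like this. The trainRecognizer helper is hypothetical; it assumes that fr.FaceRecognizer() returns a recognizer with addFaces(faces, className, numJitters), where numJitters is the optional third argument mentioned above, and that trainDataByClass is an array of image arrays in the same order as classNames.

```javascript
// Add the training faces of every class to a new recognizer.
// `fr` is expected to be the module returned by require('face-recognition').
function trainRecognizer(fr, classNames, trainDataByClass, numJitters) {
  const recognizer = fr.FaceRecognizer()

  trainDataByClass.forEach((faces, label) => {
    const name = classNames[label]
    if (numJitters === undefined) {
      recognizer.addFaces(faces, name)
    } else {
      // jittering creates additional, slightly transformed versions of each face
      recognizer.addFaces(faces, name, numJitters)
    }
  })

  return recognizer
}

// Usage (needs the package to be installed):
// const fr = require('face-recognition')
// const recognizer = trainRecognizer(fr, classNames, trainDataByClass)
```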

Furthermore, we can store the recognizer's state in a file, so that we do not have to train it again next time and can simply load it from that file:

Save:

Load:
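Both steps can be sketched with two small helpers. The helper names are made up for this example, and I am assuming that recognizer.serialize() returns a plain JSON-serializable object and that recognizer.load(modelState) restores it.

```javascript
const fs = require('fs')

// Write the recognizer's state (the stored descriptors per class) to a JSON file.
function saveRecognizer(recognizer, filePath) {
  const modelState = recognizer.serialize()
  fs.writeFileSync(filePath, JSON.stringify(modelState))
}

// Read a previously saved state and load it into the given recognizer.
function loadRecognizer(recognizer, filePath) {
  const modelState = JSON.parse(fs.readFileSync(filePath, 'utf8'))
  recognizer.load(modelState)
  return recognizer
}

// Usage (needs the package to be installed):
// const fr = require('face-recognition')
// saveRecognizer(trainedRecognizer, 'model.json')
// const recognizer = loadRecognizer(fr.FaceRecognizer(), 'model.json')
```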

Recognizing new faces

Now we can check the prediction accuracy for our remaining data and log the results:
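A minimal sketch of such an evaluation loop is shown here. The evaluate helper is hypothetical; it only relies on recognizer.predictBest(face) returning an object with a className property, and it expects every class to have at least one test face.

```javascript
// For each class, count how many test faces are assigned the correct label
// and build one summary line per class.
function evaluate(recognizer, classNames, testDataByClass) {
  return classNames.map((name, label) => {
    const faces = testDataByClass[label]
    const numCorrect = faces
      .filter(face => recognizer.predictBest(face).className === name)
      .length
    // accuracy in percent, truncated to two decimal places
    const accuracy = Math.floor((numCorrect / faces.length) * 10000) / 100
    return `${name} ( ${accuracy}% ) : ${numCorrect} of ${faces.length} faces have been recognized correctly`
  })
}

// Usage:
// evaluate(recognizer, classNames, testDataByClass).forEach(line => console.log(line))
```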

Currently, prediction is done by computing the Euclidean distance between the input face’s descriptor vector and each stored descriptor of a class, and then taking the mean of these distances. One could probably argue that k-means clustering or an SVM classifier would be better suited for this task, and I might implement these in the future as well. But for now, using the Euclidean distance seems to be fast and accurate enough.
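To make the idea concrete, here is a plain JavaScript sketch of the mean distance approach. It is an illustration only and not the package's actual implementation: the function names mirror the API for readability, and the toy descriptors have 2 dimensions, whereas the descriptors computed by dlib have 128.

```javascript
// Euclidean distance between two vectors of equal length
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0))
}

// Mean distance of a descriptor to all stored descriptors of one class
function meanDistance(descriptor, classDescriptors) {
  const total = classDescriptors.reduce((sum, d) => sum + euclideanDistance(descriptor, d), 0)
  return total / classDescriptors.length
}

// One { className, distance } entry per class
function predict(descriptor, descriptorsByClass) {
  return Object.keys(descriptorsByClass).map(className => ({
    className,
    distance: meanDistance(descriptor, descriptorsByClass[className])
  }))
}

// The entry with the lowest mean distance
function predictBest(descriptor, descriptorsByClass) {
  return predict(descriptor, descriptorsByClass)
    .reduce((best, current) => (current.distance < best.distance ? current : best))
}

// Toy descriptors, for illustration only
const descriptorsByClass = {
  sheldon: [[0, 0], [0, 2]],
  raj: [[3, 4], [6, 8]]
}

console.log(predictBest([0, 1], descriptorsByClass))
```

A lower mean distance means the input face is, on average, closer to the faces the class was trained with.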

Calling predictBest will output the result with the lowest distance, i.e. the highest similarity. The output will look something like this:

{ className: 'sheldon', distance: 0.5 }

In case you want to obtain the distances of the face descriptors of all classes to an input face you can simply use recognizer.predict(image), which will output an array with the distance for each class:

[
{ className: 'sheldon', distance: 0.5 },
{ className: 'raj', distance: 0.8 },
{ className: 'howard', distance: 0.7 },
{ className: 'lennard', distance: 0.69 },
{ className: 'stuart', distance: 0.75 }
]

Results

Running the above example will give the following results.

Using 10 faces each for training:

sheldon ( 90.9% ) : 10 of 11 faces have been recognized correctly
lennard ( 100% ) : 12 of 12 faces have been recognized correctly
raj ( 100% ) : 12 of 12 faces have been recognized correctly
howard ( 100% ) : 12 of 12 faces have been recognized correctly
stuart ( 100% ) : 3 of 3 faces have been recognized correctly

Using only 5 faces each for training:

sheldon ( 100% ) : 16 of 16 faces have been recognized correctly
lennard ( 88.23% ) : 15 of 17 faces have been recognized correctly
raj ( 100% ) : 17 of 17 faces have been recognized correctly
howard ( 100% ) : 17 of 17 faces have been recognized correctly
stuart ( 87.5% ) : 7 of 8 faces have been recognized correctly

And here is what happens when we run face recognition on a video stream:

Conclusion

Looking at the results, we can see that even with a small set of training data we can already obtain pretty accurate results, even though some of the extracted faces are very blurry because of the small size of the images I scraped from the web.

If you liked this article, you are invited to give this npm package a try. I would also highly appreciate a star on my GitHub repository as well as any kind of feedback. :)