Facial Recognition — the easy way

Maciej Kankowski
Jit Team
Jun 4, 2019

One of the key buzzwords in today’s IT world is machine learning. Everyone claims either to be doing it, trying to do it, pretending they will do it, or simply reading about it. The potential gains from using one of the many machine learning approaches are obvious. The promises made by the major technology vendors are very… promising, and the many success stories of machine learning start-ups and big tech firms are encouraging. In this article I’ve written down my experience of taking the first steps in implementing an actual machine learning use case in our internal mobile app. Keep in mind that I’m not a trained machine learning expert, but rather a developer who likes to play with code.

Motivation

Our team grows continuously as we recruit new members, and it has become harder to know who is who at the firm. This is why we created WeJit — our internal “Facebook”, developed by our own dev team and available as a web and mobile app. Its main feature is a searchable database of employee profiles. You can think of it as a fancy CV database. Each profile has a photo assigned to it, and the HR department makes sure these photos are actual photos of the people we hire (not of their pets, cars or favorite Avengers).

Aside from being a useful tool in our daily work in a growing tech company, WeJit is also a playground for testing new technologies and evaluating new use cases. The one that I’m interested in at this stage is the following:

As a WeJit user, I’d like to find a profile by taking a photo of a person.

The story is simple: I go to the lunch area, see someone I don’t know and take a photo of him/her — and the app shows me his/her public WeJit profile. Maybe it is not the smartest way to make new friends, but it is definitely a reasonable coding challenge to take on.

What is Facial Recognition?

Machine Learning is not magic!

Initially, the problem didn’t appear to be trivial at all. If we take a systematic approach, we can define the following key points that need to be taken care of:

  • defining the dataset / training set — which in our case is based on WeJit profiles; note that this set will grow over time as new people join the company, so live updates need to be supported,
  • detecting the face in a photo — locating the actual rectangle of the photo where the face is — this is the first algorithmic part of the task,
  • identifying the face — matching the detected face against the dataset — this is the key algorithmic part where the magical ML stuff comes in (a rough sketch of the whole pipeline follows this list).
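
To make this decomposition concrete, here is a rough, type-level sketch in TypeScript — all names and types are hypothetical, not actual WeJit code:

```typescript
// Illustrative pipeline sketch — the three responsibilities from the list
// above, expressed as function signatures (all names are hypothetical).

interface Profile {
  id: string;
  name: string;
  photoUrl: string;
}

interface FaceRectangle {
  top: number;
  left: number;
  width: number;
  height: number;
}

// 1. Keep the training set in sync with WeJit profiles (supports live updates).
declare function syncTrainingSet(profiles: Profile[]): Promise<void>;

// 2. Locate the face rectangle in a captured photo.
declare function detectFace(photo: Blob): Promise<FaceRectangle | null>;

// 3. Match the detected face against the training set.
declare function identifyFace(photo: Blob, face: FaceRectangle): Promise<Profile | null>;

// Putting it together: photo in, profile (or nothing) out.
async function findProfile(photo: Blob): Promise<Profile | null> {
  const face = await detectFace(photo);
  return face ? identifyFace(photo, face) : null;
}
```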

A few years ago, starting this project would have been much more difficult. But today, with machine learning having grown into one of the hottest things in IT, things have changed. A lot.

Technology stack

I looked for the most appropriate solution that I could easily integrate with the existing mobile app (a cross-platform React Native app). Moreover, it would be nice to have the same solution applicable to the web part in the future as well. This led to the conclusion that a cloud-based API would be the best first choice.

Based on a few generally accessible sources (Quora, Kairos and RapidAPI), I chose to follow Microsoft’s offering available in their Azure cloud, leaving plans for an in-house neural network implementation for the future.

Azure Cognitive Services

Cognitive Services, part of the Microsoft Azure cloud platform, is a set of services for adding machine learning capabilities to existing (or new) products. In our case, I focused on the set of RESTful services contained in the Face API, which provides the following functionality:

  • Detection — locating a face in an image and determining its attributes, such as age, emotion, gender, hair color or accessories (a minimal Detection request is sketched after this list),
  • Identification — searching for and matching a face image against a defined repository,
  • Verification — comparing two faces and checking whether they belong to the same person,
  • Grouping — organizing faces into groups,
  • Find Similar — finding similar faces in a dataset.
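
To give a feel for the API, here is a minimal sketch of a Detection request over plain REST, following the shape of the public Face API v1.0 endpoints; the region in the URL and the subscription key are placeholders you would replace with your own:

```typescript
// Minimal Face API "Detect" sketch. Region and key are placeholders.
const FACE_ENDPOINT = "https://westeurope.api.cognitive.microsoft.com/face/v1.0";
const FACE_KEY = "<your-subscription-key>";

interface DetectedFace {
  faceId: string;
  faceRectangle: { top: number; left: number; width: number; height: number };
  faceAttributes?: { age: number; gender: string };
}

// Detect faces in an image available at a public URL.
async function detectFaces(imageUrl: string): Promise<DetectedFace[]> {
  const query = "returnFaceId=true&returnFaceAttributes=age,gender";
  const response = await fetch(`${FACE_ENDPOINT}/detect?${query}`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Ocp-Apim-Subscription-Key": FACE_KEY,
    },
    body: JSON.stringify({ url: imageUrl }),
  });
  if (!response.ok) throw new Error(`Detect failed: HTTP ${response.status}`);
  return response.json();
}
```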

So it looks like it has everything we need! Furthermore, Microsoft provides solid multiplatform documentation, with examples and support for web, iOS and Android development.

There’s no rose without a thorn, they say. And indeed, we encountered some limitations. One of them is the traffic quota: 20 transactions per minute and up to 30,000 transactions per month. These conditions apply to the first year of usage. Luckily, this is more than enough for our case!
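
Staying under the per-minute limit is easy to enforce on the client side; a naive sketch (the helper name is, of course, made up):

```typescript
// Naive client-side throttle: 20 transactions/minute means at least
// 3 seconds between consecutive Face API calls.
const MIN_INTERVAL_MS = 60_000 / 20;

let lastCallAt = 0;

async function throttled<T>(call: () => Promise<T>): Promise<T> {
  const wait = Math.max(0, lastCallAt + MIN_INTERVAL_MS - Date.now());
  if (wait > 0) {
    await new Promise((resolve) => setTimeout(resolve, wait));
  }
  lastCallAt = Date.now();
  return call();
}

// Usage: const faces = await throttled(() => detectFaces(photoUrl));
```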

First approach — playing with existing example

The first step of our proof-of-concept project was to test the demo app provided by Microsoft: Cognitive-Face-iOS.

I began with the following configuration:

Training dataset #1 (actual faces used with permission).

Training dataset: a group of 3 people — 2 people with 2 photos each, 1 person with a single photo.

Results: the people with 2 photos are recognized correctly, but the person with 1 photo is not!

Results of example #1. Person with only 1 training example is not recognized.

To improve the results and get the expected answer, I corrected the input source by adding an additional image of Jakub.

Training dataset #2 — added the second photo of Jakub.

Training dataset: a group of 3 people, each with 2 photos.

Results: All people are recognized correctly! 🚀

Results #2 — all three people are recognized correctly.

Second step — integration with WeJit

At this stage, after investigating the source code of the example and playing around for a while, I was ready to work on implementing the functionality in the WeJit app.

To build the dataset required by the algorithms, I used an open-source tool, howlowck/train-faces, which offers a nice web-based UI for interacting with the Face API.
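
Under the hood, building such a dataset boils down to a handful of Face API calls: create a person group, add a person for each profile, attach their face images, and trigger training. A sketch of that flow, reusing FACE_ENDPOINT and FACE_KEY from the earlier snippet (the group id and helper are, again, illustrative):

```typescript
// Illustrative dataset-building flow with the Face API PersonGroup endpoints.
const GROUP_ID = "wejit-employees"; // hypothetical person group id

// Tiny helper around fetch; some Face API calls return an empty body.
async function faceApi(method: string, path: string, body?: unknown) {
  const response = await fetch(`${FACE_ENDPOINT}${path}`, {
    method,
    headers: {
      "Content-Type": "application/json",
      "Ocp-Apim-Subscription-Key": FACE_KEY,
    },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  if (!response.ok) throw new Error(`${method} ${path}: HTTP ${response.status}`);
  const text = await response.text();
  return text ? JSON.parse(text) : undefined;
}

// Create the group once, then add one person per WeJit profile.
async function buildDataset(profiles: { name: string; photoUrls: string[] }[]) {
  await faceApi("PUT", `/persongroups/${GROUP_ID}`, { name: "WeJit employees" });

  for (const profile of profiles) {
    const { personId } = await faceApi(
      "POST", `/persongroups/${GROUP_ID}/persons`, { name: profile.name });
    for (const url of profile.photoUrls) {
      await faceApi(
        "POST",
        `/persongroups/${GROUP_ID}/persons/${personId}/persistedFaces`,
        { url });
    }
  }

  // Kick off (asynchronous) training of the group.
  await faceApi("POST", `/persongroups/${GROUP_ID}/train`);
}
```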

The only thing needed now is to take a photo, make a few asynchronous calls to the Face API and read the answers. More technically speaking, the following steps should be followed:

  • detect the face in the captured photo to obtain a transient faceId,
  • call the Identify operation with that faceId against the trained person group,
  • map the returned personId back to a WeJit profile.
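
A hedged sketch of that end-to-end flow, reusing detectFaces, faceApi and GROUP_ID from the earlier snippets (mapping the person to an actual WeJit profile is app-specific, so the sketch stops at the person’s name):

```typescript
// End-to-end lookup sketch: photo URL -> faceId -> personId -> person name.
async function findPersonByPhoto(photoUrl: string): Promise<string | null> {
  // 1. Detect: obtain a transient faceId for the face in the photo.
  const faces = await detectFaces(photoUrl);
  if (faces.length === 0) return null;

  // 2. Identify: match the faceId against the trained person group.
  const [result] = await faceApi("POST", "/identify", {
    personGroupId: GROUP_ID,
    faceIds: [faces[0].faceId],
    maxNumOfCandidatesReturned: 1,
    confidenceThreshold: 0.5,
  });
  const candidate = result?.candidates?.[0];
  if (!candidate) return null;

  // 3. Resolve the personId to the stored person (and then to a WeJit profile).
  const person = await faceApi(
    "GET", `/persongroups/${GROUP_ID}/persons/${candidate.personId}`);
  return person.name;
}
```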

Seems easy! Let’s take a look at the result.

Final result

The short movie below shows the app in action:

I did some testing, including face pictures with eyeglasses, with a half-turned head, and taken inside a dark room. In most cases the solution worked well! The most surprising test result was the one for a bearded workmate, whose training dataset contained only bearded face pictures — after shaving, he was still recognized correctly (Witold Bołt — thanks for such dedication).

Summary

The research gave us a clear answer: machine learning doesn’t have to be magical at all. It can be easy, and there is nothing to be afraid of.

The initial cloud-based concept proved successful. Our next step is to dig deeper into the details and work on a custom-made solution without proprietary cloud-based APIs. We are betting on popular libraries such as OpenCV (computer vision) and TensorFlow (neural networks). Stay tuned for more details!
