Facial Recognition — the easy way

Maciej Kankowski
Jun 4 · 6 min read

One of the key buzzwords in today’s IT world is machine learning. Everyone claims to be doing it, trying to do it, pretending they will do it, or simply reading about it. The potential gains from the many machine learning approaches are obvious, the promises made by the key technology vendors are very… promising, and the success stories of various machine learning start-ups and big tech firms are encouraging. In this article I’ve written down my experience of taking the first steps towards implementing an actual machine learning use case in our internal mobile app. Keep in mind that I’m not a trained machine learning expert, but rather a developer who likes to play with code.


Aside from being a useful tool in our daily work in a growing tech company, WeJit is also a playground for testing new technologies and evaluating new use cases. The one that I’m interested in at this stage is the following:

As a WeJit user, I’d like to find a profile by taking a photo of a person.

The story is simple: I go to the lunch area, see someone I don’t know and take a photo of him/her — and the app shows me his/her public WeJit profile. Maybe it is not the smartest way to make new friends — but definitely a reasonable coding challenge to be taken.

What is Facial Recognition?

Machine Learning is not magic!

Initially, the problem didn’t appear to be trivial at all. If we take a systematic approach, we can define the following key points that need to be taken care of:

  • defining the dataset / training set — which in our case is based on WeJit profiles; note that this set will grow over time as new people join the company, so live updates need to be supported,
  • detecting the face in a photo — locating the actual rectangle of the photo, where the face is — this is the first algorithmic part of the task,
  • identifying the face — matching the detected face with the dataset — this is the key algorithmic part for which magical ML stuff should be used.

A few years ago, starting this project would have been much more difficult. But today, with machine learning having grown into one of the hottest topics in IT, things have changed. A lot.

Technology stack

Based on some generally accessible sources (Quora, Kairos and RapidAPI), I chose to follow Microsoft’s offering in their Azure cloud, leaving an in-house neural network implementation for the future.

Azure Cognitive Services

The Face API, part of Azure Cognitive Services, covers the following capabilities:

  • Detection — locating a face in an image and determining its attributes, such as age, emotion, gender, hair color or accessories,
  • Identification — searching for and matching a face image against a defined repository,
  • Verification — comparing two faces and checking whether they belong to the same person,
  • Grouping — organizing faces into groups,
  • Find Similar — finding similar faces in a dataset.
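Calling the Detection endpoint boils down to a single authenticated HTTP POST. As a rough sketch (the path, header and parameter names follow the Face API v1.0 REST reference; the endpoint host and key below are placeholders), the request could be assembled like this:

```python
def build_detect_request(endpoint, subscription_key,
                         attributes=("age", "gender", "emotion", "glasses")):
    """Assemble the pieces of a Face API v1.0 /detect call.

    `endpoint` is the Azure resource endpoint, e.g.
    "https://westeurope.api.cognitive.microsoft.com" (placeholder region).
    Returns (url, headers, params), ready for any HTTP client.
    """
    url = endpoint.rstrip("/") + "/face/v1.0/detect"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/octet-stream",  # raw image bytes go in the body
    }
    params = {
        "returnFaceId": "true",                        # faceId is needed later for Identify
        "returnFaceAttributes": ",".join(attributes),  # age, gender, emotion, ...
    }
    return url, headers, params
```

With the `requests` library, the actual call would then be `requests.post(url, headers=headers, params=params, data=image_bytes)`.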

So it looks like it has everything we need! Furthermore, Microsoft delivers solid multiplatform documentation with examples, supporting Web, iOS and Android development.

There’s no rose without a thorn, they say. And indeed, we encountered some limitations. One of them is the traffic quota: 20 transactions per minute and up to 30,000 transactions per month. These limits apply to the first year of usage. Luckily, that is more than enough for our case!
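To stay safely under the 20-transactions-per-minute quota, the client can throttle itself before each call. A minimal sliding-window limiter sketch (my own illustration, not part of any Azure SDK; the clock is injectable so the behaviour can be verified without real waiting):

```python
import collections
import time


class RateLimiter:
    """Allow at most `max_calls` within any `period`-second window."""

    def __init__(self, max_calls=20, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable for testing
        self.calls = collections.deque()  # timestamps of recent calls

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self):
        """Register that a call was just made."""
        self.calls.append(self.clock())
```

Before each Face API request the app would check `wait_time()`, sleep if needed, then `record()` the call.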

First approach — playing with an existing example

I began with the following configuration:

Training dataset #1 (actual faces used with permission).

Training dataset: Group of 3 people: 2 people with 2 photos, 1 person with 1 photo.

Results: People with 2 photos are recognized correctly, but the person with 1 photo is not!

Results of example #1. Person with only 1 training example is not recognized.

To improve the results and get the expected answer, I corrected the input by adding an additional image of Jakub.

Training dataset #2 — added the second photo of Jakub.

Training dataset: Group of 3 people — each with 2 photos.

Results: All people are recognized correctly! 🚀

Results #2 — all three people are recognized correctly.

Second step — integration with WeJit

To build the dataset required by the algorithms, I’ve used an open source tool: howlowck/train-faces, which offers a nice, web-based UI for interacting with the Face API.
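For reference, a tool like this ultimately drives a short sequence of Face API calls: create a person group, add one person per profile, attach that person’s photos, and trigger training. A sketch that only plans those REST calls (the paths follow the Face API v1.0 REST reference; the group id and photo URLs are placeholders, and the real person ids would come back from the API at runtime):

```python
def plan_training_calls(person_group_id, people):
    """Return the ordered (method, path, body) REST calls that would build
    and train a Face API person group.

    `people` maps a person's name to a list of photo URLs, e.g.
    {"Jakub": ["https://example.com/jakub1.jpg"]} (placeholder URLs).
    Authentication and the base endpoint are assumed to live in the HTTP
    layer; person ids are returned by the API, so a placeholder is used.
    """
    base = f"/face/v1.0/persongroups/{person_group_id}"
    calls = [("PUT", base, {"name": person_group_id})]  # create the group
    for name, photo_urls in people.items():
        calls.append(("POST", f"{base}/persons", {"name": name}))
        for url in photo_urls:
            calls.append(("POST",
                          f"{base}/persons/{{personId}}/persistedFaces",
                          {"url": url}))
    calls.append(("POST", f"{base}/train", None))  # kick off (async) training
    return calls
```

Training is asynchronous, so after the final call the group’s training status has to be polled before Identify can be used.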

All that is needed now is to take a photo, make a few asynchronous calls to the Face API and read the answers.
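In rough terms the flow is: detect the faces in the photo to obtain transient faceIds, then pass those ids to the Identify operation against the trained person group. A sketch, with the HTTP call injected so the logic can be read and tested without network access (the function names are mine; the two paths follow the Face API v1.0 REST reference):

```python
def identify_person(image_bytes, person_group_id, post):
    """Detect faces in a photo, then identify them against a person group.

    `post(path, json=None, data=None)` is any callable that performs the
    authenticated HTTP POST and returns the parsed JSON response.
    Returns the best-matching personId for each detected face.
    """
    # Step 1: detect faces; the response carries a transient faceId per face.
    faces = post("/face/v1.0/detect?returnFaceId=true", data=image_bytes)
    face_ids = [f["faceId"] for f in faces]
    if not face_ids:
        return []

    # Step 2: match the detected faceIds against the trained person group.
    results = post("/face/v1.0/identify",
                   json={"personGroupId": person_group_id,
                         "faceIds": face_ids})

    # Step 3: keep the highest-confidence candidate for each face.
    matches = []
    for result in results:
        if result.get("candidates"):
            best = max(result["candidates"], key=lambda c: c["confidence"])
            matches.append(best["personId"])
    return matches
```

The returned personIds would then be mapped back to WeJit profiles stored alongside the person group.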

Seems easy! Let’s take a look at the result.

Final result

I did some testing, including face pictures with eyeglasses, with a half-turned head and inside a dark room. In most cases the solution worked well! The most surprising test result was for a bearded workmate whose training dataset contained only bearded face pictures: after shaving, he was still recognized correctly (Witold Bołt — thanks for such dedication).


The initial, cloud-based concept proved successful. Our next step is to dig deeper into the details and work on a custom-made solution without proprietary cloud-based APIs. We are betting on popular libraries such as OpenCV (computer vision) and TensorFlow (neural networks). Stay tuned for more details!

Jit Team

Clever Thoughts by Jit Team

Written by Maciej Kankowski, just another mobile and frontend developer… https://mackan.pl/
