Realtime Face Recognition in the Browser

Facer

Not long ago, Vincent Mühler published several posts about his amazing face recognition library face-api.js, built on TensorFlowJS. For me, having previously worked with OpenCV on the server side, the idea of doing face recognition in the browser felt quite compelling to try out. As a result of that curiosity, Facer was born.

Facer is not an AI project but an engineering one: it utilizes face-api.js to perform face recognition in the browser and takes care of application aspects like:

  • User Registration
  • Model loading
  • Image upload, resizing and deleting
  • Using camera (taking and resizing photos, face recognition)
  • Training (Face landmarks/descriptors extraction) and storing models
  • DevOps — Deploying using Docker

using the following technology stack:

  • Nuxt.js (Vue.js with server-side rendering)
  • face-api.js on top of TensorFlowJS
  • sharp and Multer for image resizing and uploads
  • PM2 and Docker for running and deploying the app

If you want to know more about the face-api.js library, I recommend starting with Vincent’s post.

Here we will focus on building an app that utilizes that library.

So let’s dive into it.

Models

We will be using the following face-api.js models:

  • a face detection model (face bounding rectangles)
  • the 68-point face landmark model
  • the face recognition model (128-value face descriptors)

The total size of all models is less than 6.5 MB.

App Workflow

For the sake of simplicity, this application will not use any DB as storage per se; instead, the registration process for each user will rely on:

  • creating a new sub-folder in the folder /data/users/, e.g. for the user Goran the app will create /data/users/Goran, in which all of his photos needed for training the face recognition model will be stored
  • storing the trained face recognition model for all users and all their photos in a static file /data/faces.json (see the format below)

First, the user needs to register by typing in his/her name, after which he/she needs to upload at least 3 photos of him/her-self. The photos can be uploaded either via a file upload or by taking photos with the browser camera (WebRTC getUserMedia API). During the upload, all photos are resized to 320x247 using the performant image processing library sharp.

After the registration, the user can start the training process, which takes all users in the catalog (and their uploaded photos) and extracts:

  • face bounding rectangles (optional)
  • 68 face landmarks (optional)
  • 128 face descriptors (only these are required for training the model)

and stores the face recognition model in the static file /data/faces.json in the following format:
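The exact schema is project-specific, so treat the following as an assumption: one entry per user, holding one 128-value descriptor vector per uploaded photo (truncated here for readability):

{
  "faces": [
    {
      "user": "Goran",
      "descriptors": [
        [0.1421, -0.0534, 0.0913],
        [0.1387, -0.0611, 0.0876]
      ]
    }
  ]
}

(each inner array would contain 128 values in reality, one vector per photo)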

Architecture

The application uses Nuxt.js with SSR (server-side rendering) and follows its default directory structure convention.

In development mode, the application splits the server into two distinct processes:

  • /server/index.js — for static content (frontend) — listening on port 3000
  • /api/index.js — for API calls (backend) — listening on port 3001
npm run dev
npm run api

By separating the frontend from the backend during development, we reduce the number of files Nuxt loads, and with it the size and duration of the initial startup; we also benefit from faster restarts of the API while debugging.

In production mode, on the other hand, the application merges both into a single server process, listening on port 3000.

npm run build
npm run start

Loading the models

When the user onboards to our app, we load all TensorFlowJS models; for that purpose we utilize the mounted() handler of /layouts/default.vue:
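A minimal sketch of that handler; it only delegates to the load() action shown next:

<script>
export default {
  async mounted() {
    // load all TensorFlowJS models once, when the app starts up
    await this.$store.dispatch('face/load')
  }
}
</script>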

The load() action itself is handled by /store/face.js:
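A sketch of that action; the choice of the Tiny Face Detector and the /models URI are assumptions:

import * as faceapi from 'face-api.js'

export const actions = {
  async load() {
    // fetch the model weights from a statically served /models folder
    await Promise.all([
      faceapi.nets.tinyFaceDetector.loadFromUri('/models'),
      faceapi.nets.faceLandmark68Net.loadFromUri('/models'),
      faceapi.nets.faceRecognitionNet.loadFromUri('/models')
    ])
  }
}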

User registration

We register the user via the simple form in the /pages/users/index.vue page, handled by the register() method:
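A minimal sketch of that method, assuming the name is bound to a text input:

export default {
  data: () => ({ name: '' }),
  methods: {
    async register() {
      // delegate to the Vuex action below, then continue to the upload page
      await this.$store.dispatch('user/register', { name: this.name })
      this.$router.push(`/users/${this.name}`)
    }
  }
}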

The register() action is handled by /store/user.js:
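A sketch of that action; the /api/users route and the use of the @nuxtjs/axios module are assumptions:

export const actions = {
  async register(context, { name }) {
    // POST the new user's name to the backend
    await this.$axios.$post('/api/users', { name })
  }
}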

And the API call is handled by /api/controller/user-controller.js:
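A sketch of that controller, assuming an Express-style handler:

const fs = require('fs')
const path = require('path')

// POST /api/users: creates the photo folder for a new user
module.exports.register = (req, res) => {
  const dir = path.join('data', 'users', req.body.name)
  if (!fs.existsSync(dir)) fs.mkdirSync(dir, { recursive: true })
  res.json({ name: req.body.name })
}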

which creates a new folder: /data/users/_name.

Photos upload

The user has two options/tabs to upload photos:

  • tab-1: either via a file-upload HTML input, handled by the filesChange() method (resized to 320x247 on the server-side)
  • tab-2: or by using the browser camera (WebRTC getUserMedia API) and taking photo snapshots via the HTML5 canvas, handled by the takePhoto() method (resized to 320x247 on the client-side)

After registration, we navigate the user to the page where he/she can continue uploading photos, /pages/users/_name.vue:
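A sketch of both handlers; the video/canvas refs and the exact payloads are assumptions:

export default {
  computed: {
    // the user name comes from the dynamic route /users/_name
    name() { return this.$route.params.name }
  },
  methods: {
    // tab-1: file upload input
    async filesChange(fileList) {
      const form = new FormData()
      Array.from(fileList).forEach(file => form.append('photos', file))
      await this.$store.dispatch('user/upload', { name: this.name, form })
    },
    // tab-2: camera snapshot, resized to 320x247 on the client via the canvas
    async takePhoto() {
      const canvas = this.$refs.canvas
      canvas.getContext('2d').drawImage(this.$refs.video, 0, 0, 320, 247)
      const base64 = canvas.toDataURL('image/png')
      await this.$store.dispatch('user/uploadBase64', { name: this.name, base64 })
    }
  }
}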

The upload() and uploadBase64() actions are handled by /store/user.js:
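A sketch of those actions; the API routes are assumptions:

export const actions = {
  // multipart/form-data upload (tab-1)
  async upload(context, { name, form }) {
    await this.$axios.$post(`/api/users/${name}/photos`, form)
  },
  // base64-encoded canvas snapshot (tab-2)
  async uploadBase64(context, { name, base64 }) {
    await this.$axios.$post(`/api/users/${name}/photos/base64`, { base64 })
  }
}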

And the API calls are handled by /api/controller/user-controller.js:
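A sketch of the multipart handler; the route and field name are assumptions, while the Multer and sharp calls follow those libraries' actual APIs:

const multer = require('multer')
const sharp = require('sharp')
const path = require('path')

// keep the upload in memory so sharp can read it as a buffer
const upload = multer({ storage: multer.memoryStorage() })

// POST /api/users/:name/photos (multipart/form-data)
module.exports.uploadPhoto = [
  upload.single('photos'),
  async (req, res) => {
    const file = path.join('data', 'users', req.params.name, `${Date.now()}.png`)
    // normalize every upload to 320x247, whatever its original size or shape
    await sharp(req.file.buffer).resize(320, 247).toFile(file)
    res.json({ file })
  }
]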

For extracting the file content from the multipart/form-data request we use Multer. Also, since the user can upload images of different sizes and shapes, we resize each uploaded image using sharp. Finally, the photos are stored in the user’s folder: /data/users/_name/.

Training — Face Descriptor Extraction

Training is the process of extracting a 128-value face descriptor vector from each photo of a given user.

It is recommended that each user has at least 3 photos uploaded for training. Hence, after the training, the face recognition model will be composed of n x m descriptor vectors, where m is the number of users and n is the number of photos per user; e.g. 5 users with 3 photos each yield a model of 15 descriptor vectors.

We can perform training in one of the following manners:

  • One by one — training after each photo upload of a selected user
  • Per-User — training for all user’s photos of a selected user
  • Batch — training for all users and all their photos at once

In this app, we will implement a Batch training process, which, as an end result, will store the face recognition model inside the /data/faces.json file.

The batch training process is started from the /pages/train.vue:
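A sketch of that training loop; the user/list action and photo URLs are assumptions, while the detection chain is face-api.js's actual API:

import * as faceapi from 'face-api.js'

export default {
  methods: {
    async train() {
      const users = await this.$store.dispatch('user/list') // assumed action
      const faces = []
      for (const user of users) {
        const descriptors = []
        for (const photo of user.photos) {
          const img = await faceapi.fetchImage(`/data/users/${user.name}/${photo}`)
          const result = await faceapi
            .detectSingleFace(img, new faceapi.TinyFaceDetectorOptions())
            .withFaceLandmarks()
            .withFaceDescriptor()
          if (result) descriptors.push(Array.from(result.descriptor))
        }
        faces.push({ user: user.name, descriptors })
      }
      await this.$store.dispatch('face/save', faces)
    }
  }
}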

It iterates through the list of users and, for each of their photos, extracts a 128-value face descriptor vector. After that, it saves the JSON face recognition model via the save() action of /store/face.js:
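A minimal sketch of that action, assuming a /api/faces route:

export const actions = {
  async save(context, faces) {
    // persist the trained model on the server as /data/faces.json
    await this.$axios.$post('/api/faces', { faces })
  }
}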

And the API call is handled by /api/controllers/face-controller.js:
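A sketch, again assuming an Express-style handler:

const fs = require('fs')
const path = require('path')

// POST /api/faces: writes the trained model to the static file
module.exports.save = (req, res) => {
  fs.writeFileSync(path.join('data', 'faces.json'), JSON.stringify(req.body, null, 2))
  res.json({ ok: true })
}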

Recognition

The face recognition process reads the face recognition model (faces.json) and creates a Face Matcher, which calculates the Euclidean distance between the descriptor vectors of the stored face recognition model and those of any new face to be recognized.
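In face-api.js terms, building that matcher from the loaded faces.json (using the shape sketched earlier, parsed into a model variable) looks roughly like this; 0.6 is the library's default distance threshold:

import * as faceapi from 'face-api.js'

// one LabeledFaceDescriptors entry per user, built from the stored vectors
const labeled = model.faces.map(f => new faceapi.LabeledFaceDescriptors(
  f.user,
  f.descriptors.map(d => new Float32Array(d))
))
// matches any new descriptor against the stored ones by Euclidean distance
const matcher = new faceapi.FaceMatcher(labeled, 0.6)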

In the UI, via the camera (WebRTC getUserMedia API), we sample snapshots at up to 60 fps (frames per second), storing each inside an HTML canvas element. Then, for each snapshot, we extract its face descriptors, find the best match using the face matcher, and draw the result back onto the canvas.

The face recognition process is started via the /pages/recognize.vue:
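A sketch of the sampling loop (covering two of the four actions mentioned below), using requestAnimationFrame to re-sample at the display rate:

export default {
  methods: {
    async startRecognition() {
      const video = this.$refs.video
      const canvas = this.$refs.canvas
      const step = async () => {
        // detect faces (with landmarks and descriptors) in the current frame
        const detections = await this.$store.dispatch('face/getFaceDetections', { video })
        // match each detection and draw the result back onto the canvas
        await this.$store.dispatch('face/recognize', { detections, canvas })
        requestAnimationFrame(step)
      }
      step()
    }
  }
}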

where the getFaceDetections(), drawLandmarks(), recognize() and drawDetections() actions are handled by /store/face.js:
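A sketch of the two central actions; storing the face matcher in state.matcher is an assumption:

import * as faceapi from 'face-api.js'

export const actions = {
  // all faces in the current frame, with landmarks and descriptors
  getFaceDetections(context, { video }) {
    return faceapi
      .detectAllFaces(video, new faceapi.TinyFaceDetectorOptions())
      .withFaceLandmarks()
      .withFaceDescriptors()
  },
  // best match per detection, drawn as a labeled box onto the canvas
  recognize({ state }, { detections, canvas }) {
    for (const d of detections) {
      const match = state.matcher.findBestMatch(d.descriptor)
      new faceapi.draw.DrawBox(d.detection.box, { label: match.toString() }).draw(canvas)
    }
  }
}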

Production

As NodeJS runs in a single process by default, this is not optimal in production, especially on hardware with multiple CPUs and/or CPU cores.

Hence, we will utilize PM2 to instantiate as many processes as there are CPU cores (the -i 0 param) and load-balance the requests between those forked processes:

pm2 start server/index.js -i 0 --attach

Deploy

We will package our app as a very lean Docker image based on alpine:edge (less than 300 MB, because size matters):
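A minimal sketch of such a Dockerfile; the actual one in the repo may differ:

FROM alpine:edge
RUN apk add --no-cache nodejs npm
WORKDIR /facer
COPY . .
RUN npm install && npm run build
EXPOSE 3000
CMD ["npm", "run", "start"]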

Then we can either:

  • build the Docker image by docker build -t gjovanov/facer .
  • or use Travis Reeder’s versioning script ./build.sh

Or pull the prebuilt image from Docker Hub: docker pull gjovanov/facer.

Finally, we can run the Docker Container by:

docker run -d --name facer \
--hostname facer \
--restart always \
-e API_URL=https://facer.xplorify.net \
-p 8081:3000 \
-v /gjovanov/facer/data:/facer/data \
--net=bridge \
gjovanov/facer

Source code & Demo

Both the source code and a demo are available to try out.

Any suggestions for improvements or pull-requests are more than welcome.