Not The Normal GSoC Journey

Aug 26, 2019

I took an unconventional project and proposed an unconventional solution, yet CCExtractor embraced my different and unproven approach and gave me the opportunity to work on their hottest project idea: Poor Man’s Rekognition.

You can read my GSoC proposal to learn more about my approach and why I chose Node.js for an AI/ML project instead of the popular Python frameworks.

The Project

Poor Man’s Rekognition (PMR) aims to provide a free and open source alternative to Amazon Rekognition. A disruptive technology like face recognition must be available to everyone, not just the privileged. Amazon Rekognition is a fairly complex system, so to start PMR I focused on face detection and recognition. Many libraries, especially in Python, provide this functionality, but they require expensive hardware for practical use. Throughout the summer, my aim was to run everything on the CPU while still providing reasonable performance for a REST API. I chose Node.js and C++ to achieve this goal.

Dividing into 3 Projects

My GSoC project spans 3 sub-projects and hence 3 different repositories:

Nodoface: A Node.js C++ addon (binding) to C++ libraries that helped attain high CPU performance for ML algorithms.
PMR-Core: A Node.js library (with TypeScript support) that combines Nodoface with ML libraries that are not available in C++. For example, there is no ArcFace model for C++ or Node.js. It provides everything needed to build an Amazon Rekognition-like API.
PMR-Server: The REST API for PMR-Core. It provides endpoints similar to Amazon Rekognition’s, and any API call generates a response that is more or less identical to Rekognition’s. This sub-project is still incomplete, though.

Nodoface

Status: Complete

The most important features in Nodoface are the MTCNN face detector and input/output operations for image and video files. More features are implemented but are not currently used in the REST API.

Notable performance gains on CPU:

Table 1: Major performance differences

All metrics were measured on an 8th-generation quad-core Intel i7 processor running at 1.9 GHz on a Fedora machine.

The MTCNN detector is a binding to the actual implementation in the C++ library OpenFace 2.0. Here is its result on an unseen image:

Image 1: MTCNN detection result on 720x443 image took 56 ms

For a full list of features implemented in Nodoface check the repository.

PMR-Core

Status: Complete*

*Complete in functionality but not using the proposed method.

The main goal of PMR-Core was to provide face recognition and classification. It uses Nodoface to efficiently handle image and video I/O and frequently used face detection algorithms. For face recognition, I proposed implementing ArcFace using TensorFlow.js, but unfortunately my attempts to train ArcFace failed. I implemented the ArcFace loss in TensorFlow.js but ran out of time to train a ResNet architecture with it. I had to move forward, so I am using a pretrained Facenet model from face-api.js (without monkey-patching the Node environment). PMR-Core avoids face-api.js’s global API (a pretty good design, though) and uses its FaceRecognitionNet directly to compute embeddings, so that it can easily be replaced with ArcFace in the future.
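For readers unfamiliar with ArcFace, the loss is a softmax cross-entropy with an additive angular margin: the embedding and the class weights are L2-normalized, a margin m is added to the angle between the embedding and its target class, and the cosines are scaled by s. The sketch below computes it for a single sample in plain TypeScript. It is only an illustration, not the TensorFlow.js implementation mentioned above: the function names are hypothetical, the defaults s = 64 and m = 0.5 are the values commonly used with ArcFace, and it skips the special handling that practical implementations add when the angle plus the margin exceeds π.

```typescript
// Scale a vector to unit length.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / norm);
}

// ArcFace loss for one sample (hypothetical helper, not part of PMR-Core).
// embedding:    feature vector produced by the network
// classWeights: one weight vector per identity
// target:       index of the true identity
function arcFaceLoss(
  embedding: number[],
  classWeights: number[][],
  target: number,
  scale: number = 64,
  margin: number = 0.5
): number {
  const e = l2Normalize(embedding);
  const logits = classWeights.map((w, j) => {
    const wn = l2Normalize(w);
    // Cosine of the angle between the embedding and class j, clamped for acos.
    let cos = wn.reduce((sum, x, i) => sum + x * e[i], 0);
    cos = Math.min(1, Math.max(-1, cos));
    // Only the target class gets the angular margin added to its angle.
    const adjusted = j === target ? Math.cos(Math.acos(cos) + margin) : cos;
    return scale * adjusted;
  });
  // Numerically stable softmax cross-entropy: logsumexp(logits) - logits[target].
  const maxLogit = Math.max(...logits);
  const logSumExp =
    maxLogit + Math.log(logits.reduce((sum, z) => sum + Math.exp(z - maxLogit), 0));
  return logSumExp - logits[target];
}

// Toy example: two identities in a 2-D embedding space, small scale for readability.
const toyWeights = [[1, 0], [0, 1]];
const lossWithoutMargin = arcFaceLoss([1, 0], toyWeights, 0, 4, 0);
const lossWithMargin = arcFaceLoss([1, 0], toyWeights, 0, 4, 0.5);
console.log(lossWithoutMargin, lossWithMargin);
```

Even for a perfectly aligned embedding, the margin makes the loss larger than plain softmax would, which is what pushes embeddings of the same identity closer together during training.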

For classification, a simple KNN classifier is used, taking the embeddings as feature vectors. However, the accuracy on LFW is too low owing to the small number of samples per class. Only 483 unique persons have 10 or more samples; most identities have only one or two. Here is one of the predictions:

Image 2: Roger Federer mis-classified as Gordon Brown with 25% confidence

I could use a larger dataset, but with KNN the entire dataset needs to be loaded into memory. Even though LFW is a small dataset, the embeddings of all its faces alone, stored as a compact JSON file, fill up my laptop’s 16 GB of RAM. I must use a different classifier or rely on hierarchical clustering to improve classification accuracy.

PMR-Server

Status: Partially complete

Now comes the part where I can show a lot of outputs :)

PMR-Server uses PMR-Core (and, indirectly, Nodoface) to process each request and asynchronously generates two outputs: 1) a JSON response and 2) a downloadable image/video with bounding boxes and labels drawn for visualization. For images, the input is JSON containing a base64 string representation of the image, which is decoded into an Image instance (the cv::Mat binding in Nodoface). Videos are sent by their direct download URL in the POST body.

The server follows a “Dispatch and Poll” method. Every time a request is received, the server dispatches a “Job” and immediately sends a response with the Job status. For example:

Image 3: Output of RecognizeCelebrities action before Job completion

Initially, the Job status defaults to “PENDING”. One can poll the “ResultUrl” until the Job status changes to “COMPLETE”. At that point, the response contains fields like BoundingBox and Labels predicted by PMR-Core, and the ResultUrl changes to a link to the output image/video with the visualization drawn. Changes to the Job status are recorded in a PostgreSQL database. When the status is complete, the actual JSON result with predictions is also stored in the DB, and polling the ResultUrl fetches that JSON response from the DB.

Here is one such output:

Image 4: Output for RecognizeCelebrities action after polling for Job completion

Note that the ResultUrl points to the visualization shown as “Image 2” above.
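The “Dispatch and Poll” flow described above can be sketched in a few lines of TypeScript. This is a minimal in-memory illustration under stated assumptions, not PMR-Server’s actual code: the JobStore class, its method names, and the URL paths are hypothetical, a plain object stands in for the PostgreSQL table, and queued jobs are run by hand (instead of in the background) so that the example stays deterministic.

```typescript
type JobStatus = "PENDING" | "COMPLETE";

interface Job {
  id: number;
  action: string;
  status: JobStatus;
  resultUrl: string;
  result?: unknown;
}

// Hypothetical in-memory stand-in for the jobs table.
class JobStore {
  private jobs: Record<number, Job> = {};
  private queue: Array<{ id: number; work: () => unknown }> = [];
  private nextId = 1;

  // Register the job and return right away; the heavy work is only queued.
  dispatch(action: string, work: () => unknown): Job {
    const id = this.nextId++;
    const job: Job = { id, action, status: "PENDING", resultUrl: `/jobs/${id}` };
    this.jobs[id] = job;
    this.queue.push({ id, work });
    return { ...job };
  }

  // Process one queued job. A real server would do this in the background.
  runNext(): boolean {
    const next = this.queue.shift();
    if (!next) return false;
    const job = this.jobs[next.id];
    job.result = next.work();
    job.status = "COMPLETE";
    // After completion the URL points at the rendered output (hypothetical path).
    job.resultUrl = `/outputs/${next.id}.jpg`;
    return true;
  }

  // What a client sees when it polls.
  poll(id: number): Job | undefined {
    const job = this.jobs[id];
    return job ? { ...job } : undefined;
  }
}

const pmrStore = new JobStore();
const dispatched = pmrStore.dispatch("RecognizeCelebrities", () => ({
  Labels: ["person-a"],
}));
console.log(dispatched.status); // PENDING: the response is sent before any work runs
pmrStore.runNext();
const polled = pmrStore.poll(dispatched.id);
console.log(polled ? polled.status : "unknown job"); // COMPLETE
```

The point of the design is that dispatch never waits for the work function, so accepting a request stays cheap no matter how long detection or recognition takes.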

For those who are interested, here is the DB schema:

For most cases, the Job status becomes “COMPLETE” within a few milliseconds. Why use polling then? Because if any input (say, a video or a large image) takes longer to process, the server would be blocked by that request and could accept other requests only after sending the response for the current one. “Dispatch and Poll” ensures the server is always available to accept new requests. Again, a very unconventional approach, but essential for CPU-intensive tasks like ours. Here is another output, on an image with more faces:

Image 5: Output of celebrity recognition using Facenet and KNN-Classifier.

The labels are mostly incorrect owing to the small amount of data available to KNN. It did classify “Jennifer Aniston” correctly, but the confidence score is just 25% because there is only one sample of the actress and K = 4.
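The 25% figure is easy to reproduce with a toy KNN vote. The sketch below assumes the confidence is simply the share of the K nearest neighbours that voted for the winning label, and that ties go to the label of the closest neighbour; both are assumptions made for illustration, not a description of PMR-Core’s exact classifier. The 2-D vectors and labels are made up, whereas real embeddings have many more dimensions.

```typescript
interface LabeledEmbedding {
  label: string;
  vector: number[];
}

interface Prediction {
  label: string;
  confidence: number;
}

function euclideanDistance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Hypothetical KNN classifier: majority vote among the k nearest samples.
function knnClassify(query: number[], samples: LabeledEmbedding[], k: number): Prediction {
  if (samples.length === 0 || k < 1) {
    throw new Error("need at least one sample and k >= 1");
  }
  const nearest = samples
    .map((s) => ({ label: s.label, distance: euclideanDistance(query, s.vector) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);

  // Count one vote per neighbour.
  const votes: Record<string, number> = {};
  for (const n of nearest) {
    votes[n.label] = (votes[n.label] || 0) + 1;
  }
  // Start from the closest neighbour so that it wins any tie.
  let best = nearest[0].label;
  for (const n of nearest) {
    if (votes[n.label] > votes[best]) best = n.label;
  }
  return { label: best, confidence: votes[best] / k };
}

// One sample per identity, as for most people in LFW.
const toySamples: LabeledEmbedding[] = [
  { label: "person-a", vector: [0.1, 0.0] },
  { label: "person-b", vector: [1.0, 0.0] },
  { label: "person-c", vector: [0.0, 1.2] },
  { label: "person-d", vector: [1.5, 1.5] },
];
const toyPrediction = knnClassify([0, 0], toySamples, 4);
console.log(toyPrediction); // label "person-a", confidence 0.25
```

With a single sample per identity, the correct label can collect at most one of the four votes, so the confidence cannot exceed 1/4 however close the match is.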

At the time of writing, DetectFaces and RecognizeCelebrities are the functional actions provided by the server. I also added video support (the CelebrityInVideo action), but only the JSON response is generated; the video output fails to write because of encoding issues with my OpenCV installation.

Future work

Future work will include switching to a better classifier and/or using clustering on a large dataset like YouTube Faces or MS Celebs. Most likely, a single dataset will not suffice. I have also been working on deploying the project to AWS for the past few days but have not yet managed to do so in a cost-effective manner and in a way that does not break the codebase. After deployment, I will continue working on this project in my free time. I believe Nodoface alone will help many people develop their own services. Also, as more features are added to PMR-Server, it will need to be divided into microservices. It’s a lot of work, but hopefully the fruits will be reaped by many!

More info

You can read my notes here for technical details and the problems I faced along this journey.