BRAN — A facial recognition buddy at our office entrance

Shankar M S
KI labs Engineering
10 min read · Oct 2, 2019
BRAN in action

If you get your face and your name out there enough, people will start to recognize you.

- RICHARD ‘BRAN’SON

BRAN — The Beginning

In this article, we present BRAN 1.0

“You shall not pass” — G̶a̶n̶d̶a̶l̶f̶ ̶t̶h̶e̶ ̶G̶r̶e̶y̶ Bran the Broken

Basic Recognition and Authentication at eNtrance. As the name suggests, BRAN is an identification and authentication system mounted at the entrance of our cool KI labs office, here in Munich. BRAN uses facial recognition algorithms in order to identify and allow authorized personnel into our office premises.

BRAN, in essence, consists of these components:

  • A Raspberry Pi fitted with a Pi camera module, mounted at the entrance
  • A Python + OpenCV facial recognition service running on the Pi
  • An integration with Hodor, our existing door-opening application, which receives the door-open request

BRAN is also our first attempt at understanding and evaluating face detection algorithms, ranging from simple ones such as Haar cascade classifiers to more complex ones such as HOG + Linear SVM and CNNs.

BRAN — The Origin

Like all super-heroes, our BRAN has an origin story too.

Not so long ago, our office building saw our beloved Hodor evolve from a Slack-driven door-opening application into a much fancier, voice-activated (Siri) mobile application. A detailed account of its evolution can be found in the following article:

BRAN, in reality, was conceptualised during a hackathon held here at KI labs. The idea was to leverage our existing door-opening application, Hodor, by adding a facial recognition layer on top of it, which would identify and authorize visitors into our office. As it was, quite figuratively, an all-seeing eye sitting on top of Hodor, we found it only fair and poetic to name it after Bran, the Three-eyed Raven from the GoT universe. Surprise, surprise, we ended up winning the hackathon (well, it was a joint first place #nuff-braggin’). Nevertheless, it was still an amazing feeling, ‘cause all the participating teams had worked on equally cool projects (check out this article on Beerbot, one of the other prize-winning hackathon projects).

BRAN — The Fellowship

The ‘brain’ behind BRAN — #Team-BRAN

BRAN — The Reckoning

When we decided to ‘face’ this interesting problem of facial recognition using computer vision, our initial research pointed us towards a couple of famous, ready-to-use, out-of-the-box (quite literally :D) solutions. After weighing the drawbacks of each, we arrived at a simple yet powerful setup: a cheap Raspberry Pi paired with a standard Pi camera module.

Considered options and their drawbacks:

BRAN — An Overview

BRAN — The Building

BRAN is essentially a Python + OpenCV based facial recognition service running on a Raspberry Pi. If you want to skip this section and dive straight into the code, the repository dedicated to BRAN (which has now been open-sourced) can be found at the following link:

BRAN — Three Eyed Raven

In order to run BRAN on a Raspberry Pi, we must install OpenCV and a few other system dependencies. This can be achieved simply by running the command

$ make setup

The above command automatically creates a virtual environment for you and installs all requirements via pip. Some of the important libraries worth highlighting are:

  • opencv-contrib-python: OpenCV itself, a library of programming functions aimed mainly at real-time computer vision
  • dlib: A toolkit for making real-world machine learning and data analysis applications
  • face_recognition: A library built on top of dlib that uses deep learning on images to enable real-time face recognition
  • imutils: A series of convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python

BRAN — The Gathering

This step involves collecting the image dataset: the images of all the faces that we want BRAN to recognize.

Registered Faces

The directory called dataset in the root of the repository is used to collect, classify and maintain all the images of the people that work in our office. Each sub-directory in the dataset folder is named after the person whose images it contains. In facial recognition parlance, these known users, whose frontal face images are used to create the stored facial embeddings, are called registered users. The idea is to authorize only these users and disallow access to all other unregistered faces.

BRAN — The Encoding

128-d real-valued facial embedding vector for each face

As illustrated in the figure above, each of the faces captured in the frames at the entrance must be compared with the set of known faces of the registered users. Each face stored in the dataset is converted into a 128-d vector, i.e. a list of 128 floating-point values. These 128-d embeddings of all the registered users are stored in BRAN’s internal database as a pickle file. With the help of a voting method, these stored values are then matched by calculating the Euclidean distance between them and the embeddings computed in real time from the video frames captured at the entrance¹. The voting method determines whether or not the generated encodings can safely be called a ‘match’ to any one of the stored registered encodings and, if so, fires a POST request to Hodor, which subsequently opens the door.
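To make the distance test concrete, here is a minimal sketch of the comparison step. The vectors are stand-ins for real 128-d face_recognition embeddings, and the 0.5 tolerance simply mirrors the -t option shown later; the function names are illustrative, not BRAN's actual code.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_match(known, candidate, tolerance=0.5):
    """A candidate embedding matches a registered one if their distance
    falls at or below the tolerance."""
    return euclidean_distance(known, candidate) <= tolerance

# Toy 128-d embeddings: identical vectors have distance 0 and so match.
registered = [0.1] * 128
probe = [0.1] * 128
print(is_match(registered, probe))  # True
```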

There are two different algorithms that can be used to perform the facial detection at this stage:

  • HOG classifier method (hog): The Histogram of Oriented Gradients method is based on first-order image gradients, which have different orientations and are densely bucketed into overlapping bins. Since HOG considers only low-level features, processed in a single layer, face detection and bounding-box determination are not very resource-intensive.
bran encode \
  --dataset dataset \
  --encodings encodings.pickle \
  --detection-method hog
  • Convolutional Neural Networks method (cnn): A CNN is a multi-layered, hierarchical deep learning paradigm in which the same sequence of operations is repeated at each layer, with the result of each stage filtered and passed on to the next. The CNN method gives more reliable bounding boxes, but it comes with the expected trade-offs: apart from being more resource-intensive, it also requires considerably more computation time (approximately six times slower :/). The performance of the two methods has been benchmarked and demonstrated in the benchmarking_detection_methods.ipynb notebook.
bran encode \
  --dataset dataset \
  --encodings encodings.pickle \
  --detection-method cnn

The above bran encode command stores our serialized database of facial embeddings (a list of registered embeddings and registered names) as a pickle file at the path given by the --encodings option, for the collected dataset folder whose path is provided by the --dataset option. The --detection-method option specifies which model is used to detect the facial bounding boxes from which the facial encodings are generated and stored.

Since running the compute-intensive CNN model for encoding is not practically viable on a Raspberry Pi, this method should be used only in an environment with sufficient resources (laptops, or GPU-enabled clusters for large datasets). The HOG classifier was therefore chosen as BRAN's default encoding option, with a configurable switch to the CNN model when a more capable environment is available.
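The serialized database itself is just a pickle file. A plausible layout, and the one commonly used in this kind of pipeline, is a dict of two parallel lists; the exact keys in BRAN's pickle may differ, and the vectors below are stand-ins for real embeddings.

```python
import pickle

# Hypothetical layout of encodings.pickle: parallel lists of embeddings
# and the registered names they belong to.
data = {
    "encodings": [[0.1] * 128, [0.2] * 128],  # stand-ins for 128-d vectors
    "names": ["arya", "sansa"],
}

with open("encodings.pickle", "wb") as f:
    pickle.dump(data, f)

# At detection time the service loads the same file back.
with open("encodings.pickle", "rb") as f:
    restored = pickle.load(f)

print(restored["names"])  # ['arya', 'sansa']
```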

BRAN — The Detection

Next comes the exciting part: actually seeing BRAN be the Three-eyed Raven. The process of detecting faces in real time is invoked by running the bran detect command, which switches on the camera and starts the video feed. BRAN processes this video as frames, and the presence of faces, if any, is detected and localised. This is achieved using a relatively simple yet effective method, the Haar cascade classifier (Viola-Jones algorithm)², which also fits within the Pi's memory limitations while performing facial detection and recognition in real time. This would otherwise have been impossible if more complex deep learning methods were employed for real-time face detection on a Raspberry Pi.
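The per-frame flow described above (detect, encode, match, act) can be sketched independently of the camera and classifier details by injecting each step as a callable; in the real service the detector would be OpenCV's Haar cascade and the door action a POST to Hodor. The names and structure here are illustrative, not BRAN's actual code.

```python
def process_frame(frame, detect_faces, encode_face, match, open_door):
    """One iteration of the recognition loop: detect -> encode -> match -> act.
    Each step is injected so the heavy OpenCV/dlib pieces stay pluggable."""
    for box in detect_faces(frame):            # e.g. Haar cascade bounding boxes
        name = match(encode_face(frame, box))  # 128-d embedding -> name or None
        if name is not None:
            open_door(name)                    # POST to Hodor in the real service
            return name
    return None

# Stubbed-out run: one fake face that matches a registered user.
opened = []
result = process_frame(
    frame="fake-frame",
    detect_faces=lambda f: [(10, 10, 50, 50)],
    encode_face=lambda f, b: [0.1] * 128,
    match=lambda emb: "arya",
    open_door=opened.append,
)
print(result, opened)  # arya ['arya']
```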

We can choose between two different methods to perform the face matching; the choice is fully configurable using the -m option when bran detect is initiated.

  • Euclidean Distance — Maximum Votes:
bran detect \
  -c haarcascade_frontalface_default.xml \
  -e encodings.pickle \
  -p shape_predictor_68_face_landmarks.dat \
  -m dist_vote \
  -t 0.5 \
  -f

As the name suggests, with this method a detected face is recognized as the registered face that received the highest number of votes for being a match. During identification, this is especially effective for determining which person in the dataset has the most matches (the first entry in the pickle file is selected in the event of a tie).
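The voting scheme can be sketched as follows, assuming per-comparison boolean matches like those returned by face_recognition.compare_faces; the function name and inputs are illustrative.

```python
from collections import Counter

def vote_match(matches, names):
    """matches[i] is True if the probe embedding was within tolerance of
    stored embedding i; names[i] is the registered user it belongs to.
    Returns the name with the most votes, or None if nothing matched.
    Counter preserves insertion order on ties, so the first entry in the
    pickle file wins a tie, as described above."""
    votes = Counter(name for name, m in zip(names, matches) if m)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Five stored embeddings across two users; all within tolerance here.
names = ["arya", "arya", "sansa", "sansa", "sansa"]
matches = [True, True, True, True, True]
print(vote_match(matches, names))  # sansa (3 votes vs 2)
```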

  • Euclidean Distance — Min (Average Euclidean Distance)

In contrast to the previous mechanism, this involves calculating the average Euclidean distance between the detected face embeddings and each of the stored facial embeddings. The detected face is then matched with the registered face that has the least average Euclidean distance.

bran detect \
  -c haarcascade_frontalface_default.xml \
  -e encodings.pickle \
  -p shape_predictor_68_face_landmarks.dat \
  -m dist_vote \
  -t 0.5 \
  -f
  • Custom Model (e.g. Multi-layer perceptron aka Deep Neural Network)

There is an additional provision to implement custom DNN models, which can also be configured using the same -m option along with a custom argument.

bran detect \
  -c haarcascade_frontalface_default.xml \
  -e encodings.pickle \
  -p shape_predictor_68_face_landmarks.dat \
  -m custom \
  -k models/mlp.pickle \
  -t 0.9 \
  -f

The -t option is used to specify the required confidence measure for successful recognition and matching.

The -f option toggles the camera input based on the type of camera used for capturing the feed. For a normal USB camera this option need not be set, as it defaults to False, but it must be included when detection uses a Pi camera module.

BRAN — The Sceptic

In the detection step, there is an additional -l option which can be set to specify the liveness detection type. Liveness detection is essential to counter spoofing: an imposter who uses a picture of a registered user cannot fool BRAN, thanks to the liveness checks it comes powered with. The currently available options are smile and blink. If left unset, the option defaults to smile and expects a known face to smile for successful authentication. Yes, we want people to enter our office with a smile :)
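One common way to implement the blink check (not necessarily BRAN's exact code) is the eye aspect ratio (EAR) over the six eye landmarks produced by dlib's 68-point shape predictor, the same shape_predictor_68_face_landmarks.dat passed via -p above: the EAR collapses towards zero when the eye closes, so a dip below a threshold across consecutive frames counts as a live blink.

```python
import math

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks p1..p6 around one eye, as produced by
    dlib's 68-point shape predictor.
    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    p1, p2, p3, p4, p5, p6 = eye
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

# Toy landmark sets: a wide-open eye vs an almost-closed one.
open_eye = [(0, 0), (1, 2), (2, 2), (3, 0), (2, -2), (1, -2)]
closed_eye = [(0, 0), (1, 0.2), (2, 0.2), (3, 0), (2, -0.2), (1, -0.2)]
print(eye_aspect_ratio(open_eye) > eye_aspect_ratio(closed_eye))  # True
```

A blink is then registered when the EAR stays below the threshold for a few frames and rises again, which a static photo cannot reproduce.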

BRAN is also cognizant enough to distinguish between real moving smiles/blinks and static smiles/blinks (eyes closed). The only way BRAN can be fooled is when an imposter uses a video recording of a registered user who is also smiling and/or blinking in the video. Although this is highly unlikely, it is not impossible. Therefore, to avoid such misidentifications, apart from the classic liveness detection using smile and blink, there are a couple of additional nifty security checks that make it almost impossible for any spoofing technique to succeed, although they come with an associated processing overhead.

  • Blink pattern detection: Set and expect a personalised blink password (blink pattern) for each registered user.
  • Smile pattern detection: Set and expect a personalised smile password (smile pattern) for each registered user.
  • Closest Face first: To prevent tailgating, BRAN authenticates and authorizes only the person closest to the camera to enter the office building.
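The closest-face-first check can be approximated by bounding-box area: the largest detected box is assumed to belong to the person nearest the camera, and only that face proceeds to recognition. A minimal sketch under that assumption:

```python
def closest_face(boxes):
    """boxes: (x, y, w, h) face bounding boxes from the detector.
    The box with the largest area is taken to be the person nearest
    the camera; only that face is passed on to recognition, which
    discourages tailgating."""
    if not boxes:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])

boxes = [(10, 10, 40, 40), (200, 50, 120, 120), (300, 30, 60, 60)]
print(closest_face(boxes))  # (200, 50, 120, 120)
```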

BRAN 2.0 — The Evolution

  • Remote/On Cloud DNN based facial recognition: Currently, the facial detection is performed using a Haar cascade classifier on a Raspberry Pi, which lacks the computing resources to run heavier models such as CNNs, Eigenfaces, LBPs or other custom models. An advancement to BRAN would be to stream the captured video and perform the facial detection/recognition on more powerful computing servers, either remote or in the cloud.
  • PIR sensor (Passive InfraRed sensor) boost: An electronic PIR sensor, which measures infrared (IR) light radiating from objects in its field of view, could be added to BRAN to detect the physical presence of a visitor. This would help ensure that the face being recognised belongs to a person actually standing at the door rather than to a moving video used for spoofing.
  • TFT-LCD display and housing case for Pi: A protective case and fancy display for the Raspberry Pi mounted at the entrance would be a great visual enhancement to the current set up.

BRAN — The Contribution

This was only a hackathon project and a brief glimpse of what could be achieved using some of the interesting concepts and algorithms in computer vision. Any improvements and suggestions for BRAN are therefore sincerely appreciated and will be taken into account. We also encourage you to contribute to this project, which has now been open-sourced, and are eager to receive your feedback and pull requests. The GitHub repository can be found below:

BRAN — Three Eyed Raven

Please check out this article about all the other interesting projects that were conceptualised and developed during our hackathon event held in March 2019.

You can also visit our website, linked below, to learn more about the interesting products and cutting-edge technologies/tools that we work on.

KI labs — A Sneak Peek

References:

1: FaceNet: A Unified Embedding for Face Recognition and Clustering

2: Rapid Object Detection using a Boosted Cascade of Simple Features

Credits: The problem of facial recognition has been elaborately researched and discussed on PyImageSearch, whose insightful articles on this topic were our primary source of inspiration and reference for undertaking this interesting yet practically challenging task.
