Face Detection and Tracking for Real-Time apps

Jerry Liu
Lion IQ
Aug 8, 2018


Modern object and face detection methods use computationally expensive deep learning models to perform detection within an image. There is no free lunch, however: faster detectors generally sacrifice precision for speed.

While the speed of detectors such as YOLO allows them to run in real time, this comes at a cost:

  • Computational expense: we are all slaves to the GPU.
  • Precision is sacrificed for speed.
  • Bounding boxes returned by detectors will "bounce" a lot even if the subject is not moving.

There is no free lunch: fast detectors attain lower precision

For image processing we can get around this by spinning up multiple servers and processing large image batches in parallel, essentially trading money for speed. For real-time or video applications, however, we'll need a different approach.

Object detection and object tracking are two distinct, well-established tasks in computer vision:

  • Object Detection is the process of finding instances of semantic objects, e.g. humans, cars, animals.
  • Object Tracking deals with following a specific instance as it moves around in a video or between images.

In many applications, semantic instances within a view frame are unlikely to change significantly, so we can instead follow their locations using object tracking. (A counterexample would be self-driving cars: a car in the opposite lane suddenly changing direction may require a different reaction and is semantically very different.)

Advantages of Object Tracking

  • Tracking is faster. Intuitively, knowing the speed and previous locations of an object, we can infer its current position, regardless of whether the object is a car or a building.
  • Tracking preserves identity. In some applications, we may want to track specific object instances, e.g. track a person with specific identification. Detection cannot differentiate between instances.
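The first point can be made concrete with a toy constant-velocity predictor. The (x, y, w, h) box format and the helper name below are illustrative, not part of any library:

```python
def predict_box(prev_box, velocity):
    """Infer a box's next position from its last position and per-frame velocity."""
    x, y, w, h = prev_box
    vx, vy = velocity
    # the box moves; its size stays the same no matter what the object is
    return (x + vx, y + vy, w, h)

# a box at (100, 50) moving 5 px right and 2 px up per frame
print(predict_box((100, 50, 40, 40), (5, -2)))  # (105, 48, 40, 40)
```

Real trackers are far more sophisticated, but they exploit the same idea: motion is cheap to model, semantics are not.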

Object Tracking with OpenCV

OpenCV has offered a variety of object tracking algorithms since version 3.1. They are fast, run on a regular CPU, and require no additional installation. Platform-specific OpenCV installation and the details of individual trackers are not discussed here.

To use:

import cv2

# arbitrarily pick the KCF tracker
tracker = cv2.TrackerKCF_create()

# initialize the tracker with a single image frame from the video
# and a bounding box (from the detector)
tracker.init(frame, box)

# predict box coordinates in the next frame;
# ok is False when the tracker loses the object
ok, box = tracker.update(frame)

Demo

In this example we build a simple face detection app that runs the detector every ~6 seconds and uses trackers for the frames in between.

For face detection in front of a webcam, we can run the detector periodically, since the object of interest doesn't change semantically (it's still the same face), and then follow the trajectory of the face as it moves across the screen using trackers.

Pipeline

  1. Run the detector, get boxes (green).
  2. Track the boxes in each frame (blue).
  3. Re-run the detector periodically or on demand, and re-create trackers for each box.
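The scheduling behind these steps can be sketched with a simple frame counter. The frame rate and interval below are assumptions for illustration (the actual values depend on your camera and app):

```python
# Decide, per frame, whether to pay for a detector call (steps 1 and 3)
# or reuse the much cheaper trackers (step 2).
FPS = 25                    # assumed camera frame rate
DETECT_INTERVAL = FPS * 6   # re-run the detector every ~6 seconds

def action_for_frame(frame_idx, interval=DETECT_INTERVAL):
    """Return "detect" on frames where trackers are re-created, else "track"."""
    return "detect" if frame_idx % interval == 0 else "track"

# one minute of video at 25 fps -> 1500 frames, but only 10 detector calls
actions = [action_for_frame(i) for i in range(FPS * 60)]
print(actions.count("detect"))  # 10
```

In the real loop, a "detect" frame runs the detector and calls `tracker.init` for each box, while a "track" frame just calls `tracker.update`.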

Results

The results from the demo show that we can run the detector once, initialize trackers, and then follow the object as it moves around the screen, rather than running the detector on every frame.

If we rely on the detector alone, we might update the bounding box location every 200 ms. That's 300 calls per minute.

Using the detector + tracking method demonstrated in the repo above, we can run the detector every 6 seconds and update the bounding box location with the tracker much faster. That's 10 detector calls per minute. Boom: a 97% reduction in detector usage!
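The arithmetic behind that figure, as a quick sanity check:

```python
detector_only = 60 / 0.2   # one call every 200 ms -> 300 calls per minute
with_tracking = 60 / 6     # one call every 6 s   -> 10 calls per minute
reduction = 1 - with_tracking / detector_only
print(round(reduction * 100))  # 97
```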

Green box for detector (almost a blink). Blue for tracking.
