For the past year, I have been working with Human Detection Systems for the final year research project of my undergraduate studies. Our team focused on Human Detection from Live CCTV Camera Feeds. This story is a brief summary and evaluation of some of the open-source projects and libraries that we used for human detection.
Human Detection is a branch of Object Detection. Object Detection is the task of identifying the presence of predefined types of objects in an image. This task involves both identification of the presence of the objects and identification of the rectangular boundary surrounding each object (i.e. Object Localisation). An object detection system which can detect the class “Human” can work as a Human Detection System.
I will start this story with early approaches to Human Detection, which are covered in this part. These approaches originated in the early 2000s. Despite being termed “early”, they are still used in the industry.
In upcoming parts, I will cover Deep Learning based techniques, which are the present state of the art.
In later parts, I will discuss two more recent trends: Human Pose Estimation (identification of the positions of the joints of a person) and Human Segmentation (identification of the polygonal boundary representing each person).
The results mentioned in this article were obtained using the provided code snippets on a laptop with the following specifications.
Intel Core i7-7700HQ (up to 3.8 GHz), 16 GB memory, NVIDIA GeForce GTX 1060 6 GB GPU, Ubuntu 16.04 and OpenCV 3.4.
Despite the presence of a GPU, no GPU utilization was recorded during the tests. Performance is almost entirely CPU bound.
All the tests were carried out on the “TownCentre” test video from the “Coarse Gaze Estimation in Visual Surveillance Project” by the University of Oxford.
Early approaches for Human Detection
In this section, I focus on Haar cascade and HOG based approaches to human detection.
Haar Cascades for Human Detection
The Haar feature based approach to object detection was proposed by Paul Viola and Michael Jones in their paper “Rapid Object Detection using a Boosted Cascade of Simple Features”, published in 2001. This approach is widely used for Face Detection.
OpenCV includes built-in functionality for Haar cascade based object detection. Pre-trained models provided by OpenCV for “Full Body Detection”, “Upper Body Detection” and “Lower Body Detection” are available here.
This Python code snippet shows the application of Haar cascades for Human Detection using OpenCV 3.4. It shows a frame time of approximately 90 to 100 milliseconds per frame (roughly 10 to 11 frames per second) on my test bench.
Histograms of Oriented Gradients for Human Detection
The HOG pedestrian detection approach was proposed by N. Dalal and B. Triggs in their paper “Histograms of oriented gradients for human detection”, published in 2005.
OpenCV includes built-in functionality for HOG based detection. It also includes a pre-trained model for Human Detection.
This Python code snippet shows the application of HOG Human Detection using OpenCV 3.4. It shows a frame time of approximately 150 to 170 milliseconds per frame (roughly 6 frames per second) on my test bench.
Drawbacks of Early Approaches
Listed below are some common drawbacks I noticed when using Haar cascades and HOG for Human Detection. These observations are based on the pre-trained models available with OpenCV.
These two approaches are not very good at detecting humans in various poses unless multiple models are used, one for each pose. The pre-trained models available with OpenCV are trained to identify the standing pose of a person. They perform fairly well at detecting persons from the front view and the back view. However, detections from side views of persons are generally poor.
False Detections and Duplicate Detections
These early approaches are also prone to detecting non-human objects as humans. A trade-off between Missed Detections and False Detections can be achieved by adjusting the threshold parameters. Certain false detections (such as the detections on the image below-left) can be avoided by defining thresholds on the minimum detection box size.
Duplicate detections may also happen, such as the detection in the image below-right. A technique known as non-maximum suppression can be used to avoid certain duplicate detections, as explained in this tutorial.
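The minimum box size threshold mentioned above can be sketched as a simple filter over the detections. This assumes boxes in the `(x, y, w, h)` form returned by OpenCV's `detectMultiScale`; the function name and the default limits are hypothetical and would need tuning for the camera and scene.

```python
def filter_small_boxes(boxes, min_width=30, min_height=60):
    """Drop detections smaller than the given width or height (in pixels).

    The default limits are placeholder values, not measured ones.
    """
    return [
        (x, y, w, h)
        for (x, y, w, h) in boxes
        if w >= min_width and h >= min_height
    ]


detections = [(10, 10, 20, 40), (50, 60, 48, 96)]
kept = filter_small_boxes(detections)
# The 20x40 box is below both limits and is removed.
```

The same idea can often be pushed into the detector itself: `CascadeClassifier.detectMultiScale` accepts a `minSize` argument that skips candidate windows below a given size.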
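To illustrate the idea, here is a minimal sketch of greedy non-maximum suppression based on intersection over union (IoU). It is a generic variant and not necessarily the one used in the referenced tutorial. It assumes `(x, y, w, h)` boxes with one confidence score each; the `iou_threshold` of 0.5 is an illustrative value.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest scoring box of each group of heavily overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]


boxes = [(100, 100, 50, 100), (105, 102, 50, 100), (300, 100, 50, 100)]
scores = [0.9, 0.8, 0.7]
result = non_max_suppression(boxes, scores)
# The second box overlaps the first heavily and is suppressed.
```

Haar cascades do not return a score per box by default, so in that case some other ranking (for example box area) would have to stand in for `scores`.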
Unreliable Detection Boundary
The detection boundary provided by Haar cascade and HOG does not tightly fit the detected person. In fact, the margin of the boundary is not consistent between detections. This makes it difficult to derive positions of body parts of a person (say location of feet) using ratios calculated on the detection boundary.
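A small sketch makes the problem concrete. Suppose the feet are estimated as a fixed fraction of the box, here the bottom centre. The function name, the ratios and the example boxes are hypothetical; the point is only that the same person produces different estimates when the margin around them changes.

```python
def point_in_box(box, x_ratio=0.5, y_ratio=1.0):
    """Return the point at the given fractions of an (x, y, w, h) box.

    The defaults (bottom centre) are an assumed stand-in for "location of feet".
    """
    x, y, w, h = box
    return (x + x_ratio * w, y + y_ratio * h)


tight_box = (100, 50, 40, 80)    # box that fits the person closely
padded_box = (90, 40, 60, 110)   # same person, detected with a looser margin

feet_tight = point_in_box(tight_box)
feet_padded = point_in_box(padded_box)
# feet_tight  is (120.0, 130.0)
# feet_padded is (120.0, 150.0), 20 pixels lower for the same person
```

Because the margin varies from detection to detection, no single ratio corrects for it.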
Flickers in Detection
Quite often, a person detected in one frame is not detected in the following frame, and vice versa. As a result, detections are prone to flickering.
Strengths of Early Approaches
Despite the drawbacks mentioned above, these approaches are still used in the industry. They require less computing power than modern deep learning based approaches, and they do not need a GPU to work in real time. These approaches are also readily available in computer vision libraries such as OpenCV, which makes them attractive first choices.
Most of the issues present in early human detection approaches are fixed in newer deep learning based approaches. I have written part two of this series, analyzing how modern approaches minimize the above drawbacks.
Part 2 — “How modern approaches address the draw-backs of earlier approaches for Real-time Human Detection”
- “TownCentre” test video from “Coarse Gaze Estimation in Visual Surveillance Project” by University of Oxford
- PyImageSearch Tutorial on HOG Person Detection
- OpenCV 3.4 Documentation / Tutorial on Haar cascade Face Detection