The building blocks of Object Detection (1/n)
This article discusses a landmark computer vision algorithm, the Viola–Jones detector, whose ideas offer useful intuition for the convolution operations used in CNNs and more advanced deep neural networks.
Proposed in 2001 by Paul Viola and Michael Jones, the Viola–Jones object detection framework was the first object detection framework to provide competitive detection rates in real time. Although it was originally motivated by the problem of face detection, it can be trained to detect other kinds of objects. The algorithm is not commonly used today, largely because of the huge number of features it has to handle, but it laid the groundwork for some of the more advanced computer vision networks.
At a high level, the classifier is trained on images that contain faces (positive examples) and images that do not (negative examples).
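Haar features are rectangular filters similar to convolutional kernels: each one subtracts the sum of pixel intensities under its dark region from the sum under its light region. There are five basic types of Haar features, and each is evaluated at every position and scale that fits within a 24×24 detection window.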
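To make the analogy with a convolution kernel concrete, here is a minimal sketch of evaluating one two-rectangle (edge) Haar feature at a single position, summing pixels naively. The helper names `rect_sum` and `two_rect_horizontal`, and the convention that the left half is "white" and the right half "black", are assumptions for illustration.

```python
def rect_sum(img, x, y, w, h):
    """Naively sum the pixels of the w x h rectangle whose top-left corner is (x, y)."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))

def two_rect_horizontal(img, x, y, w, h):
    """Hypothetical helper: two-rectangle Haar feature with side-by-side halves.

    The feature spans a region 2*w wide and h tall. The left half is treated
    as the 'white' rectangle and the right half as the 'black' one, and the
    feature value is sum(white) - sum(black).
    """
    white = rect_sum(img, x, y, w, h)
    black = rect_sum(img, x + w, y, w, h)
    return white - black

# 4x4 image with a bright left half and a dark right half (a vertical edge)
img = [[9, 9, 1, 1] for _ in range(4)]
print(two_rect_horizontal(img, x=0, y=0, w=2, h=4))  # 64
```

A uniform patch gives a value of 0, while an edge aligned with the feature gives a large response, which is exactly how a hand-designed convolution kernel behaves.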
Evaluating every Haar feature type at every position and scale within a single 24×24 window produces over 160,000 features!
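The size of this feature set can be checked by brute-force counting. The sketch below assumes five base shapes (two-rectangle side-by-side and stacked, three-rectangle side-by-side and stacked, and the four-rectangle checkerboard), each scaled independently in width and height by integer multiples of its base size. The exact total depends on which feature types and scaling rules are included.

```python
# Assumed base shapes (width, height) of the five Haar feature types
BASE_SHAPES = {
    "two_horizontal": (2, 1),
    "two_vertical": (1, 2),
    "three_horizontal": (3, 1),
    "three_vertical": (1, 3),
    "four": (2, 2),
}

def count_features(window=24):
    """Count every position and scale at which each feature type fits in a square window."""
    counts = {}
    for name, (bw, bh) in BASE_SHAPES.items():
        total = 0
        # Scale width and height independently by integer multiples of the base shape
        for w in range(bw, window + 1, bw):
            for h in range(bh, window + 1, bh):
                # Number of top-left positions at which a w x h feature fits
                total += (window - w + 1) * (window - h + 1)
        counts[name] = total
    return counts

print(sum(count_features(24).values()))  # 162336
```

Under these assumptions the count comes to 162,336 features for one 24×24 window, in line with the "over 160,000" figure.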
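Because every feature requires summing pixel intensities over rectangles, computing them directly would be expensive. The algorithm therefore precomputes an integral image, in which each entry holds the sum of all pixels above and to the left of it. After this single pass, the sum over any rectangle takes only four array lookups, regardless of the rectangle's size.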
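A minimal sketch of the integral image and the four-lookup rectangle sum follows. Padding the table with an extra row and column of zeros is an implementation convenience assumed here so that rectangles touching the image border need no special cases.

```python
def integral_image(img):
    """Return ii, padded with a leading row/column of zeros, where
    ii[r][c] is the sum of img[0..r-1][0..c-1]."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of the current row up to column c
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y) using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Building the table costs one pass over the image; afterwards every Haar feature, whatever its scale, costs a constant number of lookups per rectangle.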
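Improvement: Most of these features are irrelevant, so AdaBoost is used to select a small subset of useful ones, bringing the count down to about 6,000 features in the final detector described in the original paper.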
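The feature selection can be sketched as discrete AdaBoost over single-feature decision stumps: each round picks the one feature, threshold and polarity with the lowest weighted error, then down-weights the samples it got right. This is a simplified sketch, not the paper's implementation. It assumes the feature values are already computed, starts from uniform sample weights (the paper balances initial weights between positives and negatives), and searches thresholds by brute force over the observed values. The toy data is hypothetical.

```python
import math

def train_adaboost(X, y, rounds):
    """Minimal discrete AdaBoost with single-feature decision stumps.

    X: list of samples, each a list of precomputed feature values.
    y: labels, 1 for face and 0 for non-face.
    Returns one (feature, theta, polarity, alpha) stump per round; the
    chosen feature indices are the selected features.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform initial weights (a simplifying assumption)
    stumps = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            for theta in sorted({x[j] for x in X}):
                for p in (1, -1):
                    # Stump predicts face (1) when p * f_j(x) < p * theta
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if (1 if p * xi[j] < p * theta else 0) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, theta, p)
        err, j, theta, p = best
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        beta = err / (1.0 - err)
        alpha = math.log(1.0 / beta)
        # Down-weight correctly classified samples, then renormalise
        for i, (xi, yi) in enumerate(zip(X, y)):
            if (1 if p * xi[j] < p * theta else 0) == yi:
                w[i] *= beta
        total = sum(w)
        w = [wi / total for wi in w]
        stumps.append((j, theta, p, alpha))
    return stumps

# Hypothetical toy data: feature 0 separates faces from non-faces, feature 1 does not
X = [[0, 5], [1, 3], [2, 8], [7, 4], [8, 6], [9, 2]]
y = [1, 1, 1, 0, 0, 0]
stumps = train_adaboost(X, y, rounds=2)
print(stumps[0][:3])  # (0, 7, 1)
```

Because each round commits to a single feature, running T rounds keeps at most T features out of the full set, which is how boosting doubles as feature selection here.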
AdaBoost combines weak classifiers (each based on a single feature and only slightly better than random guessing) into a strong classifier: a weighted linear combination of the weak classifiers' votes decides whether a window contains a face.
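In the original paper, the strong classifier labels a window as a face when the weighted votes reach at least half of the total weight, that is, when the sum of alpha_t * h_t(x) is at least 0.5 times the sum of alpha_t. The sketch below applies that rule to a hand-written, hypothetical list of stumps stored as (feature, theta, polarity, alpha) tuples.

```python
def stump_predict(x, feature, theta, polarity):
    """Weak classifier h(x): 1 (face) if polarity * f(x) < polarity * theta, else 0."""
    return 1 if polarity * x[feature] < polarity * theta else 0

def strong_classify(x, stumps):
    """Weighted vote of weak classifiers.

    stumps: list of (feature, theta, polarity, alpha) tuples.
    Returns 1 (face) when the weighted vote reaches half the total weight.
    """
    score = sum(alpha * stump_predict(x, f, t, p) for f, t, p, alpha in stumps)
    return 1 if score >= 0.5 * sum(s[3] for s in stumps) else 0

# Hypothetical stumps over a two-value feature vector
stumps = [(0, 7.0, 1, 1.5), (1, 4.0, -1, 0.5)]
print(strong_classify([2.0, 1.0], stumps))  # 1: the heavier first stump votes "face"
print(strong_classify([9.0, 6.0], stumps))  # 0: only the lighter stump votes "face"
```

Stumps with lower training error receive larger alpha, so a few reliable weak classifiers can outvote many marginal ones.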
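The algorithm is further sped up by arranging strong classifiers into a cascade (a degenerate decision tree). Each stage can reject a window outright, so most non-face windows are discarded after a few cheap stages, and the more expensive later stages run only on regions that might contain a face.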
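The early-rejection logic of a cascade can be sketched in a few lines. Here each stage is assumed to be a (score function, threshold) pair standing in for a trained strong classifier, and the window is a hypothetical list of precomputed stage scores; in a real detector this check runs for every window position and scale.

```python
def evaluate_cascade(window, stages):
    """Run one window through a cascade, rejecting as early as possible.

    stages: list of (score_fn, threshold) pairs, cheapest stage first.
    score_fn(window) returns that stage's strong-classifier score.
    Returns (is_face, number_of_stages_evaluated).
    """
    for i, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, i  # rejected: later, more expensive stages never run
    return True, len(stages)  # passed every stage: report a detection

# Hypothetical stages, each reading one precomputed score from the window
stages = [(lambda w: w[0], 0.2), (lambda w: w[1], 0.5), (lambda w: w[2], 0.7)]
print(evaluate_cascade([0.1, 0.9, 0.9], stages))  # (False, 1)
print(evaluate_cascade([0.3, 0.6, 0.8], stages))  # (True, 3)
```

Since the vast majority of windows in an image contain no face, most of them exit at the first stage, which is what makes the detector fast enough for real-time use.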