In part one of this tutorial series, I covered how to implement the Viola-Jones image classification method which was introduced by Paul Viola and Michael Jones in 2001. To briefly recap, Viola-Jones is an ensemble method which uses a series of weak classifiers to create a strong classifier. The output of the algorithm is a weighted combination of the predictions made by each weak classifier.
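As a refresher, that decision rule is a weighted majority vote. Here is a minimal sketch of the vote (the names weak_clfs and alphas are illustrative; in part one, they come out of the AdaBoost training loop):

def strong_classify(weak_clfs, alphas, image):
    # Predict "face" only if the weighted sum of weak predictions
    # clears half the total weight, as in the original paper
    weighted_sum = sum(alpha * clf.classify(image) for clf, alpha in zip(weak_clfs, alphas))
    return 1 if weighted_sum >= 0.5 * sum(alphas) else 0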
When we trained our Viola-Jones classifier in part one of the tutorial, we used 19 x 19 images. However, in the real world, we aren’t given a 19 x 19 image and asked whether or not there is a face in it. Instead, we often have large images containing many faces, like this photo of former President Obama giving an address on the economy.
In order to detect all the faces in the image, a face-detection system might slide a 19 x 19-pixel window over every location in the image and run the pixels in that window through Viola-Jones. For a large image, and considering the image will have to be scaled and rescaled to account for differently sized faces, this amounts to running Viola-Jones a very large number of times. If the system needs to detect faces in real time, the computation needs to be fast. It is for this purpose that Viola and Jones introduced the “Attentional Cascade.”
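To make the sliding-window idea concrete, a scan at a single scale might look something like this rough sketch (clf is a trained classifier and image is a 2D grayscale NumPy array; the rescaling loop is omitted for brevity):

def scan(clf, image, window=19, step=1):
    # Slide a window x window patch across the image and record the
    # top-left corner of every location the classifier accepts
    detections = []
    height, width = image.shape
    for y in range(0, height - window + 1, step):
        for x in range(0, width - window + 1, step):
            patch = image[y:y+window, x:x+window]
            if clf.classify(patch) == 1:
                detections.append((x, y))
    return detections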
The “Attentional Cascade”
The attentional cascade uses a series of Viola-Jones classifiers, each progressively more complex, to classify an image. An image is only evaluated by the nth classifier if the (n-1)th classifier labeled it a positive example. If at any point a classifier rejects the image, the cascade stops and the image is classified as a negative example.
For example, a cascade might contain classifiers considering 1 feature, 5 features, 10 features, 50 features, and 100 features in that order. The benefit of introducing this structure is to weed out negative examples early on. This significantly reduces computation time for finding negative examples in large images since only a fraction of the features are actually computed for negative examples.
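To put rough, purely illustrative numbers on this: suppose each layer of that cascade rejects half of the negative windows that reach it. A negative window would then cost on average 1 + 0.5(5) + 0.25(10) + 0.125(50) + 0.0625(100) ≈ 18.5 feature evaluations, versus the 100 evaluations a single monolithic 100-feature classifier would spend on every window.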
The training of each classifier in the cascade is mostly the same as the training of the regular Viola-Jones algorithm. The only difference is that after the first classifier, which trains on all the training examples, each subsequent classifier is trained on all the positive examples and only the negative examples which the previous classifier misclassified (the false positives). Thus each classifier in the cascade focuses on progressively more difficult examples. The eventual outcome of introducing the cascade is that the false positive rate (the fraction of non-faces classified as faces) is dramatically reduced.
Building the Algorithm
In their paper, Viola and Jones introduce an algorithm which gives fine control over the exact false positive rate of the resulting cascade.
However, for the sake of simplicity, we’ll modify the algorithm to allow the user to design each layer of the cascade individually (i.e., choose how many features each layer has).
Let's start by designing the class, which we'll call CascadeClassifier.
import pickle  # used later by the save and load methods

class CascadeClassifier:
    def __init__(self, layers):
        self.layers = layers
        self.clfs = []
self.layers will be an array of integers such as [1, 5, 10, 50] which defines how many features each layer in the cascade has.
self.clfs will store each strong classifier.
Our train method will look exactly as you would expect, making sure to keep track of the false positives.
    def train(self, training):
        # Split the training set into positive and negative examples,
        # where each example is an (image, label) pair
        pos, neg = [], []
        for ex in training:
            if ex[1] == 1:
                pos.append(ex)
            else:
                neg.append(ex)

        for feature_num in self.layers:
            if len(neg) == 0:
                print("Stopping early. FPR = 0")
                break
            clf = ViolaJones(T=feature_num)
            clf.train(pos+neg, len(pos), len(neg))
            self.clfs.append(clf)
            # Keep only the false positives as the negative set for the next layer
            false_positives = []
            for ex in neg:
                if self.classify(ex[0]) == 1:
                    false_positives.append(ex)
            neg = false_positives
The classify method is also relatively simple.
    def classify(self, image):
        # A window is positive only if every layer in the cascade accepts it
        for clf in self.clfs:
            if clf.classify(image) == 0:
                return 0
        return 1
Finally, add the save and load methods which we used for Viola-Jones.
    def save(self, filename):
        with open(filename+".pkl", 'wb') as f:
            pickle.dump(self, f)

    @staticmethod
    def load(filename):
        with open(filename+".pkl", 'rb') as f:
            return pickle.load(f)
The attentional cascade can be tested in the same way that Viola-Jones was tested.
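For example, a quick end-to-end run might look like this (assuming training is a list of (image, label) pairs in the same format as part one, and test_image is a hypothetical 19 x 19 window):

cascade = CascadeClassifier([1, 5, 10, 50])
cascade.train(training)
cascade.save("cascade")

# Later, reload the trained cascade and classify a new window
cascade = CascadeClassifier.load("cascade")
print(cascade.classify(test_image))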
The attentional cascade is a simple idea that focuses heavily on reducing the false positive rate. It speeds up classification but may increase training time depending on how many features each strong classifier takes into account. Regardless, it is a useful structure that is not specific to Viola-Jones.
This concludes my two-part tutorial on the Viola-Jones algorithm. You can find the full code on my GitHub. Thanks for reading!