# Understanding and Implementing the Viola-Jones Image Classification Algorithm

Jan 17, 2019

# Features and the Integral Image

One of the first key contributions of the paper introducing Viola-Jones was a set of simple features to use in image recognition. In most tasks, the pixel values themselves are the features fed into the algorithm. Viola and Jones instead introduced rectangle features: the value of a feature is the sum of the pixels inside one set of rectangles subtracted from the sum of the pixels inside another set. Computing these sums naively would be slow, so the paper also introduced the integral image, in which each location holds the sum of all pixels above and to the left of it. With the integral image, the sum of any rectangle can be computed from just its four corners.

```python
import numpy as np  # Don't forget to import numpy

def integral_image(image):
    ii = np.zeros(image.shape)
    s = np.zeros(image.shape)
    for y in range(len(image)):
        for x in range(len(image[y])):
            s[y][x] = s[y-1][x] + image[y][x] if y-1 >= 0 else image[y][x]
            ii[y][x] = ii[y][x-1] + s[y][x] if x-1 >= 0 else s[y][x]
    return ii
```
```python
class RectangleRegion:
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def compute_feature(self, ii):
        # Sum of the pixels in the region, computed from the four
        # corners of the integral image
        return ii[self.y+self.height][self.x+self.width] + ii[self.y][self.x] \
            - (ii[self.y+self.height][self.x] + ii[self.y][self.x+self.width])
```
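As a quick sanity check (mine, not from the original post), the sketch below verifies the integral image against NumPy's cumulative sums and shows exactly which block of pixels the four-corner formula sums; the toy image and region coordinates are hypothetical.

```python
import numpy as np

def integral_image(image):
    # Running column sums (s) feed the cumulative sums (ii), as in the post
    ii = np.zeros(image.shape)
    s = np.zeros(image.shape)
    for y in range(len(image)):
        for x in range(len(image[y])):
            s[y][x] = s[y-1][x] + image[y][x] if y-1 >= 0 else image[y][x]
            ii[y][x] = ii[y][x-1] + s[y][x] if x-1 >= 0 else s[y][x]
    return ii

image = np.arange(16).reshape(4, 4)
ii = integral_image(image)

# ii should match NumPy's 2-D cumulative sum
assert np.array_equal(ii, image.cumsum(axis=0).cumsum(axis=1))

# With this inclusive integral image, the four-corner formula used by
# RectangleRegion(x, y, w, h) sums the w-by-h block whose top-left
# pixel sits at (x+1, y+1)
x, y, w, h = 0, 0, 2, 2
region_sum = ii[y+h][x+w] + ii[y][x] - (ii[y+h][x] + ii[y][x+w])
assert region_sum == image[y+1:y+h+1, x+1:x+w+1].sum()
```

Note the off-by-one: the summed block starts one pixel below and to the right of (x, y), which is consistent as long as features are built and evaluated with the same convention.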

# Viola-Jones

Now that we have set up our helper classes and functions, we can start creating the Viola-Jones algorithm. Let's start by defining a ViolaJones class. The only hyper-parameter our algorithm uses is T, the number of weak classifiers (equivalently, the number of features selected).

```python
class ViolaJones:
    def __init__(self, T = 10):
        self.T = T
```
Training follows the AdaBoost procedure:

1. Normalize the weights
2. Select the best weak classifier (based on the weighted error)
3. Update the weights based on the chosen classifier's error
4. Repeat steps 1–3 T times, where T is the desired number of weak classifiers
Since every feature is evaluated on the integral image, we first convert each training image:

```python
def train(self, training):
    training_data = []
    for x in range(len(training)):
        training_data.append((integral_image(training[x][0]), training[x][1]))
```

## Initializing the weights

At the start of the algorithm, we have no errors on which to base the weights, so every training example of the same class receives the same weight (i.e. all positive examples are weighted equally, as are all negative examples). For the ith image, the weight is w_i = 1/(2p) if the image is positive and w_i = 1/(2n) if it is negative, where p and n are the numbers of positive and negative examples.

```python
def train(self, training, pos_num, neg_num):
    weights = np.zeros(len(training))
    training_data = []
    for x in range(len(training)):
        training_data.append((integral_image(training[x][0]), training[x][1]))
        if training[x][1] == 1:
            weights[x] = 1.0 / (2 * pos_num)
        else:
            weights[x] = 1.0 / (2 * neg_num)
```
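One useful property of this initialization (my observation, with made-up toy labels) is that each class holds exactly half of the total weight, so the weights start out as a probability distribution:

```python
import numpy as np

# Hypothetical toy labels: 2 positive and 3 negative examples
labels = [1, 1, 0, 0, 0]
pos_num, neg_num = 2, 3

weights = np.zeros(len(labels))
for i, label in enumerate(labels):
    weights[i] = 1.0 / (2 * pos_num) if label == 1 else 1.0 / (2 * neg_num)

# Positives carry 2 * 1/4 = 1/2 of the mass, negatives 3 * 1/6 = 1/2,
# so the weights sum to 1
assert weights[0] == 0.25
assert np.isclose(weights.sum(), 1.0)
```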

## Building the Features

The main loop of training requires choosing the best weak classifier, but there is one weak classifier for each possible feature. Because of this, we have to build all of the features before we start the main loop of training. Recall that each feature is a set of two, three, or four adjacent rectangles, and its value is the sum of the pixels in one set of rectangles minus the sum of the pixels in the other.

```python
def build_features(self, image_shape):
    height, width = image_shape
    features = []
    for w in range(1, width+1):
        for h in range(1, height+1):
            i = 0
            while i + w < width:
                j = 0
                while j + h < height:
                    # 2 rectangle features
                    immediate = RectangleRegion(i, j, w, h)
                    right = RectangleRegion(i+w, j, w, h)
                    if i + 2 * w < width:  # Horizontally Adjacent
                        features.append(([right], [immediate]))

                    bottom = RectangleRegion(i, j+h, w, h)
                    if j + 2 * h < height:  # Vertically Adjacent
                        features.append(([immediate], [bottom]))

                    right_2 = RectangleRegion(i+2*w, j, w, h)
                    # 3 rectangle features
                    if i + 3 * w < width:  # Horizontally Adjacent
                        features.append(([right], [right_2, immediate]))

                    bottom_2 = RectangleRegion(i, j+2*h, w, h)
                    if j + 3 * h < height:  # Vertically Adjacent
                        features.append(([bottom], [bottom_2, immediate]))

                    # 4 rectangle features
                    bottom_right = RectangleRegion(i+w, j+h, w, h)
                    if i + 2 * w < width and j + 2 * h < height:
                        features.append(([right, bottom], [immediate, bottom_right]))

                    j += 1
                i += 1
    return features
```

## Applying the features

Finding the optimal weak classifiers later in the algorithm requires evaluating every feature on every training example. To save computation, we do this once before training begins. Each classifier needs to be retrained every time we select a new one (because the example weights change), but the value of each feature on each image never changes, so applying the features inside the training loop would repeat work. Applying the features before training will also allow us to pre-select features later on to speed up training. Keeping a separate array holding the label of each training example will simplify our code, so we create that array during this step as well. Add the `apply_features` method to the ViolaJones class.

```python
def apply_features(self, features, training_data):
    X = np.zeros((len(features), len(training_data)))
    # In Python 3, np.array over a bare map object produces a 0-d array,
    # so materialize the labels with list() first
    y = np.array(list(map(lambda data: data[1], training_data)))
    i = 0
    for positive_regions, negative_regions in features:
        feature = lambda ii: sum([pos.compute_feature(ii) for pos in positive_regions]) \
            - sum([neg.compute_feature(ii) for neg in negative_regions])
        X[i] = list(map(lambda data: feature(data[0]), training_data))
        i += 1
    return X, y
```
```python
def train(self, training, pos_num, neg_num):
    weights = np.zeros(len(training))
    training_data = []
    for x in range(len(training)):
        training_data.append((integral_image(training[x][0]), training[x][1]))
        if training[x][1] == 1:
            weights[x] = 1.0 / (2 * pos_num)
        else:
            weights[x] = 1.0 / (2 * neg_num)
    features = self.build_features(training_data[0][0].shape)
    X, y = self.apply_features(features, training_data)
```

## The Weak Classifiers

We finally have all of the pieces to start building and training the weak classifiers. Recall that Viola-Jones uses a series of weak classifiers and weighs their results together to produce the final classification. Each classifier is "weak" because by itself it cannot accurately fulfill the classification task. Each weak classifier looks at a single feature, f. It has a threshold, θ, and a polarity, p, which together determine the classification of an example x:

h(x) = 1 if p·f(x) < p·θ, and 0 otherwise

The polarity simply decides which side of the threshold is considered positive.

```python
class WeakClassifier:
    def __init__(self, positive_regions, negative_regions, threshold, polarity):
        self.positive_regions = positive_regions
        self.negative_regions = negative_regions
        self.threshold = threshold
        self.polarity = polarity

    def classify(self, x):
        feature = lambda ii: sum([pos.compute_feature(ii) for pos in self.positive_regions]) \
            - sum([neg.compute_feature(ii) for neg in self.negative_regions])
        return 1 if self.polarity * feature(x) < self.polarity * self.threshold else 0
```

## Training the Weak Classifiers

Training the weak classifiers is the most computationally expensive piece of the algorithm, because each time a new weak classifier is selected as the best one, all of them have to be retrained (the training examples are weighted differently). However, there is an efficient way to find the optimal threshold and polarity for a single weak classifier using the weights. First, sort the training examples according to their feature values. The possible values for a threshold are the feature values themselves; iterate through them in sorted order and compute the error that would result from placing the threshold there:

ε = min(S⁻ + (T⁺ − S⁺), S⁺ + (T⁻ − S⁻))

where T⁺ and T⁻ are the total weights of the positive and negative examples, and S⁺ and S⁻ are the weights of the positive and negative examples seen so far (those with feature values below the threshold). The first term is the error of labeling everything below the threshold positive; the second is the error of labeling everything below it negative. Pick the threshold and polarity with the minimum error.

```python
def train_weak(self, X, y, features, weights):
    total_pos, total_neg = 0, 0
    for w, label in zip(weights, y):
        if label == 1:
            total_pos += w
        else:
            total_neg += w

    classifiers = []
    total_features = X.shape[0]
    for index, feature in enumerate(X):
        if len(classifiers) % 1000 == 0 and len(classifiers) != 0:
            print("Trained %d classifiers out of %d" % (len(classifiers), total_features))

        applied_feature = sorted(zip(weights, feature, y), key=lambda x: x[1])

        pos_seen, neg_seen = 0, 0
        pos_weights, neg_weights = 0, 0
        min_error, best_feature, best_threshold, best_polarity = float('inf'), None, None, None
        for w, f, label in applied_feature:
            error = min(neg_weights + total_pos - pos_weights, pos_weights + total_neg - neg_weights)
            if error < min_error:
                min_error = error
                best_feature = features[index]
                best_threshold = f
                best_polarity = 1 if pos_seen > neg_seen else -1

            if label == 1:
                pos_seen += 1
                pos_weights += w
            else:
                neg_seen += 1
                neg_weights += w

        clf = WeakClassifier(best_feature[0], best_feature[1], best_threshold, best_polarity)
        classifiers.append(clf)
    return classifiers
```
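To see the scan in isolation, here is a toy walkthrough (the feature values, labels, and weights are made up, not from the post): four equally weighted examples whose feature values perfectly separate the classes, so the scan should find a zero-error threshold.

```python
# Four examples with equal weight 0.25, feature values 1..4,
# negatives (label 0) below the positives (label 1)
applied_feature = sorted(zip([0.25, 0.25, 0.25, 0.25], [1, 2, 3, 4], [0, 0, 1, 1]),
                         key=lambda x: x[1])
total_pos, total_neg = 0.5, 0.5  # total weight of each class

pos_seen, neg_seen = 0, 0
pos_weights, neg_weights = 0, 0
min_error, best_threshold, best_polarity = float('inf'), None, None
for w, f, label in applied_feature:
    # Error if the threshold were placed at f: call everything below f
    # positive, or everything below f negative, whichever is cheaper
    error = min(neg_weights + total_pos - pos_weights,
                pos_weights + total_neg - neg_weights)
    if error < min_error:
        min_error, best_threshold = error, f
        best_polarity = 1 if pos_seen > neg_seen else -1
    if label == 1:
        pos_seen, pos_weights = pos_seen + 1, pos_weights + w
    else:
        neg_seen, neg_weights = neg_seen + 1, neg_weights + w

# Splitting below f = 3 separates the classes perfectly
assert (best_threshold, min_error, best_polarity) == (3, 0, -1)
```

The polarity of -1 encodes "examples above the threshold are positive", matching the data.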

## Selecting the best weak classifier

Once we have trained all of the weak classifiers, we can find the best one. That is as simple as iterating through all of the classifiers and computing the average weighted error of each. Add the `select_best` method to the ViolaJones class.

```python
def select_best(self, classifiers, weights, training_data):
    best_clf, best_error, best_accuracy = None, float('inf'), None
    for clf in classifiers:
        error, accuracy = 0, []
        for data, w in zip(training_data, weights):
            correctness = abs(clf.classify(data[0]) - data[1])
            accuracy.append(correctness)
            error += w * correctness
        error = error / len(training_data)
        if error < best_error:
            best_clf, best_error, best_accuracy = clf, error, accuracy
    return best_clf, best_error, best_accuracy
```
```python
def train(self, training, pos_num, neg_num):
    ...
    for t in range(self.T):
        weak_classifiers = self.train_weak(X, y, features, weights)
        clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)
```

## Updating the Weights

The bulk of the work in Viola-Jones goes to building the features, training the classifiers, and choosing the best weak classifier in each iteration. The rest of the train method is relatively simple. Before choosing the best classifier, we normalize the weights. After choosing the best weak classifier, we update the weights using its error: w_i ← w_i·β^(1−e_i), where β = ε/(1−ε) and e_i is 0 if example i was classified correctly and 1 otherwise. Training examples that were classified correctly get smaller weights; examples that were classified incorrectly keep the same weights.

```python
def train(self, training, pos_num, neg_num):
    ...
    for t in range(self.T):
        weights = weights / np.linalg.norm(weights)
        weak_classifiers = self.train_weak(X, y, features, weights)
        clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)
        beta = error / (1.0 - error)
        for i in range(len(accuracy)):
            weights[i] = weights[i] * (beta ** (1 - accuracy[i]))
```
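A small numeric sketch (my own toy numbers, not from the post) makes the update rule concrete: with a weighted error of 0.25, β = 1/3, so correctly classified examples shrink to a third of their weight while the misclassified one is untouched.

```python
import numpy as np

# Hypothetical: four equally weighted examples; the chosen classifier got
# the first three right (correctness 0) and the last one wrong
# (correctness 1), for a weighted error of 0.25
weights = np.array([0.25, 0.25, 0.25, 0.25])
accuracy = [0, 0, 0, 1]
error = 0.25

beta = error / (1.0 - error)  # 1/3
for i in range(len(accuracy)):
    weights[i] = weights[i] * (beta ** (1 - accuracy[i]))

# Correct examples shrink by a factor of beta; the wrong one keeps its weight,
# so the next round pays relatively more attention to it
assert np.isclose(weights[0], 0.25 / 3)
assert weights[3] == 0.25
```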

## The Strong Classifier

Finally, we have to compile the strong classifier out of our weak classifiers. The strong classifier is defined as

H(x) = 1 if Σₜ αₜ·hₜ(x) ≥ ½·Σₜ αₜ, and 0 otherwise

where αₜ = log(1/βₜ). In other words, the weak classifiers vote, each vote weighted by how accurate that classifier was, and an example is labeled positive when the weighted vote reaches at least half of the total possible vote. To implement this, we first track the α values and the selected classifiers in the ViolaJones class.

```python
def __init__(self, T = 10):
    self.T = T
    self.alphas = []
    self.clfs = []
```
```python
def train(self, training, pos_num, neg_num):
    ...
    for t in range(self.T):
        weights = weights / np.linalg.norm(weights)
        weak_classifiers = self.train_weak(X, y, features, weights)
        clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)
        beta = error / (1.0 - error)
        for i in range(len(accuracy)):
            weights[i] = weights[i] * (beta ** (1 - accuracy[i]))
        alpha = math.log(1.0/beta)  # requires `import math` at the top of the file
        self.alphas.append(alpha)
        self.clfs.append(clf)
```
The complete `train` method now looks like this:

```python
def train(self, training, pos_num, neg_num):
    weights = np.zeros(len(training))
    training_data = []
    for x in range(len(training)):
        training_data.append((integral_image(training[x][0]), training[x][1]))
        if training[x][1] == 1:
            weights[x] = 1.0 / (2 * pos_num)
        else:
            weights[x] = 1.0 / (2 * neg_num)

    features = self.build_features(training_data[0][0].shape)
    X, y = self.apply_features(features, training_data)

    for t in range(self.T):
        weights = weights / np.linalg.norm(weights)
        weak_classifiers = self.train_weak(X, y, features, weights)
        clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)
        beta = error / (1.0 - error)
        for i in range(len(accuracy)):
            weights[i] = weights[i] * (beta ** (1 - accuracy[i]))
        alpha = math.log(1.0/beta)
        self.alphas.append(alpha)
        self.clfs.append(clf)
```
The `classify` method then computes the weighted vote on a new image:

```python
def classify(self, image):
    total = 0
    ii = integral_image(image)
    for alpha, clf in zip(self.alphas, self.clfs):
        total += alpha * clf.classify(ii)
    return 1 if total >= 0.5 * sum(self.alphas) else 0
```
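To isolate just the voting logic, here is a hypothetical sketch (the errors and votes are made up): two fairly accurate classifiers vote "face" and one weaker classifier votes "not face", and the weighted majority wins.

```python
import math

# Three weak classifiers with weighted errors 0.2, 0.3, and 0.4
# voting on a single example (1 = face, 0 = not face)
errors = [0.2, 0.3, 0.4]
votes = [1, 1, 0]

# alpha = log(1/beta) = log((1 - error) / error): lower error, bigger say
alphas = [math.log((1 - e) / e) for e in errors]
total = sum(alpha * vote for alpha, vote in zip(alphas, votes))

# Positive only if the weighted votes reach half of the total alpha
prediction = 1 if total >= 0.5 * sum(alphas) else 0
assert prediction == 1
```

Note that a classifier with error 0.5 (no better than chance) gets α = 0 and thus no say at all.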

## Improvement

The number of features grows incredibly fast: for a 24x24 image there are over 160,000 features, and for a 28x28 image over 250,000. Moreover, many of these features offer little classification power. It would be useful to remove as many of them as possible so that we train fewer WeakClassifiers. Fortunately, scikit-learn can narrow down our feature space with its SelectPercentile class. To take advantage of this, add the following to the `train` method and import the required classes.

```python
from sklearn.feature_selection import SelectPercentile, f_classif

def train(self, training_data, pos_num, neg_num):
    ...
    X, y = self.apply_features(features, training_data)
    # Keep only the 10% of features with the strongest univariate
    # relationship to the labels
    indices = SelectPercentile(f_classif, percentile=10).fit(X.T, y).get_support(indices=True)
    X = X[indices]
    # features is a plain Python list, so index it with a comprehension
    features = [features[i] for i in indices]
    ...
```

Since ViolaJones takes a long time to train, it would be very useful to be able to save and load the model from a file. This is easy with the pickle module. Add the following methods to the ViolaJones class (and `import pickle` at the top of the file).

```python
def save(self, filename):
    with open(filename+".pkl", 'wb') as f:
        pickle.dump(self, f)

@staticmethod
def load(filename):
    with open(filename+".pkl", 'rb') as f:
        return pickle.load(f)
```

# Testing

Viola-Jones was created for the task of face detection, so what better way to test our code than on that same task? A good dataset for this is the CBCL Face Database created by MIT's Center for Biological and Computational Learning. It contains a training set with over 2,000 face images and over 4,500 non-face images. Each image is 19x19 and greyscale. The original dataset stores each image as a .pgm file, so I have created two Pickle files, one for training and one for testing, that let us load the data directly into our ViolaJones class. You can download them here on GitHub.

```python
def train(t):
    with open("training.pkl", 'rb') as f:
        training = pickle.load(f)
    clf = ViolaJones(T=t)
    clf.train(training, 2429, 4548)
    evaluate(clf, training)
    clf.save(str(t))

def test(filename):
    with open("test.pkl", 'rb') as f:
        test = pickle.load(f)
    clf = ViolaJones.load(filename)
    evaluate(clf, test)

def evaluate(clf, data):
    correct = 0
    for x, y in data:
        correct += 1 if clf.classify(x) == y else 0
    print("Classified %d out of %d test examples" % (correct, len(data)))

train(10)
test("10")
```

# Final Remarks

Viola-Jones implements several concepts important to Machine Learning and Artificial Intelligence. First, it is an ensemble method. Second, it makes use of a variant of the AdaBoost algorithm for feature selection. These characteristics made it a strong method for image classification, and it remains in use today. The original paper also introduced the idea of an "attentional cascade" to greatly reduce classification time for negative examples. Learn more about that in Part Two of this tutorial.
