
Understanding and Implementing the Viola-Jones Image Classification Algorithm

Anmol Parande
Jan 17, 2019 · 16 min read

Features and the Integral Image

One of the first key contributions of the paper introducing Viola-Jones was a set of simple features to use in image recognition. In most image classification tasks, the raw pixel values are the features fed into the algorithm. However, Viola and Jones introduced the following new features.

A and B are two-rectangle features, C is a three-rectangle feature, and D is a four-rectangle feature. Image taken from the original paper.
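To make this concrete, here is a minimal sketch (the image and the rectangle placement are made up) of what a two-rectangle feature measures when evaluated directly on pixel values:

import numpy as np

# A made-up 4x4 "image"; a two-rectangle feature compares the total intensity
# of two adjacent regions, here the top half against the bottom half
img = np.array([[1, 2, 3, 4],
                [2, 3, 4, 5],
                [9, 9, 9, 9],
                [8, 8, 8, 8]], dtype=float)

top = img[0:2, :].sum()        # sum of the pixels in the first rectangle
bottom = img[2:4, :].sum()     # sum of the pixels in the second rectangle
print(bottom - top)            # 44.0 -- the value of the feature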
Summing pixels rectangle by rectangle for every feature at every position would be far too slow. The key trick is the integral image: ii[y][x] holds the sum of all pixels above and to the left of (x, y), inclusive. It can be computed in a single pass over the image with a running column sum s, using the recurrences

s[y][x] = s[y-1][x] + i[y][x]
ii[y][x] = ii[y][x-1] + s[y][x]

and once it exists, the sum of the pixels inside any rectangle can be obtained with just four array lookups.
import numpy as np  # Don't forget to import numpy

def integral_image(image):
    ii = np.zeros(image.shape)
    s = np.zeros(image.shape)
    for y in range(len(image)):
        for x in range(len(image[y])):
            # s is the running sum down each column; ii accumulates s across each row
            s[y][x] = s[y-1][x] + image[y][x] if y-1 >= 0 else image[y][x]
            ii[y][x] = ii[y][x-1] + s[y][x] if x-1 >= 0 else s[y][x]
    return ii
class RectangleRegion:
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def compute_feature(self, ii):
        # Sum of the pixels inside this rectangle, computed with four lookups
        # in the integral image ii
        return ii[self.y+self.height][self.x+self.width] + ii[self.y][self.x] \
            - (ii[self.y+self.height][self.x] + ii[self.y][self.x+self.width])
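As a quick sanity check (this snippet is not part of the original article), we can compare compute_feature against a direct pixel sum on a small test image:

import numpy as np

img = np.arange(25, dtype=float).reshape(5, 5)
ii = integral_image(img)

# Note: as written, RectangleRegion(x, y, w, h).compute_feature(ii) returns the
# sum of the w-by-h block whose top-left pixel is (x+1, y+1), i.e. there is a
# consistent one-pixel offset from (x, y). Because every feature uses the same
# convention, this does not change the algorithm, only which pixels each
# feature covers.
region = RectangleRegion(1, 0, 2, 3)    # x=1, y=0, width=2, height=3
fast = region.compute_feature(ii)
slow = img[1:4, 2:4].sum()              # rows y+1..y+h, columns x+1..x+w
assert fast == slow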

Viola-Jones

Now that we have set up our helper classes and functions, we can start creating the Viola-Jones algorithm. Let's start by defining a ViolaJones class. The only hyperparameter our algorithm uses is the number of weak classifiers, T (each weak classifier is built on a single feature).

class ViolaJones:
    def __init__(self, T = 10):
        self.T = T
Visual representation of boosting. (See source article)
The weak classifiers are trained and combined through boosting:

1. Normalize the weights
2. Select the best weak classifier (based on the weighted error)
3. Update the weights based on the chosen classifier's error
4. Repeat steps 1–3 T times, where T is the desired number of weak classifiers
We start the train method by converting each training image into its integral image:

def train(self, training):
    training_data = []
    for x in range(len(training)):
        training_data.append((integral_image(training[x][0]), training[x][1]))

Initializing the weights

At the start of the algorithm, we have no errors yet on which to base the weights, so every training example of the same class gets the same weight (i.e. all positive examples are weighted equally, as are all negative examples). This means that for the ith image,

w_i = 1 / (2p) if the image is positive, and w_i = 1 / (2n) if it is negative, where p is the number of positive examples and n is the number of negative examples.
def train(self, training, pos_num, neg_num):
    weights = np.zeros(len(training))
    training_data = []
    for x in range(len(training)):
        training_data.append((integral_image(training[x][0]), training[x][1]))
        if training[x][1] == 1:
            weights[x] = 1.0 / (2 * pos_num)
        else:
            weights[x] = 1.0 / (2 * neg_num)

Building the Features

The main loop of training requires choosing the best weak classifier, and there is one weak classifier for every possible feature. Because of this, we have to build all of the features before we start implementing the main loop of training. Recall the two-, three-, and four-rectangle features shown earlier.

def build_features(self, image_shape):
    height, width = image_shape
    features = []
    for w in range(1, width+1):
        for h in range(1, height+1):
            i = 0
            while i + w < width:
                j = 0
                while j + h < height:
                    # 2-rectangle features
                    immediate = RectangleRegion(i, j, w, h)
                    right = RectangleRegion(i+w, j, w, h)
                    if i + 2 * w < width:  # Horizontally adjacent
                        features.append(([right], [immediate]))
                    bottom = RectangleRegion(i, j+h, w, h)
                    if j + 2 * h < height:  # Vertically adjacent
                        features.append(([immediate], [bottom]))
                    right_2 = RectangleRegion(i+2*w, j, w, h)
                    # 3-rectangle features
                    if i + 3 * w < width:  # Horizontally adjacent
                        features.append(([right], [right_2, immediate]))
                    bottom_2 = RectangleRegion(i, j+2*h, w, h)
                    if j + 3 * h < height:  # Vertically adjacent
                        features.append(([bottom], [bottom_2, immediate]))
                    # 4-rectangle features
                    bottom_right = RectangleRegion(i+w, j+h, w, h)
                    if i + 2 * w < width and j + 2 * h < height:
                        features.append(([right, bottom], [immediate, bottom_right]))
                    j += 1
                i += 1
    return features
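As a rough sanity check (not from the original article), we can count how many candidate features this method generates for a given detection window:

vj = ViolaJones()
features = vj.build_features((24, 24))   # (height, width) of the detection window
print(len(features))                     # a very large number; it grows rapidly with window size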

Applying the Features

When we are finding the optimal weak classifiers to use later in the algorithm, we will need to evaluate each feature on every training example. To save computation, we'll do this before we start training the classifiers. This is more efficient because each classifier needs to be retrained every time we select a new one; if we applied the features inside the training loop, we would be repeating work, since the value of each feature on each image never changes. Applying the features before training will also allow us to pre-select features later on to speed up training. Keeping a separate array holding the label of each training example will simplify our code, so we will create that array during this step as well. Add the apply_features method to the ViolaJones class.

def apply_features(self, features, training_data):
    # X[i][j] is the value of feature i on training example j
    X = np.zeros((len(features), len(training_data)))
    y = np.array(list(map(lambda data: data[1], training_data)))
    i = 0
    for positive_regions, negative_regions in features:
        feature = lambda ii: sum([pos.compute_feature(ii) for pos in positive_regions]) - sum([neg.compute_feature(ii) for neg in negative_regions])
        X[i] = list(map(lambda data: feature(data[0]), training_data))
        i += 1
    return X, y
def train(self, training, pos_num, neg_num):
    weights = np.zeros(len(training))
    training_data = []
    for x in range(len(training)):
        training_data.append((integral_image(training[x][0]), training[x][1]))
        if training[x][1] == 1:
            weights[x] = 1.0 / (2 * pos_num)
        else:
            weights[x] = 1.0 / (2 * neg_num)
    features = self.build_features(training_data[0][0].shape)
    X, y = self.apply_features(features, training_data)

The Weak Classifiers

We finally have all of the pieces to start building and training the weak classifiers. Recall that Viola-Jones uses a series of weak classifiers and weighs their results together to produce the final classification. Each weak classifier is "weak" because, by itself, it cannot accurately fulfill the classification task. Each weak classifier looks at a single feature (f) and has a threshold (θ) and a polarity (p) that determine how it classifies a training example:

h(x) = 1 if p·f(x) < p·θ, and 0 otherwise
class WeakClassifier:
    def __init__(self, positive_regions, negative_regions, threshold, polarity):
        self.positive_regions = positive_regions
        self.negative_regions = negative_regions
        self.threshold = threshold
        self.polarity = polarity

    def classify(self, x):
        feature = lambda ii: sum([pos.compute_feature(ii) for pos in self.positive_regions]) - sum([neg.compute_feature(ii) for neg in self.negative_regions])
        return 1 if self.polarity * feature(x) < self.polarity * self.threshold else 0
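As an illustration (the regions, threshold, and polarity below are arbitrary choices, not values the algorithm would necessarily learn), a single weak classifier can be evaluated directly on an integral image:

import numpy as np

# Integral image of a small test image whose values increase from left to right
ii = integral_image(np.arange(25, dtype=float).reshape(5, 5))

left = RectangleRegion(0, 0, 2, 4)
right = RectangleRegion(2, 0, 2, 4)

# The feature value is sum(right) - sum(left), which is positive for this image,
# so with polarity -1 and threshold 0 the classifier outputs 1
clf = WeakClassifier([right], [left], threshold=0, polarity=-1)
print(clf.classify(ii))   # 1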

Training the Weak Classifiers

Training the weak classifiers is the most computationally expensive part of the algorithm, because each time a new weak classifier is selected as the best one, all of them have to be retrained (the training examples are now weighted differently). However, there is an efficient way to find the optimal threshold and polarity for a single weak classifier using the weights. First, sort the training examples by their feature value. Then iterate through them and compute the error that would result if the threshold were placed at that feature value; the possible thresholds are exactly the feature values of the training examples. Finally, keep the threshold and polarity with the minimum error. The error can be measured by

ε = min( S⁻ + (T⁺ − S⁺),  S⁺ + (T⁻ − S⁻) )

where T⁺ and T⁻ are the total weights of the positive and negative examples, and S⁺ and S⁻ are the weights of the positive and negative examples seen so far (those whose feature values fall below the current threshold). The first term is the error when everything below the threshold is labeled positive; the second is the error when everything below the threshold is labeled negative.
In this example, the numbers indicate feature values and the sizes of the bubbles indicate their relative weights. The error is clearly minimized when any example with a feature value less than 9 is classified as blue (positive), which corresponds to a threshold of 9 with a polarity of 1.
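Here is a small standalone sketch of that search. The feature values, labels, and weights are made up to mirror the example above (uniform weights), and the polarity is chosen by comparing the two errors directly, so this is a simplified illustration rather than the exact code used below:

values = [2, 3, 5, 9, 11]           # feature value on each training example
labels = [1, 1, 1, 0, 0]            # 1 = positive ("blue"), 0 = negative
weights = [0.2, 0.2, 0.2, 0.2, 0.2]

total_pos = sum(w for w, l in zip(weights, labels) if l == 1)
total_neg = sum(w for w, l in zip(weights, labels) if l == 0)

pos_below, neg_below = 0.0, 0.0
best = (float('inf'), None, None)   # (error, threshold, polarity)
for w, f, l in sorted(zip(weights, values, labels), key=lambda t: t[1]):
    err_pos = neg_below + (total_pos - pos_below)  # label f(x) < threshold as positive
    err_neg = pos_below + (total_neg - neg_below)  # label f(x) < threshold as negative
    error = min(err_pos, err_neg)
    if error < best[0]:
        best = (error, f, 1 if err_pos < err_neg else -1)
    if l == 1:
        pos_below += w
    else:
        neg_below += w

print(best)   # (0.0, 9, 1): zero error with threshold 9 and polarity 1

The train_weak method below applies the same idea to every feature, using the training weights: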
def train_weak(self, X, y, features, weights):
    total_pos, total_neg = 0, 0
    for w, label in zip(weights, y):
        if label == 1:
            total_pos += w
        else:
            total_neg += w

    classifiers = []
    total_features = X.shape[0]
    for index, feature in enumerate(X):
        if len(classifiers) % 1000 == 0 and len(classifiers) != 0:
            print("Trained %d classifiers out of %d" % (len(classifiers), total_features))

        applied_feature = sorted(zip(weights, feature, y), key=lambda x: x[1])

        pos_seen, neg_seen = 0, 0
        pos_weights, neg_weights = 0, 0
        min_error, best_feature, best_threshold, best_polarity = float('inf'), None, None, None
        for w, f, label in applied_feature:
            error = min(neg_weights + total_pos - pos_weights, pos_weights + total_neg - neg_weights)
            if error < min_error:
                min_error = error
                best_feature = features[index]
                best_threshold = f
                best_polarity = 1 if pos_seen > neg_seen else -1

            if label == 1:
                pos_seen += 1
                pos_weights += w
            else:
                neg_seen += 1
                neg_weights += w

        clf = WeakClassifier(best_feature[0], best_feature[1], best_threshold, best_polarity)
        classifiers.append(clf)
    return classifiers

Selecting the Best Weak Classifier

Once we have trained all of the weak classifiers, we can find the best one. That is as simple as iterating through all of the classifiers and calculating the average weighted error of each one. Add the select_best method to the ViolaJones class.

def select_best(self, classifiers, weights, training_data):
    best_clf, best_error, best_accuracy = None, float('inf'), None
    for clf in classifiers:
        error, accuracy = 0, []
        for data, w in zip(training_data, weights):
            correctness = abs(clf.classify(data[0]) - data[1])
            accuracy.append(correctness)
            error += w * correctness
        error = error / len(training_data)
        if error < best_error:
            best_clf, best_error, best_accuracy = clf, error, accuracy
    return best_clf, best_error, best_accuracy

def train(self, training, pos_num, neg_num):
    ...
    for t in range(self.T):
        weak_classifiers = self.train_weak(X, y, features, weights)
        clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)

Updating the Weights

The bulk of the work in Viola-Jones goes into building the features, training the classifiers, and choosing the best weak classifier in each iteration. The rest of the train method is relatively simple. Before choosing the best classifier, we normalize the weights. After choosing the best weak classifier, we update the weights using the error of the chosen classifier: training examples that were classified correctly get smaller weights, while examples that were classified incorrectly keep the same weight.

w_{t+1, i} = w_{t, i} · β_t^(1 − e_i), where β_t = ε_t / (1 − ε_t) and e_i is 0 if example i was classified correctly and 1 otherwise. For example, if the chosen classifier has a weighted error of ε_t = 0.2, then β_t = 0.25, so the weight of every correctly classified example is multiplied by 0.25 while misclassified examples keep their weights.
def train(self, training, pos_num, neg_num):
    ...
    for t in range(self.T):
        weights = weights / np.linalg.norm(weights)
        weak_classifiers = self.train_weak(X, y, features, weights)
        clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)
        beta = error / (1.0 - error)
        for i in range(len(accuracy)):
            weights[i] = weights[i] * (beta ** (1 - accuracy[i]))

The Strong Classifier

Finally, we have to compile the strong classifier out of our weak classifiers. The strong classifier is defined as

H(x) = 1 if Σ_t α_t·h_t(x) ≥ ½·Σ_t α_t, and 0 otherwise, where h_t is the t-th weak classifier and α_t = log(1/β_t).
import math  # needed for math.log below

def __init__(self, T = 10):
    self.T = T
    self.alphas = []
    self.clfs = []

def train(self, training, pos_num, neg_num):
    ...
    for t in range(self.T):
        weights = weights / np.linalg.norm(weights)
        weak_classifiers = self.train_weak(X, y, features, weights)
        clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)
        beta = error / (1.0 - error)
        for i in range(len(accuracy)):
            weights[i] = weights[i] * (beta ** (1 - accuracy[i]))
        alpha = math.log(1.0/beta)
        self.alphas.append(alpha)
        self.clfs.append(clf)
Putting it all together, the full train method and the final classify method look like this:

def train(self, training, pos_num, neg_num):
    weights = np.zeros(len(training))
    training_data = []
    for x in range(len(training)):
        training_data.append((integral_image(training[x][0]), training[x][1]))
        if training[x][1] == 1:
            weights[x] = 1.0 / (2 * pos_num)
        else:
            weights[x] = 1.0 / (2 * neg_num)

    features = self.build_features(training_data[0][0].shape)
    X, y = self.apply_features(features, training_data)

    for t in range(self.T):
        weights = weights / np.linalg.norm(weights)
        weak_classifiers = self.train_weak(X, y, features, weights)
        clf, error, accuracy = self.select_best(weak_classifiers, weights, training_data)
        beta = error / (1.0 - error)
        for i in range(len(accuracy)):
            weights[i] = weights[i] * (beta ** (1 - accuracy[i]))
        alpha = math.log(1.0/beta)
        self.alphas.append(alpha)
        self.clfs.append(clf)

def classify(self, image):
    total = 0
    ii = integral_image(image)
    for alpha, clf in zip(self.alphas, self.clfs):
        total += alpha * clf.classify(ii)
    return 1 if total >= 0.5 * sum(self.alphas) else 0

Improvement

The number of features grows incredibly fast. For a 24x24 image there are over 160,000 features, and for a 28x28 image there are over 250,000. Moreover, many of these features offer little classification power and are mostly useless. It would be useful to remove as many features as possible so we can train fewer WeakClassifiers. Fortunately, the scikit-learn package can help us narrow down the feature space with its SelectPercentile class. To take advantage of it, add the following to the train method and import the required classes.

from sklearn.feature_selection import SelectPercentile, f_classif

def train(self, training, pos_num, neg_num):
    ...
    X, y = self.apply_features(features, training_data)
    # Keep only the top 10% of features as scored by an ANOVA F-test
    indices = SelectPercentile(f_classif, percentile=10).fit(X.T, y).get_support(indices=True)
    X = X[indices]
    features = [features[i] for i in indices]  # features is a list, so index it element by element
    ...

Saving and Loading

Since ViolaJones takes a long time to train, it would be very useful to be able to save and load the model from a file. This is easy with Python's pickle module. Add the following methods to the ViolaJones class.

import pickle

def save(self, filename):
    with open(filename + ".pkl", 'wb') as f:
        pickle.dump(self, f)

@staticmethod
def load(filename):
    with open(filename + ".pkl", 'rb') as f:
        return pickle.load(f)
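For example (the filename here is arbitrary):

clf = ViolaJones(T=10)
# ... train clf ...
clf.save("my_model")                # writes my_model.pkl
clf = ViolaJones.load("my_model")   # restores the saved model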

Testing

Viola-Jones was created for the task of face detection, so what better way to test our code than on that same task? A good dataset for this is the CBCL Face Database created by MIT's Center for Biological and Computational Learning. The training set contains over 2,000 face images and over 4,500 non-face images, each 19x19 and greyscale. The original dataset stores each image in .pgm format, so I have created two Pickle files, one for training and one for testing, that will allow us to load the data directly into our ViolaJones class. You can download them here on GitHub.

def train(t):
    with open("training.pkl", 'rb') as f:
        training = pickle.load(f)
    clf = ViolaJones(T=t)
    clf.train(training, 2429, 4548)
    evaluate(clf, training)
    clf.save(str(t))

def test(filename):
    with open("test.pkl", 'rb') as f:
        test = pickle.load(f)
    clf = ViolaJones.load(filename)
    evaluate(clf, test)

def evaluate(clf, data):
    correct = 0
    for x, y in data:
        correct += 1 if clf.classify(x) == y else 0
    print("Classified %d out of %d test examples" % (correct, len(data)))

train(10)
test("10")

Final Remarks

Viola-Jones implements several concepts important to machine learning and artificial intelligence. First, it is an ensemble method. Second, it uses a variant of the AdaBoost algorithm for feature selection. These characteristics made it a strong method for image classification, one that was widely used and continues to be used. The original paper also introduced the idea of an "attentional cascade" to greatly reduce classification time for negative examples. Learn more about that in Part Two of this tutorial.
