Haar Cascade Classifiers in OpenCV Explained Visually.

In this article, you will learn how Haar cascade classifiers really work, with the help of Python visualization functions.

Mahmoud Harmouch
The Startup
7 min read · Jul 3, 2020


👉 Viola-Jones Algorithm

Every algorithm is built on a set of basic building blocks without which it could not exist, and everything else is constructed on top of them. In the Viola-Jones algorithm, these building blocks are Haar features, a set of rectangular kernels:

Types of Haar Features

In the earlier version of the Viola-Jones algorithm, only upright (non-rotated) features were used, and the value of a feature was computed by subtracting the sum of pixel intensities of one subregion from the sum of pixel intensities of another subregion [1].

Haar Features without rotation
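To make this concrete, here is a rough numpy sketch of such a two-rectangle feature; the patch size, the split into left and right halves, and the function name are illustrative choices, not anything prescribed by OpenCV:

import numpy as np

# A rough sketch of an upright two-rectangle feature: the pixel sum of one
# subregion (the left half of the patch) is subtracted from the pixel sum
# of the other subregion (the right half).
def two_rect_feature(patch: np.ndarray) -> float:
    h, w = patch.shape
    left = patch[:, : w // 2].sum()
    right = patch[:, w // 2 :].sum()
    return float(right - left)

patch = np.random.randint(0, 256, (24, 24))
print(two_rect_feature(patch))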

As the method evolved, features rotated by 45 degrees and asymmetric configurations were proposed. In addition, instead of taking a plain difference, each subregion is assigned a weight, and the feature value is computed as a weighted sum of the pixel sums of the different regions [2]:

The feature value is then feature = w1·RecSum(r1) + w2·RecSum(r2) + … + wN·RecSum(rN), where the weights (wi), the rectangles (ri), and N are arbitrarily chosen.

To decide the class at each stage of the cascade, the values returned by the weak classifiers of that stage are summed. Each weak classifier returns one of two values, depending on whether the feature it evaluates is greater or less than a given threshold.

Here w1 is the feature weight, norm(i, j) is the normalization factor (the standard deviation over the rectangle containing all the features), and the threshold t is a parameter of the classifier.
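A minimal sketch of this decision rule, with illustrative names (the actual classifiers live inside OpenCV's cascade files, as we will see below):

# A minimal sketch of a single weak classifier: it returns one of two
# values depending on whether the normalized feature response falls
# below or above its threshold. Names are illustrative, not OpenCV's API.
def weak_classifier(feature_value, norm, threshold, left_value, right_value):
    if feature_value / norm < threshold:
        return left_value
    return right_value

# Example: a feature response of 0.8 with unit normalization and threshold 0.5
print(weak_classifier(0.8, 1.0, 0.5, -1.0, +1.0))  # -> 1.0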

For fast calculation, the integral image method is used.

Integral image example

Let's take a 6 x 6 matrix representing an image, as shown below:

6 x 6 matrix(source)

Now, let’s say we want to calculate the average pixel value over the area highlighted in purple.

The normal calculation for the average would be:

15 + 16 + 14 + 28 + 27 + 11 = 111
avg = 111 / 6 = 18.5

That required a total of 6 operations (5 additions and 1 division).

Doing the same for 1000 such operations, for example, would require

1000 * 6 = 6000 operations.

Now, let us compute the integral image of the above matrix using the following formula:

ii(x, y) = Σ i(x', y') over all x' ≤ x and y' ≤ y, where i(x, y) is the value of the pixel at (x, y).

This formula is a cumulative sum of the pixel values along both the horizontal and the vertical axes.
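A minimal numpy sketch of this cumulative sum; the matrix below is arbitrary rather than the one from the figure (OpenCV also exposes the same operation as cv2.integral):

import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    # ii(x, y) = sum of all pixels above and to the left of (x, y),
    # obtained with one cumulative sum per axis.
    return img.cumsum(axis=0).cumsum(axis=1)

img = np.arange(1, 37).reshape(6, 6)   # any 6 x 6 matrix
print(integral_image(img))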

integral image(source)

Again, to compute the average intensity, all you have to do is

(101 + 450) - (254 + 186) = 111
avg = 111 / 6 = 18.5

This requires a total of 4 operations (2 additions, 1 subtraction, and 1 division).

Doing the same for 1000 such operations, for example, would require

1000 * 4 = 4000 operations.

So it reduces the computation by about 33%.

Just imagine the difference it makes for large images and more operations.

So this is very helpful in computing the Haar-like features.

Each feature is calculated by subtracting the sum of pixels under the white rectangle (computed using the integral image) from the sum of pixels under the black rectangle (also computed using the integral image).

matrix representation of filters(kernels).
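Here is a small numpy sketch of that idea: the integral image is built once, and each rectangle sum is then read back with only four lookups (the particular rectangles below are illustrative):

import numpy as np

def rect_sum(ii: np.ndarray, x: int, y: int, w: int, h: int) -> int:
    # Sum of the rectangle with top-left corner (x, y), width w and height h,
    # read from a zero-padded integral image with only four lookups.
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

img = np.random.randint(0, 256, (24, 24))
# Pad with a leading row and column of zeros so the corner lookups stay in bounds.
ii = np.pad(img, ((1, 0), (1, 0)), mode="constant").cumsum(axis=0).cumsum(axis=1)

# Two-rectangle feature over the whole window: black (right) half minus white (left) half.
white = rect_sum(ii, 0, 0, 12, 24)
black = rect_sum(ii, 12, 0, 12, 24)
print(black - white)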

Haar classifiers are organized in sequences called stages (classification stages). The stage value is the sum of its classifier values.

In the end, the sum of the values of the weak classifiers is compared with the stage threshold, and a decision is made as to whether the object was found by this stage or not.
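In Python-flavoured pseudocode, the stage decision could be sketched as follows (the function and parameter names are illustrative, not OpenCV's API):

# A minimal sketch of the cascade decision: the values returned by a stage's
# weak classifiers are summed and compared with the stage threshold; the
# window is rejected as soon as any stage fails.
def run_stage(weak_values, stage_threshold):
    return sum(weak_values) >= stage_threshold

def run_cascade(stages, window):
    # stages: list of (evaluate_weak_classifiers, stage_threshold) pairs, where
    # evaluate_weak_classifiers(window) returns the list of leaf values.
    for evaluate, threshold in stages:
        if not run_stage(evaluate(window), threshold):
            return False   # rejected by this stage
    return True            # survived every stage: object found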

Well, that’s all for cascade classifiers.

👉 Implementation and visualization

We already know that Haar cascade files ship with OpenCV as XML files. Let’s take a look at the frontal face cascade file:

haarcascade_frontalface_default.xml

At first glance, these strange numbers seem very hard to read. In fact, the structure is quite simple. The first section of the file describes the cascade in the following manner (a small parsing sketch follows the list):

  • <stageType>: the stage type; BOOST tells us the stages were trained with boosting.
  • <featureType>: the type of features used, in this case Haar features (HAAR).
  • <height> and <width>: the size of the detection window (24 x 24 pixels) within which the features are evaluated.
  • <maxWeakCount>: the maximum number of weak classifiers at each stage or level.
  • <stageNum>: the number of stages (25).
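A small sketch that reads these header tags with Python's minidom; it assumes haarcascade_frontalface_default.xml sits in the working directory (in an OpenCV install it lives in the data folder):

from xml.dom import minidom

# Print the header tags listed above (first occurrence of each).
doc = minidom.parse("haarcascade_frontalface_default.xml")

for tag in ("stageType", "featureType", "height", "width",
            "maxWeakCount", "stageNum"):
    nodes = doc.getElementsByTagName(tag)
    if nodes:
        print(tag, "=", nodes[0].firstChild.nodeValue.strip())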

Features, then, are small convolution kernels (rectangles) that are applied to the image.

convolution kernel
  • <stages>: the list of stages (levels). Each level looks at the activations of its classifiers, and from them a decision is made as to whether the object is in the image or not.
  • <stageThreshold>: the threshold that the sum of the classifier values needs to exceed in order to move on to the next level.
  • <weakClassifiers>: the set of weak classifiers on whose basis the decision is made, whether the object is in the image or not.
  • <internalNodes>: contains information about the nodes of a weak classifier's tree.

a- The first value is “0”: the index of the current node.

b- The second is “-1”: the index of the node to go to next; the traversal ends at a leaf when the index becomes smaller than “0”.

c- The third is “0”: the index of the rectangular filter (located further down in the XML file under the <features> tag)

kernels

d- The fourth is “-3.1511999666690826e-02”: the threshold value of the weak classifier.

  • <leafValues>: the values returned by the weak classifier. The first value, “2.0875380039215088e+00”, is returned if the convolution result is less than the threshold of the tree; otherwise, the second value is returned.

The tag <rects> stores the rectangles of the convolution (see the sketch after this list):

a- The first 4 numbers are the x, y coordinates of the rectangle’s top-left corner, followed by its width and height.

b- The fifth number is the weight. If it is negative (e.g. “-1”), the pixel sum of this rectangle is subtracted; if it is positive (e.g. “3”), it is added.
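As a quick check, here is a small minidom sketch that decodes the first feature's <rects> block under the assumption just described (file path as before):

from xml.dom import minidom

# Each <_> entry under <rects> holds "x y width height weight".
doc = minidom.parse("haarcascade_frontalface_default.xml")
first_rects = doc.getElementsByTagName("rects")[0]

for node in first_rects.getElementsByTagName("_"):
    x, y, w, h, weight = node.firstChild.nodeValue.split()
    print(f"rect at ({x}, {y}), size {w} x {h}, weight {weight}")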

Now let's read this XML file and visualize some of its kernels.

👉 XML Parsing in Python

XML stands for Extensible Markup Language. It is designed to store and transport data and is widely used to exchange structured information.

Python allows us to parse and modify an XML document using modules such as minidom (the Minimal Document Object Model implementation in xml.dom).

The following snippet of code allows you to visualize the Haar cascade classifier. It implements only the visualization of single-level (per-stage) cascades.
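A minimal sketch of such a visualization, assuming numpy, matplotlib, and minidom; helper names (load_features, the stage_*.png output files) are illustrative and may differ from the original snippet:

import numpy as np
import matplotlib.pyplot as plt
from xml.dom import minidom

CASCADE = "haarcascade_frontalface_default.xml"

def load_features(doc, win_w, win_h):
    """Build one win_h x win_w weight matrix per <rects> feature."""
    feat_mat = []
    for rects in doc.getElementsByTagName("rects"):
        mat = np.zeros((win_h, win_w))
        for node in rects.getElementsByTagName("_"):
            x, y, w, h, weight = (float(v) for v in node.firstChild.nodeValue.split())
            mat[int(y):int(y + h), int(x):int(x + w)] = weight
        feat_mat.append(mat)
    return feat_mat

doc = minidom.parse(CASCADE)
win_w = int(doc.getElementsByTagName("width")[0].firstChild.nodeValue)
win_h = int(doc.getElementsByTagName("height")[0].firstChild.nodeValue)
feat_mat = load_features(doc, win_w, win_h)

# Direct <_> children of <stages> are the individual stages.
stages = [n for n in doc.getElementsByTagName("stages")[0].childNodes
          if n.nodeType == n.ELEMENT_NODE and n.tagName == "_"]

# One picture per stage: overlay the kernels referenced by its weak classifiers.
for idx, stage in enumerate(stages):
    canvas = np.zeros((win_h, win_w))
    for node in stage.getElementsByTagName("internalNodes"):
        # The third number in <internalNodes> is the feature index.
        feature_index = int(node.firstChild.nodeValue.split()[2])
        canvas += feat_mat[feature_index]
    plt.imshow(canvas, cmap="gray")
    plt.title(f"stage {idx}")
    plt.savefig(f"stage_{idx}.png")
    plt.close()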

The program will generate 25 pictures, one for each of the 25 stages. You can also visualize the rectangular features, which are available under the “feat_mat” variable (2912 matrices).

👉 Conclusion

Python was and still is a handy programming language that provides useful built-in modules, in particular for parsing and manipulating XML documents.

That’s all for today.

For more information, you can check out my project’s code on GitHub.

📚 Literature:

[1] P. Viola and M. Jones, Robust Real-Time Face Detection (2004), International Journal of Computer Vision 57(2).

[2] R. Lienhart, A. Kuranov, and V. Pisarevsky, Empirical Analysis of Detection Cascades of Boosted Classifiers for Rapid Object Detection (2003), In: Pattern Recognition Symposium (DAGM), pp. 297–304.
