A Beginner's Guide to Computer Vision (Part 4): Pyramid
Learn about image pyramids and their implementation
The pyramid is a shape humankind has known for a very long time. This article, however, is not about the geometric shape: we are going to talk about image pyramids.
Sometimes, to detect an object (such as a face or a car) in an image, we need to resize or sub-sample the image and run further analysis on it. In such cases, we maintain a collection of the same image at different resolutions. We call this collection an image pyramid. The name comes from the fact that when we stack the images in decreasing order of resolution, we obtain a shape like a pyramid with a square base. Look at the image below to understand it in more detail.
The area of each new level is 1/4 of the area of the level below it. If the size of the base image (high resolution, or level 0) is M × N, then the size of the level above it is M/2 × N/2.
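As a quick check of the halving rule, here is a minimal sketch that lists the size of each level. The function name `pyramid_shapes` is hypothetical, and rounding odd sizes up is an assumption (to my understanding it matches the default output size of `cv2.pyrDown`).

```python
def pyramid_shapes(height, width, levels):
    """Return the (height, width) of each pyramid level, starting at level 0."""
    shapes = [(height, width)]
    for _ in range(levels - 1):
        # Halve each side; odd sizes are rounded up (an assumption).
        height, width = (height + 1) // 2, (width + 1) // 2
        shapes.append((height, width))
    return shapes


# Print the size and area of the first four levels of a 480 x 640 image.
for level, (h, w) in enumerate(pyramid_shapes(480, 640, 4)):
    print(f"level {level}: {h} x {w}, area {h * w}")
```

For even sizes, each printed area is exactly a quarter of the one before it.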
There are two kinds of pyramids: 1) Gaussian Pyramid and 2) Laplacian Pyramid.
Gaussian Pyramid
In a Gaussian pyramid, we apply a 5 × 5 Gaussian filter before we sub-sample the image. There are two kinds of operations here: reduce and expand. As the names suggest, the reduce operation halves the width and height of the image, and the expand operation doubles them.
Reduce or Down
The reduce operation in a Gaussian pyramid is done according to the relation given below.
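In its commonly cited form (the Burt and Adelson formulation, which I am assuming is the one intended here), the relation for a 5 × 5 window reads as follows. The symbol g_l for the image at level l is notation I am introducing for illustration.

```latex
g_l(i, j) = \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m, n)\, g_{l-1}(2i + m,\; 2j + n)
```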
where l represents the level and w(m, n) is the window function (Gaussian).
The only difference between the reduce operation and an ordinary convolution is the stride: an ordinary convolution uses a stride of 1, whereas reduce uses a stride of 2. In other words, we apply the Gaussian mask at every alternate row and column. Look at the code snippet below to understand the operation further.
lowResImage = cv2.pyrDown(highResImage) # Reduce operation of OpenCV
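To make the stride-2 idea concrete, here is a minimal NumPy sketch of the reduce step for a grayscale image. The names `reduce_level`, `KERNEL_1D` and `KERNEL_2D`, the binomial weights [1, 4, 6, 4, 1] / 16 and the reflected border are all assumptions for illustration; this is not OpenCV's implementation, and its border pixels may differ from what `cv2.pyrDown` returns.

```python
import numpy as np

# 1-D binomial approximation of a Gaussian (assumed weights);
# the outer product gives the 5 x 5 window w(m, n).
KERNEL_1D = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0
KERNEL_2D = np.outer(KERNEL_1D, KERNEL_1D)


def reduce_level(image):
    """Return the next pyramid level: a 5 x 5 weighted average taken with stride 2."""
    image = np.asarray(image, dtype=np.float64)
    # Pad by 2 pixels so the window fits at the borders (reflection is an assumption).
    padded = np.pad(image, 2, mode="reflect")
    out_h = (image.shape[0] + 1) // 2
    out_w = (image.shape[1] + 1) // 2
    out = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            # Window centred on source pixel (2i, 2j); the padding shifts indices by 2.
            window = padded[2 * i:2 * i + 5, 2 * j:2 * j + 5]
            out[i, j] = np.sum(window * KERNEL_2D)
    return out
```

The explicit loops are slow but keep the stride visible: the window centre moves two source pixels for every output pixel.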
Expand or Up
The expand operation in a Gaussian pyramid is done according to the relation given below.
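In its commonly cited form (again assuming the Burt and Adelson formulation), the relation can be written as follows. Here g_l is the image at level l and \hat{g}_{l-1} is its expanded version at the size of level l-1; both symbols are notation I am introducing. Only terms for which (i-p)/2 and (j-q)/2 are integers are included in the sum, and the factor 4 compensates for the terms that drop out.

```latex
\hat{g}_{l-1}(i, j) = 4 \sum_{p=-2}^{2} \sum_{q=-2}^{2} w(p, q)\, g_{l}\!\left(\frac{i - p}{2},\; \frac{j - q}{2}\right)
```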
where l represents the level and w(p, q) is the window function (Gaussian).
To understand the above relation more clearly, let's expand it for the one-dimensional case.
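Written out in one dimension, the sum has five candidate terms. In this sketch g_l is the coarser signal, \hat{g}_{l-1} is the expanded one and w(p) is the 1-D window; the notation and the factor 2 (the 1-D counterpart of the usual normalisation) are assumptions taken from the commonly cited form.

```latex
\hat{g}_{l-1}(i) = 2 \Big[ w(-2)\, g_l\!\left(\tfrac{i+2}{2}\right) + w(-1)\, g_l\!\left(\tfrac{i+1}{2}\right) + w(0)\, g_l\!\left(\tfrac{i}{2}\right) + w(1)\, g_l\!\left(\tfrac{i-1}{2}\right) + w(2)\, g_l\!\left(\tfrac{i-2}{2}\right) \Big]
```

For even i, only the terms with p = -2, 0, 2 have integer arguments. For odd i, only those with p = -1, 1 do.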
Terms with non-integer indices in the above equation are eliminated, so in the end the equation has at most three terms. Look at the image below.
The terms a, b, c, d and e in the above image are the Gaussian weights in one dimension. In the expand operation, new pixels are created from the same old pixels using different combinations of the Gaussian weights.
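The one-dimensional case is small enough to code directly. Below is a minimal pure-Python sketch in which even output pixels end up using the weights a, c, e and odd output pixels use b, d. The function name `expand_1d`, the binomial weight values, the factor 2 and the border clamping are all assumptions for illustration.

```python
# Five 1-D Gaussian-like weights (a, b, c, d, e); the binomial values are an assumption.
WEIGHTS = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def expand_1d(signal):
    """Double the length of `signal` using the 1-D expand relation."""
    n = len(signal)
    out = []
    for i in range(2 * n):
        total = 0.0
        for p in range(-2, 3):
            # Keep only the terms where (i - p) / 2 is an integer index.
            if (i - p) % 2 != 0:
                continue
            k = (i - p) // 2
            # Clamp to the valid range at the borders (an assumption).
            k = min(max(k, 0), n - 1)
            total += WEIGHTS[p + 2] * signal[k]
        # The surviving weights sum to 1/2, so multiply by 2 to preserve brightness.
        out.append(2 * total)
    return out


print(expand_1d([0.0, 16.0, 0.0]))
```

A constant signal stays constant after expansion, because both surviving weight groups (a + c + e and b + d) sum to 1/2.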
highResImage = cv2.pyrUp(lowResImage) # Expand operation of OpenCV
Laplacian Pyramid
In a Gaussian pyramid, we apply a Gaussian blur and sub-sample the image. In a Laplacian pyramid, we apply the Laplacian of Gaussian and sub-sample the image. You may recall the Laplacian of Gaussian from the Marr-Hildreth edge detector in my previous article.
In practice, we create the Laplacian pyramid from the Gaussian pyramid. Look at the formula below for a better understanding.
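In the usual formulation, each Laplacian level is the difference between a Gaussian level and the expanded version of the next, coarser Gaussian level. The symbols L_l and g_l (Laplacian and Gaussian images at level l) and the operator name EXPAND are notation assumed here for illustration.

```latex
L_l = g_l - \operatorname{EXPAND}(g_{l+1})
```

The coarsest level has nothing above it to subtract, so it is kept as the Gaussian level itself.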
Look at the code snippet below, which generates one level of the Laplacian pyramid.
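upsampled = cv2.pyrUp(gaussianLevel2, dstsize=gaussianLevel1.shape[1::-1])  # dstsize is (width, height), so odd sizes still match
laplaceLevel1 = cv2.subtract(gaussianLevel1, upsampled, dtype=cv2.CV_32F)  # float output keeps negative differences instead of wrapping uint8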
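To see why this construction is useful, here is a minimal NumPy sketch that builds a whole Laplacian pyramid for a grayscale image and then rebuilds the original from it. NumPy stands in for `cv2.pyrDown` and `cv2.pyrUp` so the arithmetic is visible. All function names, the binomial kernel and the edge-replicated border are assumptions for illustration, not OpenCV's implementation.

```python
import numpy as np

# Assumed 5-tap binomial approximation of the Gaussian window.
K = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0


def _blur(image):
    """Separable 5-tap blur with edge replication at the borders."""
    padded = np.pad(image, 2, mode="edge")
    rows = sum(K[k] * padded[:, k:k + image.shape[1]] for k in range(5))
    return sum(K[k] * rows[k:k + image.shape[0], :] for k in range(5))


def reduce_level(image):
    """Blur, then keep every second row and column."""
    return _blur(image)[::2, ::2]


def expand_level(image, shape):
    """Upsample `image` to `shape` by zero insertion followed by a blur."""
    up = np.zeros(shape, dtype=np.float64)
    up[::2, ::2] = image       # known pixels go on the even grid
    return 4.0 * _blur(up)     # factor 4 makes up for the inserted zeros


def laplacian_pyramid(image, levels):
    """Return (list of Laplacian levels, coarsest Gaussian level)."""
    gaussian = [np.asarray(image, dtype=np.float64)]
    for _ in range(levels):
        gaussian.append(reduce_level(gaussian[-1]))
    laplacian = [g - expand_level(g_next, g.shape)
                 for g, g_next in zip(gaussian[:-1], gaussian[1:])]
    return laplacian, gaussian[-1]


def reconstruct(laplacian, top):
    """Invert the pyramid: expand from the top and add each Laplacian level back."""
    image = top
    for lap in reversed(laplacian):
        image = lap + expand_level(image, lap.shape)
    return image


image = np.arange(64, dtype=np.float64).reshape(8, 8)
laplacian, top = laplacian_pyramid(image, levels=2)
print(np.allclose(reconstruct(laplacian, top), image))
```

Because each Laplacian level stores exactly what the expand step fails to recover, the round trip should reproduce the input up to floating-point error.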
That's the end of this article; stay tuned for more on computer vision. The code for pyramids is in the link below.