[CV] 4. Multi-Scale Representation (Gaussian and Laplacian Pyramid)

jun94
jun-devpBlog
Published in
7 min readNov 17, 2020

1. Background for Multi-Scale Representation

In the last article, I mentioned that the Gaussian kernel is important when down-sampling an image but did not explain why. Here, we will take a look at the reason in detail.

1.1 Motivation

Let’s start with the motivation: why the multi-scale representation of an image is important. Think of a filter to detect a house in an image. If an original resolution of image is high, and therefore, if the size of house exceeds the filter size, then consequently the house cannot be detected by the filter. This is illustrated in the bottom row of Fig 1, where filter is depicted as a red circle.

Figure 1. Illustration of the necessity of image downscaling, from [1], [6]

How should we approach to solve the issue?

One way is increasing the filter size large enough to contain all pixels belonging to the house. However, this implies we need to design a new filter with a larger size, as we cannot reuse the existing filter. Simpler solution is to down-scale the original image, which corresponds to decreasing the size of house in the image. By downscaling the image until the house in image is smaller than the filter size, we can detect the house without designing a new filter as is shown in the top row of Fig 1.

Then the question becomes:

How can we downscale the image without losing much of image-structure information?

Downscaling, what we talk about in this article and in the following examples, corresponds to the ‘Resampling’ (might be also called Subsampling). In other words, we select some pixels from the original image and restore a new downscaled image with less image size.

Think of an image with a checkerboard pattern like the one in Fig 2. Our goal is to downscale by selecting some pixels while keeping the pattern in downscaled image. Fig 2 has 4 examples of downscaling where the selected pixels are denoted by a white circle.

Figure 2. VIsual comparison of downscaling with varying sampling periods, from [1], [4]

What we can observe from above is that example (1) and (2) will maintain the checkerboard pattern while (3) and (4) fail to keep the pattern. This is because the sampling period of (3) and (4) is too large to cover the pattern. More specifically, the image size of (3) after downscaling is 2 x 2, and its chosen pixels only lay on black pixels, indicating that the downscaled image (3) will be completely black.

Therefore, it is essential to choose an appropriate sampling period (or sampling ratio) in order to maintain the patterns (image structures) we want.

1.2 Fourier Interpretation: Sampling and Aliasing

To select a suitable sampling period we need to learn first how the sampling works in spatial and frequency domain regarding Fourier Transform.

Figure 3. Illustration of the sampling in spatial and frequency domain, from [1], [4]

As is shown above, the sampling in the spatial domain is the product of signal f and spike function g (which is defined by the sampling period). While multiplying the two, what happens in the frequency domain is to take the Fourier transformed f, and replicate it infinitely many times with shift. Mathematically, this can be represented as follows:

please find more detail here

We have seen how the sampling phase works. The question remaining unanswered is how can we choose a suitable sampling period. To answer this, we, unfortunately, need to study one additional part: Restoration. So far, we sampled points from the original signal (here, assume it to be continuous), meaning we discretized the continuous signal. This is the sampling part illustrated in Fig 4.

Then, one might ask can we restore the original continuous signal f from the sampled discrete signal. Surprisingly, the answer is yes, we can restore it after conducting one step: Cutting out the frequencies by multiplying it with the box filter. Recall that by sampling, the frequency domain for the sampled signal has a magnitude spectrum, which is the sum of infinitely many shifted copies of the original magnitude spectrum. In order to “successfully” restore the original continuous signal f, we need to get first the original magnitude spectrum of f by excluding all copies. Afterward, performing the Inverse Fourier Transform to copy-removed magnitude spectrum will give us the restored continuous signal f.

Figure 4. Illustration of sampling (continuous signal -> discrete) and restoration (discrete -> continuous), from [1], [4]

However, there is a case when we cannot “successfully” restore, and here comes why the sampling period matters. Let’s say f is a continuous signal with the highest frequency 𝛀. In order to recover f, and to answer the question: how to choose suitable sampling rate is that, we need to sample with at least 2𝛀 (by Nyquist theorem). Briefly explaining, the sampling rate of 𝛀 (= sampling period of 1/𝛀) cannot capture frequencies higher than 𝛀 (= period lower than 1/𝛀), meaning it can only capture frequencies ∈ [-𝛀, 𝛀].

please google the Nyquist theorem for more detailed explanation

Now, think of a signal f whose frequency range ∈ [-𝛀’, 𝛀’] when 𝛀’ > 𝛀, and the sampling rate is 𝛀. In this case, the discretized signal f’ cannot realize all frequencies of f but only frequencies in [-𝛀, 𝛀].

And this is where the problem arises. Recall how the magnitude spectrum of sampled signal f’ is structured. It sums up the shifted copies of the magnitude spectrum of f in [-𝛀’, 𝛀’]. However, this time the captured frequency range [-𝛀, 𝛀] is smaller than [-𝛀’, 𝛀’], and this causes overlaps while summing up the copies, as is shown in the Fig 5.

Figure 5. Illustration of recovering continuous signal when the sampling rate is not enough, from [1], [4]

This overlaps yield distorted magnitude spectrum and even after cutting out, the original magnitude f is lost, and therefore, the reconstructed signal is inaccurate introducing artifacts.

We can utilize what we’ve observed in above examples for downscaling as described below:

In order to prevent such distortion (artifacts) and successfully downscale an image, we have to do either:

  • (1) set sampling rate 𝛀 same as 𝛀’ (which is not practical for the purpose of downscaling), or
  • (2) remove frequencies in range [𝛀, 𝛀’] (even though we lose some image-structure information corresponding to the removed frequencies).

2. Multi-Scale Representation

Here, especially in the multi-scale representation such as the Gaussian Pyramid, while downscaling, the second option is chosen: removing frequencies in range [𝛀, 𝛀’], or equivalently, removing frequencies higher than 𝛀).

The way to remove higher frequencies is managed by what we already have learned: Convolving the image by Gaussian Filter! Setting the standard deviation 𝛔 = 𝛀 and performing the convolution, the frequencies higher than 𝛀 will go to zero, as is illustrated below.

Blue: high-frequency components introduced by noise

If this sounds unclear, please read the previous article regarding the Fourier Transform of convolved signal.

The described process is called smoothing which shows a great effect in reducing the maximum frequency of image features, preventing aliasing effect from happening. In other words, smoothing removes the fast changes among pixel intensities that sub-sampling might miss.

2.1 Gaussian Pyramid

Gaussian Pyramid is the representative method to build a multi-scale representation of an image. It consists of two steps: (1)smoothing to remove high-frequency components that could result aliasing, and (2)down-sampling: reduce the image size by half. The complete picture of Gaussian Pyramid is illustrated in Fig 6.

Figure 6. Illustration of building the Gaussian Pyramid, from [1], [6]

Smoothing, which is also called blurring, is done by convolving the image with Gaussian filter as is described before. The second step, down-sampling, means simply taking every second pixel from image in the last step and reconstruct the down-scaled version of the image.

Figure 7. Example of down-sampling. Note that here is no smoothing included, from [7]

In each level of representation, of course we lose image-structure information corresponding to the high-frequency components, and it decreases the image quality as is shown below. However, by doing so, we can downscale the image in each level without introducing artifacts.

Figure 8. Illustration of downscaled image in each level of Gaussian Pyramid, from [1], [6]

As the last remark, I include an image that shows visual comparison when smoothing is applied before resampling and when it is not. As we can all see, the high frequencies cannot be recovered but artifacts do not appear in resampled image after smoothing.

Figure 9. Visual comparison to show an effect of smoothing, from [1], [4]

2.2 Laplacian Pyramid

What do we get if we compute the difference between original image and smoothed image? We will obtain the image structures from lost high-frequency components. Computing and storing the difference in each level is called Laplacian Pyramid. Additional benefit of Laplacian Pyramid, apart from the re-gain of lost frequencies, is that smoothed images in Gaussian Pyramid no longer need to be stored as long as Laplacian Pyramid is stored. As is shown in the Fig 10, Higher resolution image in Gaussian Pyramid can be recovered by Laplacian Pyramid with G_{n}. Therefore, only need to store Laplacian Pyramid.

Figure 10. Laplacian Pyramid from Gaussian Pyramid, from [1] and [6]

please find the detail to form a Laplacian Pyramid regarding expand function in here.

3. Reference

[1] RWTH Aachen, computer vision group

[2] CS 376: Computer Vision of Prof. Kristen Grauman

[3] S. Seitz
[4] Forsyth & Ponce

[5] Prof. Svetlana Lazebnik

[6] Prof M. Irani and R. Basri

[7] Computer vision lecture of University of Toronto

Any corrections, suggestions, and comments are welcome

--

--