Steerable Pyramids

An Introduction to Steerable Pyramid Decompositions

Isaac Berrios
10 min read · Jan 1, 2024

Pyramid Decompositions are common in Computer Vision, but one type of Pyramid that is not often discussed is the Steerable Pyramid. This post dives into the basics of Steerable Pyramids and provides visualizations to help build a deeper understanding. The main idea is to see how the Decomposition is performed and learn about its properties along the way. Some code will be shown here to give a general idea of what’s going on; the full code for this tutorial can be found on GitHub. However, I recommend using pyrtools for any serious applications.

We assume familiarity with Image Pyramids and the concept of spatial frequency; please review the linked sources if you need a refresher. Here’s an overview:


Background

An Image Pyramid is a multi-scale decomposition of an image, meaning that the image is scaled down multiple times to smaller and smaller sizes. The main advantage of decomposing an image into multiple scales is that information that is not apparent at one scale, might be more apparent at another scale. A common example is the Gaussian Pyramid, where an image is repeatedly Gaussian Blurred and sub-sampled.

Figure 1. Example of a Gaussian Pyramid with a depth of 5 levels. Source.

Another is the Laplacian Pyramid, where the residuals (differences) between the subsampled image and its Gaussian Blurred counterpart comprise its levels. These residuals h₀ and h₁ in figure 2 are actually band pass filtered images, and as we go lower in scale we go lower in frequency until we get to the final Low Pass component denoted by f₂. The Laplacian Pyramid is overcomplete by a factor of 4/3 (1 + 1/4 + 1/16 + … ≈ 4/3).

A note on overcompleteness: Complete means that the Decomposition (or transform) encodes all of the image information and is invertible. Overcomplete means complete and oversampled, meaning that the Decomposition contains more than enough information to reconstruct the original image. We can also say that the dimensionality of an overcomplete Decomposition is higher than the dimensionality of the input.

Figure 2. Example of a Laplacian Pyramid.
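To make the construction concrete, here is a toy sketch of a Laplacian Pyramid in NumPy. The `blur` helper is a crude binomial filter standing in for a true Gaussian Blur, so this is illustrative rather than production code:

```python
import numpy as np

def blur(img):
    """Crude separable [1 2 1]/4 blur as a stand-in for a Gaussian."""
    img = (np.roll(img, 1, 0) + 2 * img + np.roll(img, -1, 0)) / 4
    img = (np.roll(img, 1, 1) + 2 * img + np.roll(img, -1, 1)) / 4
    return img

def laplacian_pyramid(image, depth=3):
    """Band-pass residuals at each scale plus a final low-pass image."""
    levels = []
    current = image.astype(np.float64)
    for _ in range(depth):
        blurred = blur(current)
        levels.append(current - blurred)  # residual h_i (band pass)
        current = blurred[::2, ::2]       # subsample for the next scale
    levels.append(current)                # final low-pass component
    return levels

pyr = laplacian_pyramid(np.random.rand(64, 64), depth=3)
print([lvl.shape for lvl in pyr])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

Each level holds a quarter of the samples of the one above it, which is where the 1 + 1/4 + 1/16 + … overcompleteness series comes from.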

A fun use case for Image Pyramids is image blending where different spatial scales enable a smooth transition from one image to the other. Another is creating a cartoon of an image, where the smaller image scales provide colors that best represent the image as a whole. An honorable mention would be the ancient technique of multi-scale template matching for object detection.

Steerable Pyramids

Steerable Pyramids add the notion of orientation to the Pyramid Decomposition. On each sub-scale we now have multiple orientations, and as with the Laplacian Pyramid each spatial scale corresponds to a different frequency band. As we go lower in scale we go lower in frequency until we get to the final Low Pass component. Gaussian and Laplacian pyramids have levels which describe scale, but Steerable Pyramids have sub-bands which describe both scale (frequency) and orientation. A Steerable Pyramid is a collection of filters at different sub-bands which also include separate Low and High Pass components. The Low and High Pass components are not selectively steered to an orientation, they encompass a full 360°.

Due to the notion of frequency sub-bands, the Steerable Pyramid is best described in the Frequency Domain. Now let’s dive in and see how to compute one of these decompositions and we’ll discuss some important properties along the way. Here’s the test image that we will use, the code is located on GitHub.

Figure 3. Test Image. Source: Author.

A Steerable Pyramid is a collection of filters at different sub-bands which include separate Low and High Pass components. The Steerable Pyramid Decomposition is the collection of filter outputs.

Performing the Decomposition

Getting the first High and Low Pass Filters

The Decomposition is performed in Frequency Space, where the image is initially split into High and Low Pass components denoted H₀ and L₀ respectively. We compute the Discrete Fourier Transform (DFT) of the image and multiply by Frequency Domain Filters, this is shown below in figure 4.

Figure 4. Initial High Pass and Low Pass Filters and their outputs on the Test Image. Source: Author.

Now let’s see how we actually obtain the High and Low Pass Filters. We first map the image height and width onto a Polar Grid in the Frequency Domain. The radial portion of the Polar Grid ω corresponds to spatial scale (frequency band) and the angular portion θ corresponds to orientation.

Defining the Polar Grid based on the image dimensions bandlimits the filters and prevents aliasing; in other words, frequencies larger than the Nyquist Frequency will be equal to zero (or in this case, not even contained in the Polar Grid!).

import numpy as np

def get_polar_grid(h, w):
    """ Obtains Angle and Radius of Polar Grid
        Inputs:
            h, w - input image height and width
        Outputs:
            angle - Angular component of Polar Grid
            radius - Radial component of Polar Grid
        """
    h2 = h//2
    w2 = w//2

    # get normalized frequencies (same as fftfreq) [-1, 1)
    # modulus remainders account for odd dimensions
    wx, wy = np.meshgrid(np.arange(-w2, w2 + (w % 2))/w2,
                         np.arange(-h2, h2 + (h % 2))/h2)

    # angular component
    angle = np.arctan2(wy, wx)

    # radial component
    radius = np.sqrt(wx**2 + wy**2)
    radius[h2][w2] = radius[h2][w2 - 1]  # remove zero component

    return angle, radius

The code snippet below shows how we can obtain the initial High and Low Pass Filters (HPFs and LPFs), where radial_val defines the radial boundary of the filter and ranges from (0–1). A lower value of radial_val provides a HPF with a larger pass band and an LPF with a smaller pass band. The width of the transition region between the High and Low pass filters is specified by twidth, where values closer to zero provide a very abrupt transition, shown in figure 6.

Since we are working in frequency space, we will refer to the filters as masks

radial_val = 1.0 # specifies filter boundary 
twidth = 1.0 # width of transition region

# shift log radius (shifts by an octave if log2(radial_val) = 1)
log_rad = np.log2(radius) - np.log2(radial_val)

hi_mask = np.clip(log_rad, -twidth, 0)
hi_mask = np.abs(np.cos(hi_mask*np.pi/(2*twidth)))
lo_mask = np.sqrt(1.0 - hi_mask**2)

Figure 5. Effect of rval on the High and Low Pass Filters. Source: Author.
Figure 6. Effect of the Transition Width (twidth) on the High Pass Filter. This has the same general effect on all filters. Source: Author.

Notice how the HPF cross-section in figure 6 is flat in its pass band (the LPF is also flat). Since the filters pass the frequencies in their pass bands with equal gain, we get a cleaner reconstruction of the image. This flat response is an important property of the Steerable Pyramid.
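As a quick numerical check of this property, the sketch below rebuilds the masks from the same formulas on a synthetic 64×64 frequency grid (the size and radial_val = 1.0 are arbitrary choices here) and verifies that the squared High and Low Pass responses sum to one at every frequency:

```python
import numpy as np

# rebuild the masks from scratch on a 64x64 frequency grid
h = w = 64
h2, w2 = h // 2, w // 2
wx, wy = np.meshgrid(np.arange(-w2, w2) / w2, np.arange(-h2, h2) / h2)
radius = np.sqrt(wx**2 + wy**2)
radius[h2, w2] = radius[h2, w2 - 1]  # avoid log2(0) at the DC component

twidth = 1.0
log_rad = np.log2(radius)  # radial_val = 1.0, so no shift

hi_mask = np.abs(np.cos(np.clip(log_rad, -twidth, 0) * np.pi / (2 * twidth)))
lo_mask = np.sqrt(1.0 - hi_mask**2)

# the squared responses sum to one at every frequency
print(np.allclose(hi_mask**2 + lo_mask**2, 1.0))  # True
```

This unit total power is exactly what allows the components to be recombined without amplifying or attenuating any frequency.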

Back to the Big Picture

So far we have covered the first part of the decomposition, where we split the image into High and Low Pass components; the image in figure 5 gives a hint at what’s to come next. The initial Low Pass component L₀ is recursively split into sub-bands at multiple orientations. An overview of the complete decomposition is shown below, where B represents the sub-bands and its subscripts indicate the different orientations. In the next section, we will dive into the “recursive” operations (even though the decomposition is recursive, we can implement it with a loop).

Figure 7. Diagram of a Steerable Pyramid Decomposition. The left is the decomposition and right is the reconstruction. Source.

Getting the Sub-Band Filters

The sub-band filters of a Steerable Pyramid are required to be polar separable, which means that we can separate them in radius and angle: Fᵢ(ω, θ) = A(θ − θᵢ)·B(ω), where A is the angular component with respect to angle θ, θᵢ is the steering/orientation angle, and B is the radial component which depends on frequency band ω. B determines the frequency band (see right side of figure 8), while A selects the orientation (see figure 9). Separability also implies that we can multiply the two components together to get the filter.

We will first define the radius of the sub-band and then we will select the orientation angles. Now is a good time to introduce the two main hyperparameters that will control the total number of filters in the decomposition.

  • Depth/Height — determines the number of radial sub-bands (corresponds to the Radial component)
  • Orientations — determines the number of Orientation Angles that we steer to (corresponds to the Angular component)

As you can see, we aim to take full advantage of the Polar Separability. We can use the depth to compute a vector of radial values that determine the bandwidth of each sub-band. We have already seen how the radial values are used to compute the High and Low Pass Filters; now let’s see how we can recursively compute the radial component of the sub-band filters.

depth = 4
radial_vals = 2.0**np.arange(-depth, 1, 1)[::-1]

# array([1. , 0.5 , 0.25 , 0.125 , 0.0625])

As we iterate through the radial values, we can get the first radial filter by multiplying the previous Low Pass Mask with the current High Pass Mask. In figure 8 below, we have multiplied the HPF (rval=0.5) with LPF (rval=1.0). When performing recursion, we just refer to this as the radial mask, which is the product of the current HPF and previous LPF.

rad_masks = []
for i in range(1, depth):
    lo_mask_prev = lo_masks[i - 1]
    hi_mask = hi_masks[i]
    rad_mask = hi_mask * lo_mask_prev
    rad_masks.append(rad_mask)

Figure 8. Construction of a Sub-band filter with no orientation. Source: Author.

The next step is to obtain the Steering/Orientation Angle masks, which steer the sub-band filters to their appropriate orientations.

order = orientations - 1
const = np.power(2, (2*order)) \
        * np.power(factorial(order), 2) \
        / (orientations*factorial(2*order))

# b indexes the current orientation band
angle = np.mod(np.pi + angle - np.pi*b/orientations, 2*np.pi) - np.pi
angle_mask = np.abs(2*np.sqrt(const) * np.power(np.cos(angle), order))

Here’s an example for orientations = 4:

Figure 9. Orientation Masks (Steered Angles). Source: Author.
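A useful sanity check, here evaluated on a dense ring of angles rather than the full grid: with the normalization constant above, the squared orientation masks sum to the same value at every angle, which is what lets the steered bands cover orientation without gaps. (The constant total of 4 below is absorbed by the division by 2 when the final filters are assembled.)

```python
import numpy as np
from math import factorial

orientations = 4
order = orientations - 1
const = (2**(2 * order)) * factorial(order)**2 \
        / (orientations * factorial(2 * order))

# evaluate every orientation mask on a dense ring of angles
theta = np.linspace(-np.pi, np.pi, 360, endpoint=False)
total = np.zeros_like(theta)
for b in range(orientations):
    ang = np.mod(np.pi + theta - np.pi * b / orientations, 2 * np.pi) - np.pi
    angle_mask = np.abs(2 * np.sqrt(const) * np.cos(ang)**order)
    total += angle_mask**2

# squared responses tile the circle with constant total power
print(np.allclose(total, 4.0))  # True
```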

And finally, we multiply the current Radial Mask with each of the Orientation Masks to get the current set of Steered Sub-Band Filters.

# get oriented filters for the current Radial Mask
for angle_mask in angle_masks:
    filt = rad_mask*angle_mask/2

Figure 10. First level of Steerable Pyramid Filters with 4 orientations. Source: Author.

The Filters of the Decomposition actually partition the frequency space of an image (with some overlap). Figure 11 below shows an example of how this partition occurs, the shaded region is an example of a single filter like the ones shown above in figure 10.

Figure 11. Ideal partition of the image’s frequency space. Source.

Each of the regions represents a different filter that captures a certain frequency band and orientation of the image.

Now we can display the Pyramid Decomposition for all the sub-bands (except the High and Low Pass Components, which are shown in figure 13). We multiply each sub-band filter by the DFT of the original image to obtain the sub-band components of the Pyramid Decomposition in the Frequency Domain. To get the Spatial Domain responses, we take the real part of the inverse DFT. The Spatial Domain responses of the sub-band filters in figure 10 are on the top row of the decomposition below in figure 12. In this case, they are barely visible, most likely because the image does not contain much frequency information in this band. In the notebook, there is a larger plot that reveals some detail; it’s just hard to see in this post. We can also take the magnitude as we did in figure 4.
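The filtering step just described can be sketched in isolation. Here image and filt are hypothetical stand-ins (a random image and an all-pass mask of ones) so the snippet is self-contained; with an all-pass mask the response simply round-trips back to the original image:

```python
import numpy as np

# hypothetical stand-ins: a real run would use one of the sub-band
# filters built above in place of this all-pass mask
image = np.random.rand(64, 64)
filt = np.ones((64, 64))

# frequency-domain sub-band component: DFT of the image times the filter
image_dft = np.fft.fftshift(np.fft.fft2(image))
sub_band_dft = image_dft * filt

# spatial-domain response: real part of the inverse DFT
sub_band = np.fft.ifft2(np.fft.ifftshift(sub_band_dft)).real
```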

Figure 12. Steerable Pyramid Sub-band Decomposition, depth=4, orientations=4, twidth=1.0. (Filters cropped for visualization). Source: Author

Now we can display the final High and Low Pass Components, where we have again taken the Magnitude of the High Pass component to show the detail.

Figure 13. High Pass and Low Pass Filters and their outputs on the Test Image. Source: Author.

Reconstruction

Reconstructing the original image from the decomposition is fairly straightforward. All we need is the Full Pyramid Decomposition and the Frequency Domain Filters. We first compute the DFT of each sub-band of the Pyramid Decomposition and then multiply by its accompanying filter. We accumulate these products into a complex type matrix and then take the real portion of its Inverse DFT to get the reconstructed image.

recon_dft = np.zeros((h, w), dtype=np.complex128)
for pyr, filt in zip(pyramid, filters):
    # DFT of sub-band
    dft = np.fft.fftshift(np.fft.fft2(pyr))

    # accumulate reconstructed sub-bands
    recon_dft += dft*filt

# get reconstructed image
recon = np.fft.ifft2(np.fft.ifftshift(recon_dft)).real

Figure 14. Original Image and its reconstruction from a Steerable Pyramid Decomposition. Source: Author.

Just to be sure, let’s check the numerical differences:

np.sum(np.abs(recon - image)), np.mean(np.square(recon - image))

# (6.212063769519763e-09, 1.053688416546883e-27)

Compared with pyrtools, the decomposition is very similar and the reconstruction is still accurate.

Conclusion

Steerable Pyramids are banks of Steerable Bandpass Filters that perform a linear multi-scale decomposition of an image at different combinations of scales (frequencies) and orientations. The Decomposition has no aliasing (due to bandlimiting) and is overcomplete by a factor of 4k/3, where k is the number of orientations; overcomplete means that the transform contains more information than necessary to reconstruct the image. A drawback of the overcompleteness is that the Decomposition is not computationally efficient.
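As a back-of-envelope check of that 4k/3 factor, counting only the oriented bands (each scale holds k bands with a quarter of the samples of the scale above):

```python
import numpy as np

k, depth = 4, 10  # orientations and pyramid depth

# oriented coefficients per input pixel form a geometric series: k * (1 + 1/4 + 1/16 + ...)
redundancy = k * np.sum(4.0**-np.arange(depth))
print(round(redundancy, 4), round(4 * k / 3, 4))  # 5.3333 5.3333
```

The High Pass component and final Low Pass residual add a little more on top of this series, but the oriented bands dominate the redundancy.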

References

[1] Simoncelli, E. P., & Freeman, W. T. (1995). The steerable pyramid: A flexible architecture for multi-scale derivative computation. Proceedings of the International Conference on Image Processing. https://doi.org/10.1109/icip.1995.537667

[2] Freeman, W. T. (n.d.). Image Pyramids — Massachusetts Institute of Technology. MIT Computer Vision. http://6.869.csail.mit.edu/fa19/lectures/notes_lecture_7.pdf

[3] Jepson, A. (n.d.). Pyramids — Department of Computer Science, University of Toronto. http://www.cs.toronto.edu. https://www.cs.toronto.edu/~jepson/csc320/notes/pyramids.pdf

[4] Simoncelli, E. P., Freeman, W. T., Adelson, E. H., & Heeger, D. J. (1992). Shiftable multiscale transforms. IEEE Transactions on Information Theory, 38(2), 587–607. https://doi.org/10.1109/18.119725

[5] Portilla, J., & Simoncelli, E. P. (2000). A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision, 40(1). https://doi.org/10.1023/A:1026553619983

→ Python code was based on the Phase Based Motion Magnification Pyramid implementation: http://people.csail.mit.edu/nwadhwa/phase-video/

Thanks for Reading to the end! If you found this post useful please consider clapping 👏 It really helps others who might also find it useful
