Complex Steerable Pyramids
How to Decompose an Image into Local Phase
When I first learned about Phase-Based Motion Magnification, I discovered that it relies heavily on Complex Steerable Pyramids. At first glance I was intimidated, because I couldn’t even understand a regular Steerable Pyramid. After some time reviewing key papers and generous GitHub repos, I was finally able to grasp the concepts of the Steerable Pyramid and then the Complex Steerable Pyramid. I learned that Steerable Pyramids provide a rich, overcomplete image decomposition, but they are missing a crucial piece of information: local phase.
In this post we will learn why local phase information matters and how to extend the Steerable Pyramid to capture it. We will code it from scratch and use visuals to reinforce the concepts. This post assumes that you are familiar with Steerable Pyramids; please refer back to the previous post if you need a refresher.
Overview:
Background
The Steerable Pyramid allows us to represent an image at different scales and angles, which reveals important information that would otherwise be hidden. On a more technical note: the Steerable Pyramid Decomposition is a multi-scale decomposition composed of sub-band filter responses, which are characterized by scale (frequency) and angular orientation. It has no aliasing and is overcomplete, meaning that the decomposition contains more information than necessary to reconstruct the original image.
Motivation for a Complex Decomposition
A Complex Decomposition allows us to obtain local phase information at multiple scales and orientations. This section motivates the need for phase information in a pyramid decomposition.
Importance of Phase
If there’s one thing to know about phase, it’s that it tells us where things are located in an image. We can get phase information by taking the 2D Fourier Transform of an image, which gives a complex frequency representation. The magnitude of the Fourier Transform is usually what gets visualized, because it describes how energy is distributed across different frequencies, but in reality the phase is what contains the crucial information about the image. In fact, we can’t reconstruct an image from magnitude alone, but we can get close if we use only phase. Here’s an example:
import numpy as np
from skimage.data import camera

image = camera()

# get magnitude and phase of the DFT
dft = np.fft.fft2(image)
magnitude = np.abs(dft)
phase = np.angle(dft)

# reconstruct separately with magnitude only and with phase only
mag_recon = np.fft.ifft2(magnitude).real
ang_recon = np.fft.ifft2(np.exp(1j*phase)).real
The image above proves that magnitude does not provide enough information to reconstruct the image, but the phase provides enough information to recover the major edges akin to a High Pass Filter. This is due to something called Phase Congruency.
Phase Congruency
The principle of Phase Congruency reveals that when phases of decomposed signals overlap, the original signal exhibits strong features. Here’s an example with a 1D square wave to clarify things.
Figure 2 shows a square wave (solid line) with its Fourier-decomposed components (dashed lines). We can see in the middle of the figure that when all the component phases overlap, the original signal ramps up. In the flat regions, the component phases are far out of sync. On the left and right edges of Figure 2, the component phases align again as the original signal ramps down. Another thing to note is that the phase differences of the decomposed signals wrap around modulo 2π, i.e. they range over [0, 2π).
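This alignment can also be checked numerically. The sketch below (my own illustration, not taken from the references) builds the first few Fourier components of a square wave and measures each component’s local phase at the rising edge using the analytic signal; every component lands on the same phase there.

```python
import numpy as np

def analytic_signal(s):
    """FFT-based analytic signal (same idea as scipy.signal.hilbert)."""
    n = len(s)
    spectrum = np.fft.fft(s)
    gain = np.zeros(n)
    gain[0] = gain[n // 2] = 1      # keep DC and Nyquist as-is
    gain[1:n // 2] = 2              # double positive frequencies, drop negatives
    return np.fft.ifft(spectrum * gain)

n = 1024
x = np.linspace(-np.pi, np.pi, n, endpoint=False)  # x[n//2] is exactly 0

# odd-harmonic Fourier components of a square wave
components = [np.sin((2*k + 1) * x) / (2*k + 1) for k in range(5)]

# local phase of each component at the rising edge x = 0
edge = n // 2
phases = [np.angle(analytic_signal(c)[edge]) for c in components]
# every component sits at the same phase (-pi/2) at the edge: congruency
```

At the flat regions of the wave the same measurement gives phases scattered across the circle, which is exactly the contrast Phase Congruency exploits.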
Properties of Local Phase
To get an even better understanding of local phase (phase in a local region), some of its main properties are listed below for reference.
- Describes the locations of image features such as edges and corners
- Describes how fast the frequency is changing → The derivative of phase is referred to as the local or instantaneous frequency
- Equivariant with Spatial Frequency → Phase changes faster in higher frequency areas than in low frequency areas
- Equivariant with Spatial Position → Phase generally varies smoothly and monotonically with the position of the signal except for a modulo 2π wrap around
- Typically ranges over [-π, π] for a total span of 2π → though it can be shifted to any 2π interval
- Invariant to signal energy → does not matter if there are large or small signal variations, local phase will be the same
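The position-equivariance property above is easy to verify: shifting a signal by δ samples shifts the phase of a frequency-ω component by ω·δ, linearly in position. A small numpy check (my own sketch, not from the post’s notebook):

```python
import numpy as np

n = 1024
t = np.arange(n)
omega = 2 * np.pi * 8 / n          # 8 cycles across the window
shift = 5                          # shift in samples

s0 = np.cos(omega * t)
s1 = np.cos(omega * (t - shift))   # same signal, shifted right by 5 samples

# phase of the 8-cycle DFT bin for each signal
phase0 = np.angle(np.fft.fft(s0)[8])
phase1 = np.angle(np.fft.fft(s1)[8])

delta_phase = np.mod(phase0 - phase1, 2*np.pi)
# the phase changed by exactly omega * shift
```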
Extending to a Complex Decomposition
Conjugate Symmetry
To extend to a Complex Decomposition we take advantage of the Conjugate Symmetry of the frequency domain: orientations (θ, θ + π) yield redundant, complex-conjugate coefficients. So we only need orientations in the range [0, π] rather than [0, 2π] as with the regular Steerable Pyramid.
“The DFT component X[N−k] is the complex conjugate of component X[k] (and vice versa): their real parts are identical, and their imaginary parts are negatives of each other” [3]. We can reconstruct the entire DFT series from only half of it, knowing that we can just take the complex conjugate to get the other half. DFT components that are complex conjugates of each other have phase angles with opposite signs, so when we obtain the angular masks for the Complex Steerable Pyramid, we will only need to obtain components with positive phase angles.
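We can verify this symmetry directly on the 2D DFT of any real-valued image (a quick sketch; the index mirroring is done with `np.flip` plus `np.roll`):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))   # stands in for any real-valued image

dft = np.fft.fft2(image)

# conjugate symmetry: F[u, v] == conj(F[-u, -v]) for a real input.
# flipping both axes then rolling by one maps index (u, v) to (-u mod h, -v mod w)
mirrored = np.roll(np.flip(dft), shift=(1, 1), axis=(0, 1))
symmetric = np.allclose(dft, np.conj(mirrored))
```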
Getting the Angular Masks
The snippet below shows how we can modify the angular masks for the Complex Steerable Pyramid; the full notebook is on GitHub.
# `orientations` is the number of orientation bands, `b` is the band
# index, and `angle` is the polar-angle grid of the frequency plane
order = orientations - 1
const = np.power(2, 2*order) \
        * np.power(factorial(order), 2) \
        / (orientations * factorial(2*order))

# recenter the angle grid on this band's orientation, wrapped to [-pi, pi)
angle = np.mod(np.pi + angle - np.pi*b/orientations, 2*np.pi) - np.pi

# only keep 1 lobe due to conjugate symmetry; the factor of 2
# compensates for the discarded negative-frequency lobe
angle_mask = 2*np.sqrt(const) \
             * np.power(np.cos(angle), order) \
             * (np.abs(angle) < np.pi/2)
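The snippet above assumes `angle` is the polar-angle grid of the fftshift-ed frequency plane and `b` is the orientation index. A minimal, self-contained sketch of building that grid and evaluating one mask (the grid-construction names are mine, not from the post’s notebook):

```python
import numpy as np
from math import factorial

h = w = 64
orientations, b = 4, 0             # 4 orientation bands, first band

# polar angle of every point in the fftshift-ed frequency plane
fy = np.linspace(-np.pi, np.pi, h, endpoint=False)
fx = np.linspace(-np.pi, np.pi, w, endpoint=False)
xv, yv = np.meshgrid(fx, fy)
angle = np.arctan2(yv, xv)

order = orientations - 1
const = 2**(2*order) * factorial(order)**2 / (orientations * factorial(2*order))
angle = np.mod(np.pi + angle - np.pi*b/orientations, 2*np.pi) - np.pi
angle_mask = 2*np.sqrt(const) * np.cos(angle)**order * (np.abs(angle) < np.pi/2)
# the mask is a single cos^order lobe centered on orientation b;
# it is exactly zero over the opposite (conjugate) half of the plane
```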
Unlike the regular Steerable Pyramid, the Complex Steerable Pyramid does not contain negative frequency components. Beyond that, we can use the same approach as the regular Steerable Pyramid to generate the frequency-domain sub-band filters.
As with the regular Steerable Pyramid, its complex counterpart partitions the frequency space of an image, except it discards half of the mid-frequency sub-bands. Figure 6 shows an example, where the dark red on the edges and middle corresponds to the high- and low-pass components, while the black region is completely discarded due to conjugate symmetry.
And now here’s the Full Sub-Band Decomposition, which is composed of both magnitude and phase. Notice how we still obtain good representations of the image even though we only have positive frequencies.
Reconstruction
The reconstruction for the Complex Steerable Pyramid is almost the same as for the regular Steerable Pyramid. We still accumulate the products of the filters with the frequency-domain decomposition, but we need to account for the fact that the sub-band filters only contain positive frequencies. It turns out that we can just apply a scale factor of 2.0 to the complex sub-band filters. We don’t apply the factor of 2.0 to the High and Low Pass components because they are not orientation-selective like the sub-band filters.
recon_dft = np.zeros((h, w), dtype=np.complex128)

for i, (pyr, filt) in enumerate(zip(pyramid, filters)):
    # DFT of the sub-band
    dft = np.fft.fftshift(np.fft.fft2(pyr))

    # accumulate reconstructed sub-bands
    if (i != 0) and (i != (len(filters) - 1)):
        # orientation sub-bands: factor of 2 for the discarded conjugate half
        recon_dft += 2.0 * dft * filt
    else:
        # High and Low Pass Filters: no scaling
        recon_dft += dft * filt

# get the reconstructed image
recon = np.fft.ifft2(np.fft.ifftshift(recon_dft)).real
The regular Steerable Pyramid gives a nearly perfect reconstruction, but the Complex Steerable Pyramid is literally missing half of the mid-frequency information in frequency space. This really baffled me at first, but the remaining half of the spectrum actually contains all the information that we need.
Note: other implementations append the flipped complex conjugate in order to reconstruct the DFT in the same manner as the regular Steerable Pyramid. In this implementation, the filters are designed such that this is not necessary.
Taking the real part of the inverse DFT on the right of Figure 7 will actually provide a very close image reconstruction. Conjugate Symmetry says that the full image can be restored from only half of the DFT components, since the other half are just their complex conjugates.
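The same trick is easy to sanity-check in 1D (my own sketch): zero out the negative-frequency bins, double the positive ones, and the real part of the inverse DFT recovers the original signal exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)           # stands in for any real signal
n = len(x)

X = np.fft.fft(x)

half = np.zeros_like(X)
half[0] = X[0]                        # DC stays as-is
half[1:n//2] = 2 * X[1:n//2]          # double the positive frequencies
half[n//2] = X[n//2]                  # Nyquist bin stays as-is (even n)

recon = np.fft.ifft(half).real        # the negative half is implied
```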
recon = np.fft.ifft2(np.fft.ifftshift(recon_dft)).real
np.sum(np.abs(recon - image)), np.mean(np.square(recon - image))
# (7.874821239539642e-09, 1.5815154470611668e-27)
And once again, we get a near-zero reconstruction error. Let’s see what happens when we reconstruct without the factor of 2.0. The DFT looks about the same, so here are the reconstruction results:
np.sum(np.abs(recon - image)), np.mean(np.square(recon - image))
# (8328608.668750203, 1552.7875000674765)
The results don’t look as good without the scaling factor: the frequency magnitudes are way off and the DC center is no longer the true DC level.
Conclusion
In this post we have learned how to extend a Steerable Pyramid to a Complex Pyramid by taking advantage of DFT Conjugate Symmetry. The Complex Steerable Pyramid is able to capture local phase information at various scales and orientations. We know that local phase contains crucial image information which allows the Complex Steerable Pyramid to be utilized in some interesting applications such as texture generation and motion amplification.
Thanks for Reading! If you found this useful please consider clapping 👏
References
[1] Simoncelli, E. P., & Freeman, W. T. (1995). The steerable pyramid: A flexible architecture for multi-scale derivative computation. Proceedings, International Conference on Image Processing. https://doi.org/10.1109/icip.1995.537667
[2] Kovesi, P. (2003). Phase Congruency Detects Corners and Edges. https://www.semanticscholar.org/paper/Phase-Congruency-Detects-Corners-and-Edges-Kovesi/e3c6d3e9481cfc68dfa61e3534e5f7f3b7cfc9c4
[3] Granlund, G. H., & Knutsson, H. (2011). Signal Processing for Computer Vision. Springer.
[4] McFee, B. (n.d.). Digital Signals Theory, 6.3: Conjugate Symmetry. https://brianmcfee.net/dstbook-site/content/ch06-dft-properties/Conjugate-Symmetry.html
[5] Portilla, J., & Simoncelli, E. P. (2000). A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. International Journal of Computer Vision, 40. https://doi.org/10.1023/A:1026553619983
[6] Wadhwa, N., Rubinstein, M., Durand, F., & Freeman, W. T. (2013). Phase-Based Video Motion Processing. https://people.csail.mit.edu/nwadhwa/phase-video/
→ Python code was based on the Phase Based Motion Magnification Pyramid implementation: http://people.csail.mit.edu/nwadhwa/phase-video/