Phase Based Motion Magnification: Part 1

How imperceptible motions are detected and amplified

Isaac Berrios
12 min read · Jan 22, 2024

When I first came across motion amplification I was astonished: how can something moving less than a single pixel have its motion amplified? It's true, we can actually detect and amplify imperceptible motions in static videos, and we can do it without amplifying the noise! This concept has many applications and is already in use across multiple industries. For example, structures and equipment are not always easily fitted with sensors, so a reliable visual technique can be an effective way to quickly identify and diagnose issues, providing low cost structural analysis.

A Peek at the Results

The GIFs below show a few examples of Phase Based Motion Magnification, the original video is on the left and the Motion Magnified one is on the right. In each GIF, the processing was applied only to the luma channel in the YIQ color space.

Figure 1. Motion Magnified crane video with a magnification factor of 25. Source: Author.
Figure 2. Motion Magnified guitar string with magnification factor of 25. Source: Author.

This is not the latest AI, the original paper is from 2013. In this post we will unpack the main concepts from the paper and learn the details behind Phase Based Motion Magnification. In part 2, we will implement it from scratch in Python, and learn how to make videos and GIFs like the ones shown above.

Photo by Brian McMahon on Unsplash

Overview

Background

Most of the background involves understanding Complex Steerable Pyramids, and optionally Sub-Octave Complex Steerable Pyramids. For brevity, we will only summarize them here, please refer to the linked posts for more details.

Steerable Pyramid

The Steerable Pyramid is a linear multi-scale and multi-orientation image decomposition that enables us to observe the image at different frequency bands and orientations. Each filter occupies a continuous region of the frequency domain and has an impulse response that is localized in the spatial domain. In practice we generate a bank of frequency domain filters at different bands and orientations along with non-oriented High and Low pass components, and then the decomposition is performed in frequency space. The transformation does not introduce aliasing and is overcomplete, meaning that it contains more than enough information to reconstruct the original image.

The Complex Steerable Pyramid adds the notion of Local Phase at multiple scales and orientations. In general, phase tells us where things are located in an image, and local phase is just the notion of phase at a certain location, e.g. the phase in the bottom left of an image. If we see phase changing across a video sequence, this means that things in the image are moving. Local Phase across a video sequence describes how the image is changing at a given location, and this is precisely what we will use to amplify motion. The key is knowing how to use it, which we will cover later in this post.

Complex Steerable Pyramids provide Local Phase information which describes how the video frames are changing at a given location

Sub-Octave Complex Steerable Pyramid

By default Steerable Pyramids have frequency filters that are a single octave apart. Since Frequency domain support is inversely proportional to Spatial domain support, we can increase the Spatial domain support of our filters by making them smaller in the Frequency domain. That’s really all there is to Sub-Octave filters. For example, if we have half-octave filters, then it would take two of them to fit in the place of a single octave filter. We will see later that Sub-Octave filters are crucial for large motion amplifications.

We are going to be talking a lot about Complex Steerable Pyramids in this post; from now on we will refer to them simply as Pyramids.

Basic Approach

Before we go all out and try this with a video, let's first think about things in 1D. Throughout this post, concepts will be explained in 1D with notes that relate them to a practical approach in 2D.

Problem Statement in 1D

Consider a spatial signal f(x) that can be shifted by some time dependent displacement function δ(t), the shifted signal is: f(x + δ(t)). We would like to magnify the shift by a factor of α such that the magnified shifted signal will be: f(x + (1 + α)δ(t)).
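For a concrete instance of this problem statement, take f(x) = cos(ωx). The shifted signal is cos(ω(x + δ(t))) = cos(ωx + ωδ(t)), and the magnified target is cos(ωx + (1 + α)ωδ(t)): the motion enters purely through the phase term ωδ(t).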

Approach in 1D

We can use the Fourier Series Decomposition to express the signal in terms of Complex Sinusoids. This decomposition into Complex Sinusoids is a 1D analog of the Pyramid Decomposition that we use for videos.

f(x + δ(t)) = Σω Aω·e^{iω(x + δ(t))}

Fourier Decomposition. Modified from Source.

The phase of each sinusoid, ω(x + δ(t)) = ωx + ωδ(t), contains the motion information, where ω corresponds to a single frequency band. The first thing we need to do is isolate the phase information that pertains to the motion. The DC component ωx represents the underlying spatial signal and corresponds to the static background. If we remove it, we are left with the ωδ(t) component, which contains the motion information that we wish to magnify. We denote this as the bandpassed phase B.

We can then scale B by α to obtain magnified motion in the frequency domain, i.e. a motion magnified complex sinusoid.
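Writing these steps out for a single sub-band with amplitude Aω:

Sub-band: Aω·e^{iω(x + δ(t))}, with phase ω(x + δ(t)) = ωx + ωδ(t)
Bandpassed phase: B = ωδ(t) (the ωx component removed)
Magnified sub-band: Aω·e^{iω(x + δ(t))}·e^{iαB} = Aω·e^{iω(x + (1 + α)δ(t))}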

We do this for every sub-band ω in order to obtain the full motion magnified signal in the spatial domain: f(x + (1 + α)δ(t)).
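Here is a minimal numpy sketch of this 1D pipeline, using the global FFT in place of the Pyramid decomposition (the paper's method uses local phase, but the mechanics of scaling phase differences are the same). All names are illustrative, and we assume the phase differences do not wrap at the frequencies where the signal has energy.

import numpy as np

# a localized bump that we will "move"
N = 512
x = np.arange(N)
f = np.exp(-(x - N//2)**2 / 50.0)

delta, alpha = 1.5, 4.0                   # true shift and magnification factor

# build f(x + delta) with the Fourier shift theorem
freqs = np.fft.fftfreq(N)                 # cycles per sample
F = np.fft.fft(f)
F_shifted = F * np.exp(2j*np.pi*freqs*delta)

# per-frequency phase difference ~ the bandpassed phase B = omega*delta(t)
# (the conjugate product avoids dividing by near-zero coefficients)
B = np.angle(F_shifted * np.conj(F))

# scale the phase difference by alpha and reapply it to every sub-band
F_mag = F_shifted * np.exp(1j*alpha*B)
f_mag = np.real(np.fft.ifft(F_mag))       # ~ f(x + (1 + alpha)*delta)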

Notes on Practical Implementation

In practice we can use the local phase of a reference frame to estimate the value of the DC component ωx so that we can extract the motion information. We subtract the reference phase from the phase at the current frame to obtain the motion signal ωδ(t). Since we assume that the video is mostly static, this approach tends to work well.

Aside from removing the DC component, we temporally bandpass filter the phase deltas to isolate a certain temporal frequency band of interest. This has a powerful influence on how motion magnification can be used since we are able to visualize any frequency band that we choose. For example, we can either visualize known frequencies of interest or we can scan across many bands to discover hidden motions.
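In code, both steps might look like the sketch below. Here `coeffs` is a hypothetical complex array of shape (num_frames, H, W) holding one Pyramid filter's response for every frame; the Butterworth filter is one common choice of temporal bandpass, not something the paper mandates.

import numpy as np
from scipy.signal import butter, filtfilt

def motion_phase(coeffs, fs, f_lo, f_hi):
    # phase relative to the reference (first) frame; the angle of the
    # conjugate product removes the static component and handles
    # 2*pi wrap-around in one step
    deltas = np.angle(coeffs * np.conj(coeffs[0]))   # ~ omega*delta(x, t)

    # temporal bandpass across frames to keep only the band of interest
    b, a = butter(2, [f_lo, f_hi], btype="bandpass", fs=fs)
    return filtfilt(b, a, deltas, axis=0)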

In video frames, motion is localized meaning that objects at one location of the frame are moving in a different manner relative to other objects in other locations. For this reason δ(t) is actually δ(x, t), where the spatial variable x allows us to differentiate the motion at time t based on spatial location.

Limitations on Magnification

It turns out that there is a limit to how much we can magnify the motion. Aside from looking ridiculous, extremely large motion magnifications will not represent the true signal. In other words, the wrong features will be magnified! This limit comes from two aspects of the Pyramid filters:

  • Spatial Support of Filters → Spatial Coverage
  • Spatial Frequency Band of Filters → Center Frequency

Let’s look at two 1D examples to find out why these limit the magnification.

Impact of Spatial Support

The spatial support of the Pyramid filters has some roll-off, and we can approximate it in 1D with a Gaussian Window. In this example, consider a function f comprised of a Complex Sinusoid and a Gaussian Window:

f(x, ω, σ) = e^{iωx} · e^{−x²/(2σ²)}

The arguments of f are: frequency ω, position x, and standard deviation σ. We can shift this function by a factor α, and as we have previously seen, this phase shift corresponds to motion magnification. Due to the windowing, we can only shift it within the bounds of the Gaussian window. Let's implement this in Python to get a better understanding; the notebook for this is on GitHub.

import numpy as np

# complex sinusoid and Gaussian window
S = lambda x, omega : np.exp(1j*omega*x)
G = lambda x, sigma : np.exp((-x**2)/(2*sigma**2))

x = np.arange(-15, 15, 0.01)
octaves = 1 # number of octaves for Gaussian window
sigma = octaves*np.pi # std dev of Gaussian window
omega = np.pi/4 # frequency
alpha = np.pi/2 # initial phase shift factor

signal = S(x, omega) * G(x, sigma)
shifted = S(x, omega) * G(x, sigma) * np.exp(1j*omega*alpha)

The GIF below shows the phase shifting of the real and imaginary portions of the Complex Sinusoid.

Figure 3. Phase Shift example with Complex Sinusoid. Source: Author.

Once we shift the phase by a certain amount, the amplitude will be highly attenuated due to the Gaussian windowing. If we can increase the size of the Gaussian window (increase its spatial support), then we can extend this limit and implement larger shifts without as much attenuation.

Back in 2D, this means that we can only amplify motion by a certain number of pixels before we degrade the signal beyond recognition due to attenuation. This limit is based on the Spatial Support of our Pyramid Filters. If we were to increase the Spatial Support, then we could perform more intense magnifications and still accurately approximate the true shifted signal.

Impact of Frequency Band

When we apply the motion magnification, we apply it to a single Pyramid Filter at a time. The amount of useful motion magnification that we can apply is directly related to the frequency band of the filter. Let's illustrate this in 1D by comparing the effect of phase shifting at two different frequencies. (From now on we will only look at the real portion of the sinusoid).

omega1 = 0.2 # lower frequency
omega2 = 1.0 # higher frequency

# windowed sinusoids at each frequency (S, G, x, sigma defined above)
signal_1 = S(x, omega1) * G(x, sigma)
signal_2 = S(x, omega2) * G(x, sigma)

alpha_vals = np.arange(0, 2*np.pi, 0.025)

for i, alpha_val in enumerate(alpha_vals):
    # apply the same phase shift factor to both frequencies (plotting omitted)
    f_shifted_1 = signal_1 * np.exp(1j*omega1*alpha_val)
    f_shifted_2 = signal_2 * np.exp(1j*omega2*alpha_val)

Figure 4. Impact of phase shift for different frequency bands within a full octave Gaussian Window. Only the real portion of the Complex Sinusoid is shown. Source: Author.

Both sinusoids are shifted by the same amount, but notice the larger amount of attenuation on the right side, where the frequency is higher. Even worse, if we keep going we will get a roll-over (due to periodicity), so the dominant feature won't even be the one we intended to magnify.

Large magnifications have more attenuation at higher spatial frequencies

Deriving a Bound for Magnification

You might be asking yourself: how much can we magnify the motion before we degrade the signal? I'm glad you asked, because we can actually derive a theoretical bound for magnification based on the Gaussian Window approximation. We simply take the motion bound to be one standard deviation of the Gaussian Window, so that we maintain at least ~61% of the original signal amplitude.

Well, that doesn't seem very insightful; all we did was make an arbitrary cut-off based on the standard deviation. The real utility of the bound is to see how certain factors such as frequency and octave bandwidth affect the relationship between motion magnification and signal degradation.

Incorporating Frequency into the Motion Bound

In the full octave bandwidth filter (which the Complex Steerable Pyramid provides), we have about one period of a sinusoid under the window. If we assume the full width of the window to be approximately 4σ, then we get:

4σ ≈ λ = 2π/ω → σ ≈ π/(2ω)

Where ω corresponds to the frequency band of the filter. This provides a bound in terms of frequency ω or wavelength λ:

αδ(t) < σ ≈ π/(2ω) = λ/4

This bound summarizes the relationship between signal frequency and useful magnification i.e. magnification without extreme attenuation. It tells us that magnified signals at lower frequencies (larger wavelengths) experience less attenuation than higher frequency signals. Now let’s see how the Octave Bandwidth impacts the magnification bound.
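To make the bound tangible, under this approximation a filter centered at a wavelength of λ = 16 pixels permits a magnified motion αδ(t) of up to about 4 pixels: a 0.1 pixel vibration magnified by α = 25 gives αδ(t) = 2.5 pixels, which stays within the bound, while the same magnification under a λ = 4 pixel filter (bound of about 1 pixel) would be heavily attenuated.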

Going Sub-Octave to Increase Magnification

As stated earlier, increasing the spatial support of the filters allows us to increase the amount of magnification with less signal degradation. Going Sub-Octave means that we decrease the frequency support of the filters (make them smaller in frequency space), which in turn increases the spatial support of each filter. For example, going half-octave means that two half-octave filters take the place of a single full octave filter in frequency space. In 1D, a Half Octave Gaussian Window can support two periods of a sinusoid (4σ ≈ 2λ), and we can modify our bound accordingly:

αδ(t) < λ/2

Going further, a Quarter Octave Bandwidth Gaussian Window can support four periods of a sinusoid, giving αδ(t) < λ. We can generalize the bound in terms of Octave Bandwidth: a window that supports n periods (a 1/n octave filter) has a bound of αδ(t) < nλ/4, so the bound on magnification grows as the Octave Bandwidth decreases.

Here’s a GIF that shows the impact of going Sub-Octave.

Figure 5. Sinusoids Shifted under Gaussian Windows of different Octave Bandwidths. Source: Author.

We can also visualize shifts with localized functions windowed by a Gaussian.

Figure 6. Localized Functions Shifted under Gaussian Windows of different Octave Bandwidths. Source: Author.

The GIF above shows that the shifted signal in the Full Octave Gaussian Window has long been attenuated to 0 before the shifted signal under the Quarter-Octave window even reaches an attenuation of 0.60.

Handling Noise

This phase based approach to motion magnification does not actually amplify the noise, rather it translates it along with the shifted motion. Consider an image I with added noise: I + σₙn, where the noise power σₙ is much less than the image signal I. Its response at a Pyramid filter of frequency ω is denoted as:

Sω(x, t) = Aω·e^{iω(x + δ(t))} + σₙ·N(x, t)

Where N is the response of the noise to the filter. Since we assume that the noise is much smaller than the signal, we also assume that the phase corresponding to motion is approximately equal to ωδ(t), just as we saw earlier. Multiplying by e^{iαωδ(t)}, the magnified motion response is:

Aω·e^{iω(x + (1 + α)δ(t))} + σₙ·e^{iαωδ(t)}·N(x, t)

The noise term in the motion magnified filter response now has a phase shift αωδ(t), which corresponds to a translation of the noise and not an amplification; since |e^{iαωδ(t)}| = 1, the noise magnitude is unchanged. This ties back to the basics of this phase based method: we aren't directly amplifying the signal, we are amplifying the motion by phase shifting the features of interest. It just so happens that we also amplify the motion of the noise (move it from one spot to another), but we don't actually increase the noise power.

Filtering the Phase Deltas

Even though the assumptions imply that the filtered phase will be equal to ωδ(t), we still need to recognize that noise can corrupt the phase signal and cause incorrect motions to be amplified. To account for this, we can filter the acquired phase differences. We do this by applying a spatial Gaussian Blur to the phase differences at each frame, with an amplitude weighted filter:

φ′ = (K ∗ (φ·A)) / (K ∗ A)

Where φ is the phase signal at Pyramid Filter i and Video Frame k, A is the amplitude of the current filter response, K is a Gaussian Kernel with standard deviation ρ, and ∗ denotes spatial convolution.
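A minimal sketch of this amplitude-weighted blur, assuming `phase_delta` and `amplitude` are per-frame 2D arrays from a single Pyramid filter (names are illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def denoise_phase(phase_delta, amplitude, rho):
    # numerator and denominator of the amplitude-weighted Gaussian blur
    num = gaussian_filter(phase_delta * amplitude, sigma=rho)
    den = gaussian_filter(amplitude, sigma=rho)
    return num / (den + 1e-12)   # small epsilon guards divide-by-zero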

Summary

An overview of Phase Based Motion Magnification is shown below:

Figure 7. Phase Based Motion Magnification Overview. Source.

It's important to note that we perform the processing on a single Pyramid filter at a time, and we typically perform the processing across all frames of the video. However, the processing is not performed on the High and Low Pass Pyramid components; they are left unchanged and reincorporated at the final step. A sketch of the full loop follows the list below.

  • 🇦 → Decompose the image with a Complex Steerable Pyramid.
  • 🇧 → Remove the DC phase component from each frame to get the motion signal ωδ(t) in terms of phase. Then temporally filter the motion/phase signals across the video frames to the bandwidth of interest.
  • 🇨 → Optionally perform Amplitude weighted filtering to denoise the phase signals.
  • 🇩 → Apply the motion magnification factor αωδ(t) at each filter.
  • 🇪 → Reincorporate the High and Low Pass components, then collapse the Pyramid to get the motion magnified video frames.
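Here is a hypothetical end-to-end sketch of steps 🇦 through 🇪. The helpers build_pyramid, collapse_pyramid, and oriented_band_indices are placeholders (part 2 builds the real thing), and motion_phase / denoise_phase are the sketches from earlier sections:

import numpy as np

def magnify_video(frames, alpha, fs, f_lo, f_hi, rho):
    pyramids = [build_pyramid(f) for f in frames]          # A: decompose each frame
    for i in oriented_band_indices(pyramids[0]):           # skip hi/lo-pass residuals
        coeffs = np.stack([p[i] for p in pyramids])        # (frames, H, W) complex
        deltas = motion_phase(coeffs, fs, f_lo, f_hi)      # B: DC removal + temporal bandpass
        for k, p in enumerate(pyramids):
            d = denoise_phase(deltas[k], np.abs(coeffs[k]), rho)  # C: optional denoise
            p[i] = p[i] * np.exp(1j * alpha * d)           # D: magnify via phase shift
    return [collapse_pyramid(p) for p in pyramids]         # E: reconstruct each frame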

Conclusion

We have covered the basics of Phase Based Motion Magnification with 1D examples. We described the basic approach in 2D along with its limitations, which depend on spatial support and frequency band. We also showed that the noise is not actually amplified, only translated, which is a convenient feature of this approach. A high level overview of the algorithm is shown above, and part 2 shows how to actually implement it.

Thanks for Reading! If you found this useful please consider clapping 👏

References

[1] Simoncelli, E. P., & Freeman, W. T. (1995). The Steerable Pyramid: A Flexible Architecture for Multi-Scale Derivative Computation. Proceedings, International Conference on Image Processing. https://doi.org/10.1109/icip.1995.537667

[2] Portilla, J., & Simoncelli, E. P. (2000). A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients. International Journal of Computer Vision, 40. https://doi.org/10.1023/A:1026553619983

[3] Granlund, G. H., & Knutsson, H. (2011). Signal Processing for Computer Vision. Springer.

[4] Wadhwa, N., Rubinstein, M., Durand, F., & Freeman, W. T. (2013). Phase-Based Video Motion Processing. ACM Transactions on Graphics, 32(4). https://people.csail.mit.edu/nwadhwa/phase-video/

[5] https://rxian.github.io/phase-video/
