Pixel-to-Pixel Image Alignment and 3D Reconstruction

Barney Gordon
Jul 23, 2020 · 7 min read


Phase correlation is an alternative to GCP-based methods for aligning image data. It also provides sub-pixel accuracy and, by measuring sub-pixel disparity, the ability to generate elevation models.

Aligning satellite imagery can often be a painful process. Simple image processing operations, such as comparing images from different dates or overlaying imagery with vector data, require precise spatial alignment. Even Sentinel-2 can have alignment issues between images taken on different dates.

Misalignment is usually the product of poor georeferencing. When an image is taken from a satellite (or airplane or drone), the position of its pixels on the ground is calculated from onboard GNSS measurements combined with the attitude of the sensor relative to the ground. Since onboard measurements are often imperfect, a slight unrecorded change in the angle the sensor is pointing can result in an image whose location is tens of meters off from where we'd expect.

Most of the time we try to solve this alignment problem by estimating the misalignment between a reference and an input image, and then warping the input onto the reference using a transformation. This is usually done with Ground Control Points (GCPs) in a feature-based approach: pick out unique GCPs in both images, assign each point a descriptor, and then use the descriptors to group the points into cross-image pairs. Once pairs have been identified, the misalignment is calculated as a homography (here is a more detailed explanation of how GCPs can be chosen automatically). The homography describes the misalignment between the input and reference images as a single global transformation, often simplified to a linear or affine shift.
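As a rough illustration, here is a minimal sketch of that feature-based pipeline using OpenCV's ORB detector. The article doesn't name a library, so this is one common choice rather than the method used here; the file names are hypothetical, and a production pipeline would work with georeferenced rasters rather than raw pixel arrays.

```python
import cv2
import numpy as np

# Hypothetical file names, for illustration only.
ref = cv2.imread("reference.tif", cv2.IMREAD_GRAYSCALE)
inp = cv2.imread("input.tif", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute descriptors (the GCP candidates).
orb = cv2.ORB_create(5000)
kp_ref, des_ref = orb.detectAndCompute(ref, None)
kp_inp, des_inp = orb.detectAndCompute(inp, None)

# Match descriptors to form cross-image pairs.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_inp, des_ref), key=lambda m: m.distance)

# Estimate a single global homography from the pairs (RANSAC rejects
# bad matches), then warp the input onto the reference.
src = np.float32([kp_inp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
aligned = cv2.warpPerspective(inp, H, (ref.shape[1], ref.shape[0]))
```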

There is an inherent issue with aligning images using a global transformation: images often aren't misaligned consistently across all pixels, and in reality local deformation varies throughout the image pair.

Phase Correlation (PC)

This brings us onto phase correlation (PC), which makes use of the Fourier shift property: any translation between two images appears in the frequency domain as a difference in phase. PC differs entirely from feature-based methods, since it relies on areas of pixels rather than GCPs.
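Concretely, this is the Fourier shift theorem, a standard result written here in continuous form. If the input is a translated copy of the reference,

```latex
g(x, y) = f(x - x_0,\; y - y_0)
\;\Rightarrow\;
G(u, v) = F(u, v)\, e^{-2\pi i (u x_0 + v y_0)}
```

then the normalised cross-power spectrum isolates that phase term:

```latex
R(u, v) = \frac{F(u, v)\, G^{*}(u, v)}
               {\left| F(u, v)\, G^{*}(u, v) \right|}
        = e^{2\pi i (u x_0 + v y_0)}
```

The shift survives only in the phase of R: its phase plane has a gradient proportional to (x₀, y₀), which is exactly what the unwrapping step described below estimates.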

As a simple step-by-step demonstration of PC I’ve taken a Sentinel-2 image, rotated it by 40 degrees, shifted it 20 pixels to the right and finally 50 pixels down. We should be able to use PC to accurately estimate these values of shift and rotation.

Left: Reference Sentinel-2 image; Right: The same image but rotated 40 degrees and shifted by x: 20 y: 50

The steps involved in solving the image shifts are as follows (a minimal code sketch follows the list):

  1. Input image pair: these are the reference and input images, where the input is misaligned from the reference. A 2D Hamming window is often applied to both to remove edge effects and reduce noise, which generally improves the PC result.
  2. Fourier transform: both images are transformed to frequency space using a Discrete Fourier Transform.
  3. Phase correlation: taking the normalised cross-power spectrum of the two frequency images produces the PC image, with values in the range (−π, +π).
  4. SVD: this step is not strictly necessary, but serves to improve the precision of the shift estimate. Singular Value Decomposition (SVD) is used to extract the rank-one left and right vectors from the PC image [1].
  5. Unwrapping: since the SVD left and right vectors are 2π-wrapped, they need to be unwrapped before we can estimate their gradient.
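To make the list concrete, here is a minimal NumPy sketch of steps 1 to 3, with the sub-pixel refinement (steps 4 and 5) replaced by simple correlation-peak finding, so it only recovers integer shifts. The function name is mine; for a tested sub-pixel implementation, scikit-image's phase_cross_correlation with an upsample_factor is a good starting point.

```python
import numpy as np

def phase_correlation_shift(ref, inp):
    """Estimate the integer (y, x) shift of `inp` relative to `ref`."""
    # Step 1: 2D Hamming window to suppress edge effects.
    win = np.outer(np.hamming(ref.shape[0]), np.hamming(ref.shape[1]))
    # Step 2: DFT of both windowed images.
    F_ref = np.fft.fft2(ref * win)
    F_inp = np.fft.fft2(inp * win)
    # Step 3: normalised cross-power spectrum; its inverse transform
    # peaks at the displacement of the input from the reference.
    cross = F_inp * np.conj(F_ref)
    cross /= np.abs(cross) + 1e-12  # normalise, avoiding divide-by-zero
    corr = np.abs(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrapped-around peak positions to signed shifts.
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return dy, dx
```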

The steps above estimate linear shifts; the process for calculating rotation is very similar, but first resamples the images' Fourier magnitude spectra into polar coordinates, so that the rotation becomes a translation along the angular axis which PC can then recover.
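A sketch of that rotation step, assuming scikit-image's warp_polar and phase_cross_correlation (both real functions; the arrangement here is my own, not necessarily the article's exact pipeline):

```python
import numpy as np
from skimage.transform import warp_polar
from skimage.registration import phase_cross_correlation

def estimate_rotation(ref, inp):
    # Rotating an image rotates its Fourier magnitude spectrum by the
    # same angle (magnitudes are translation-invariant), so resample
    # both spectra into polar coordinates and recover the angle as a
    # shift along the angular axis.
    mag_ref = np.abs(np.fft.fftshift(np.fft.fft2(ref)))
    mag_inp = np.abs(np.fft.fftshift(np.fft.fft2(inp)))
    radius = min(ref.shape) // 2
    pol_ref = warp_polar(mag_ref, radius=radius)  # rows = angle (degrees)
    pol_inp = warp_polar(mag_inp, radius=radius)
    shift, _, _ = phase_cross_correlation(pol_ref, pol_inp,
                                          upsample_factor=20)
    return shift[0]  # rotation in degrees, sub-degree via upsampling
```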

Diagram of the steps involved in phase correlation (moving from left to right)

The values for rotation and for the x and y shifts are finally calculated from the gradients of the unwrapped vectors. In this case the output is: 39.9933 degrees rotation, 19.9995 pixels right and 49.9987 pixels down.

Animation of the rotated and shifted Sentinel-2 image being restored using PC

Of course, in this example the input image is the same as the reference image but with an artificial shift and rotation. Because of this, the PC image is relatively noise-free and the estimate is correspondingly accurate. In real life the input and reference images may be taken at different times, from different angles or with different sensors, which often introduces noise into the PC image, making unwrapping difficult and the misalignment estimate poorer.

Pixel-to-Pixel Alignment

At this point you may be thinking that we've still not solved the issue left over from using homographies and GCPs: PC, as described so far, only captures the misalignment between two images as a global shift and rotation. In reality many images contain local deformation, often introduced when the images are taken with sensors that have different lens geometries, or when the sensor orientation shifts during capture.

To solve this issue and align images with local deformation, we simply need to estimate shifts in x and y for every pixel in the image pair. This is done by applying PC at every pixel in both images using a small scanning window (e.g. 64×64 pixels) centred on that pixel [2].

Applying PC in this way produces two optical flow images: one describing the translation of every pixel along the horizontal axis, the other along the vertical. To align the input image to the reference, we simply transform the input using the optical flow images as a coordinate map.
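Here is a naive sketch of that scanning-window scheme. The function names are mine and I lean on scikit-image's phase_cross_correlation for each window; a real implementation such as Liu & Yan's [2] adds median shift propagation and is far faster than this brute-force per-pixel loop.

```python
import numpy as np
from scipy.ndimage import map_coordinates
from skimage.registration import phase_cross_correlation

def pixelwise_flow(ref, inp, win=64):
    # Estimate a sub-pixel (y, x) shift for every pixel by running PC
    # on a win x win window centred on that pixel. Border pixels where
    # the window does not fit are left at zero shift here; in practice
    # they would be filled by propagation from their neighbours.
    half = win // 2
    flow_y = np.zeros(ref.shape)
    flow_x = np.zeros(ref.shape)
    for r in range(half, ref.shape[0] - half):
        for c in range(half, ref.shape[1] - half):
            a = ref[r - half:r + half, c - half:c + half]
            b = inp[r - half:r + half, c - half:c + half]
            shift, _, _ = phase_cross_correlation(a, b, upsample_factor=10)
            flow_y[r, c], flow_x[r, c] = shift
    return flow_y, flow_x

def warp_with_flow(inp, flow_y, flow_x):
    # Use the flow images as a per-pixel coordinate map: each output
    # pixel samples the input (bilinearly) at its displaced position.
    rows, cols = np.mgrid[0:inp.shape[0], 0:inp.shape[1]]
    return map_coordinates(inp, [rows - flow_y, cols - flow_x], order=1)
```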

In theory the transformed image should now be aligned to the reference image at a pixel-to-pixel level. In fact, since the x and y shifts are estimated directly in the frequency domain, the transformation will have sub-pixel precision.

3D Reconstruction

Image alignment is not the only use case for pixel-to-pixel PC. In the case of a stereo pair (i.e. two images taken of the same location but from different viewing angles in order to produce a parallax effect), the pixel-to-pixel displacement between the two images should correspond directly with elevation change [3].

Below is a pair of SPOT 6 stereo images in which the pixel-level displacement is largely a function of elevation change; we can observe the parallax effect by switching between the two images. The bottoms of the river beds, being farther from the sensor, show a smaller displacement than higher-elevation areas that sit closer to it.

SPOT 6 pansharpened stereo pair with 1.5 m resolution (images courtesy of Airbus DS)

After calculating the displacement between the two images at a pixel level, we are left with the image below, which resembles a Digital Elevation Model (DEM), although in this case the values are in units of pixel offset rather than meters.
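The conversion from pixel offsets to meters isn't attempted here, but for near-nadir stereo the standard photogrammetric approximation (with symbols of my own choosing) is roughly:

```latex
\Delta h \;\approx\; \frac{H}{B}\, d \cdot \mathrm{GSD}
```

where d is the measured disparity in pixels, GSD is the ground sample distance (1.5 m for this pansharpened pair) and B/H is the base-to-height ratio of the stereo pair.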

Digital Elevation Model produced using pixel-to-pixel PC (based on SPOT 6 stereo pair provided by Airbus DS)

After hillshading and colouring the DEM, the result is a fairly realistic visualisation of terrain!

Hillshaded and colourised DEM (blue = low, red = high)

In order to validate the DEM produced by the pixel-to-pixel PC method, I've taken a horizontal cross-section roughly halfway down the vertical extent of the DEM and compared it with an equivalent slice of SRTM data. Since the SPOT 6 DEM is in units of pixels, it was rescaled to match SRTM; this was done just to give a rough idea of the relative alignment.

There is surprisingly good agreement between the two datasets! The SPOT 6 DEM has higher precision, but that is only to be expected given it is derived from 1.5 m resolution SPOT 6 data compared with the 30 m SRTM. Interestingly, there seems to be a growing offset moving from right to left, which I suspect is an artefact of the stereo pair having a spatial misalignment that is greater on one side of the image pair than the other.

Horizontal cross-section of the PC generated SPOT 6 DEM and corresponding SRTM DEM

Conclusion

After considering the use of PC for producing elevation models, it becomes clear that the pixel-to-pixel PC method isn't simply an image alignment technique; rather, it is a way of measuring disparity between images. Thought of in these terms, it opens up a number of other applications, for example change detection, or moving-target detection where certain pixels have migrated or new ones have appeared.

References

[1] W.S. Hoge (2003) A Subspace Identification Extension to the Phase Correlation Method

[2] J.G. Liu & H. Yan (2008) Phase correlation pixel-to-pixel image co-registration based on optical flow and median shift propagation

[3] G.L.K. Morgan, J.G. Liu & H. Yan (2010) Precise Subpixel Disparity Measurement From Very Narrow Baseline Stereo

Thank you to Airbus DS for permitting the use of their SPOT 6 data
