Deconstructing the Homography Matrix

Insight in Plain Sight
Mar 29, 2022

The homography is a core concept in computer vision and multiple view geometry. It describes the mapping between two images that observe the same plane.

A homography relates two views of the same plane by a mapping that is linear in homogeneous coordinates

Using homogeneous coordinates one can describe this mapping by a 3x3 matrix:

Homography matrix for projective 2-space

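At first, a matrix with 9 parameters can be intimidating to interpret. Because homogeneous coordinates are only defined up to scale, multiplying the matrix by any non-zero factor gives the same mapping, which leaves 8 degrees of freedom (DoF). Fortunately, we can decompose the matrix into separate parts, each with fewer DoF and easier to understand.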
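To make the "9 parameters, 8 DoF" statement concrete, here is a minimal NumPy sketch. The helper name `apply_homography` and the matrix entries are made up for illustration and are not taken from the linked repository. It maps a few points through a 3x3 matrix and checks that a scaled copy of the matrix produces the same result.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a 3x3 homography (hypothetical helper)."""
    pts = np.asarray(pts, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # lift to homogeneous coordinates
    mapped = pts_h @ H.T                               # x' = H x for every point
    return mapped[:, :2] / mapped[:, 2:3]              # divide by the last coordinate

# Arbitrary example values, chosen only for illustration
H = np.array([[1.0,   0.2,   5.0],
              [0.1,   0.9,  -3.0],
              [0.001, 0.002, 1.0]])
pts = np.array([[0.0, 0.0], [100.0, 50.0], [20.0, 80.0]])

a = apply_homography(H, pts)
b = apply_homography(7.0 * H, pts)   # the same matrix, scaled by an arbitrary factor

# The overall scale cancels in the division, so both matrices describe one mapping
same_mapping = np.allclose(a, b)
print(same_mapping)
```

The division by the last coordinate is what removes the overall scale, and that is why one of the nine entries carries no information.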

The Transformation Hierarchy

To do so, we first need to understand the hierarchy of transformations. Each one is more general than the previous:

  • Euclidean
  • Similarity
  • Affine
  • Projective

We will cover each transform and step by step uncover the homography matrix.

Euclidean Transform

Euclidean geometry feels very natural to us, because it describes the movement of rigid objects. A Euclidean transform consists of a rotation and a translation. In the plane it has 3 DoF: one rotation angle and two translation components.

Example of a Euclidean transform, in this case a pure rotation
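As a small sketch of the rigid-motion idea, the snippet below builds a Euclidean transform from its three parameters and checks that the distance between two points is unchanged. The function name `euclidean` and all numbers are illustrative assumptions.

```python
import numpy as np

def euclidean(theta, tx, ty):
    """3x3 Euclidean transform: rotation by theta (radians), then translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  tx],
                     [s,   c,  ty],
                     [0.0, 0.0, 1.0]])

H_e = euclidean(np.deg2rad(30.0), 2.0, -1.0)

# Two points in homogeneous coordinates, 5 units apart
p = np.array([1.0, 2.0, 1.0])
q = np.array([4.0, 6.0, 1.0])

dist_before = np.linalg.norm((q - p)[:2])
dist_after = np.linalg.norm((H_e @ q - H_e @ p)[:2])
print(dist_before, dist_after)
```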

Similarity Transform

A similarity transform is a Euclidean transform with an additional uniform scaling of the space. It has 4 DoF. In photogrammetry, we can usually reconstruct a scene only up to a similarity transform. This is the case when we have only images and no metric measurements of the real world.

Example of similarity transform with increasing scale
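The following sketch, with made-up parameter values and hypothetical helper names, applies a similarity transform to a right triangle. The angle at the corner survives, while every length is multiplied by the scale factor.

```python
import numpy as np

def similarity(s, theta, tx, ty):
    """3x3 similarity transform: uniform scale s, rotation theta, translation (tx, ty)."""
    c, si = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * si, tx],
                     [s * si,  s * c, ty],
                     [0.0,     0.0,   1.0]])

def angle_at(a, b, c):
    """Angle at vertex b of the triangle a-b-c, in radians."""
    u, v = a - b, c - b
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

# Right triangle with the right angle at the first vertex
tri = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
H_s = similarity(2.5, np.deg2rad(40.0), 1.0, 2.0)

tri_h = np.hstack([tri, np.ones((3, 1))])
out = (tri_h @ H_s.T)[:, :2]   # last coordinate stays 1, so no division is needed

angle_before = angle_at(tri[1], tri[0], tri[2])
angle_after = angle_at(out[1], out[0], out[2])
length_ratio = np.linalg.norm(out[1] - out[0]) / np.linalg.norm(tri[1] - tri[0])
print(angle_before, angle_after, length_ratio)
```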

Affine Transform

The affine transformation is more general still. It includes the similarity transform, but it can also stretch and shear the space. It has 6 DoF.

We can also describe a transformation by its invariants, by asking the question: “What does not change?” A Euclidean transform preserves lengths and angles. A similarity transform preserves angles and ratios of lengths, but not absolute size. An affine transform preserves neither lengths nor angles in general, but parallel lines stay parallel.

Example of affine transform with increasing skew factors
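A quick way to see the affine invariant is to transform two parallel segments and compare their directions. The sketch below uses an arbitrary affine matrix (all values are illustrative) and also checks that a right angle is not preserved.

```python
import numpy as np

# Arbitrary affine matrix: the last row is fixed to (0, 0, 1)
A = np.array([[1.5, 0.8,  3.0],
              [0.2, 0.7, -2.0],
              [0.0, 0.0,  1.0]])

def transform(A, p):
    """Apply an affine matrix to a 2D point; the last coordinate stays 1."""
    return (A @ np.array([p[0], p[1], 1.0]))[:2]

def cross2(u, v):
    """Scalar 2D cross product; zero means the two directions are parallel."""
    return u[0] * v[1] - u[1] * v[0]

d = np.array([2.0, 1.0])                 # shared direction of two parallel segments
p0, p1 = np.array([0.0, 0.0]), np.array([0.0, 5.0])

d0 = transform(A, p0 + d) - transform(A, p0)
d1 = transform(A, p1 + d) - transform(A, p1)
still_parallel = np.isclose(cross2(d0, d1), 0.0)

# The images of the x and y axes are no longer perpendicular
e1 = A[:2, :2] @ np.array([1.0, 0.0])
e2 = A[:2, :2] @ np.array([0.0, 1.0])
right_angle_kept = np.isclose(np.dot(e1, e2), 0.0)
print(still_parallel, right_angle_kept)
```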

Projective Transform

Finally, we arrive at the projective transform. The projective transform is special, because the other transforms can be described in Euclidean space, whereas a projective transform only makes sense in projective space. So, what power does this transform hold? Its distinguishing ability is to move points at infinity to finite positions. For example, lines that were parallel can intersect at a finite point after the transformation. This is exactly what happens when you look at a planar scene from another viewpoint.

Example of projective transform with only projective elements (affine part is identity matrix)
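The effect on parallel lines can be checked numerically. This sketch assumes the usual convention that a line is a homogeneous 3-vector l with l·x = 0, that two lines meet at their cross product, and that lines transform with the inverse transpose of H. The matrix entries are arbitrary example values.

```python
import numpy as np

# Purely projective matrix: identity affine part, non-zero entries only in the last row
H_p = np.array([[1.0,   0.0,   0.0],
                [0.0,   1.0,   0.0],
                [0.002, 0.001, 1.0]])

# Two parallel lines, y = 0 and y = 10, written as a*x + b*y + c = 0
l1 = np.array([0.0, 1.0, 0.0])
l2 = np.array([0.0, 1.0, -10.0])

# Before the transform they meet at a point whose last coordinate is 0 (at infinity)
meet_before = np.cross(l1, l2)

# Points map with H, so lines map with the inverse transpose of H
H_inv_T = np.linalg.inv(H_p).T
l1_t, l2_t = H_inv_T @ l1, H_inv_T @ l2

# After the transform the intersection has a non-zero last coordinate
meet_after = np.cross(l1_t, l2_t)
meet_after_xy = meet_after[:2] / meet_after[2]
print(meet_before, meet_after_xy)
```

With these numbers the two lines meet at x = 500, y = 0, which is where the direction (1, 0) at infinity ends up under `H_p`.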

The Chain Decomposition

One can go one step further and isolate the specific effects of the different transforms. Remember, we are dealing with a transformation hierarchy. It means that the affine transformation includes all similarity transforms. And a projective transform includes all other transforms. In fact, every general projective transform can be decomposed into three parts:

Chain decomposition of a general homography matrix

We can see that what distinguishes a projective transform from an affine one is an extra matrix factor whose only free parameters are the two entries of the vector v. With these two additional parameters we gain the “ability” to affect points at infinity. Remember that vectors whose last entry equals 0 are ideal points, that is, the points at infinity where parallel lines intersect.
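As a rough sketch of the chain, the snippet below composes a homography from three factors in the order H = H_S · H_A · H_P. This assumes the convention of Hartley and Zisserman: the affine factor holds an upper-triangular 2x2 block K with det(K) = 1, and the last entry of the projective factor is normalised to 1. The function names and parameter values are made up for illustration.

```python
import numpy as np

def similarity_part(s, theta, t):
    """Similarity factor: scaled rotation plus translation (4 DoF)."""
    c, si = np.cos(theta), np.sin(theta)
    H = np.eye(3)
    H[:2, :2] = s * np.array([[c, -si], [si, c]])
    H[:2, 2] = t
    return H

def affine_part(k11, k12):
    """Affine factor: upper-triangular K with det(K) = 1, no translation (2 DoF)."""
    H = np.eye(3)
    H[:2, :2] = np.array([[k11, k12], [0.0, 1.0 / k11]])
    return H

def projective_part(v):
    """Projective factor: identity except for the vector v in the last row (2 DoF)."""
    H = np.eye(3)
    H[2, :2] = v
    return H

H_S = similarity_part(1.2, np.deg2rad(15.0), [10.0, -5.0])
H_A = affine_part(1.1, 0.3)
H_P = projective_part([0.001, -0.002])

H = H_S @ H_A @ H_P
print(H)
```

The degrees of freedom add up to 4 + 2 + 2 = 8. In the product, the last row of H is exactly (v1, v2, 1), because the similarity and affine factors both have (0, 0, 1) as their last row.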

The affine transformation cannot modify ideal points:

z-coordinate remains zero

The isolated projective part does just that:

Ideal point is transformed into finite point
Right: Pencils are parallel, Left: Lines intersect slightly outside of the image
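Both statements about ideal points are easy to verify with a few lines of NumPy. The matrices below are arbitrary examples and not the ones used for the figures. The affine matrix here is a general one that includes a translation, and it still leaves the last coordinate at zero.

```python
import numpy as np

# Ideal point: the direction (1, 2), written with last coordinate 0
ideal = np.array([1.0, 2.0, 0.0])

# A general affine matrix, last row (0, 0, 1)
H_A = np.array([[1.5, 0.8,  3.0],
                [0.2, 0.7, -2.0],
                [0.0, 0.0,  1.0]])

# An isolated projective part, last row (v1, v2, 1)
H_P = np.array([[1.0,   0.0,   0.0],
                [0.0,   1.0,   0.0],
                [0.002, 0.001, 1.0]])

after_affine = H_A @ ideal          # last coordinate is still 0: the point stays at infinity
after_projective = H_P @ ideal      # last coordinate becomes v1*1 + v2*2 = 0.004

# A non-zero last coordinate means we can divide and obtain a finite point
finite_point = after_projective[:2] / after_projective[2]
print(after_affine, after_projective, finite_point)
```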

Similarly, the affine factor of the decomposition does not contain a translation vector t, since translation is already covered by the similarity factor. Note also that the similarity and Euclidean transforms are represented by a single matrix.

Summary

We saw that a homography follows a hierarchy of transformations. From Euclidean to similarity, affine and projective, each transformation adds a bit of functionality. Instead of understanding the matrix H as a whole, we can decompose H into a chain of transformations and isolate the specific effect of each.

Code

You can find the code for the transformations here:
github.com/hq-jiang/deconstruction-of-homography-matrix

Literature

R. Hartley, A. Zisserman: Multiple View Geometry in Computer Vision
