Part II: Projective Transformations in 2D

Daniel Lenton
Jun 15, 2019

A short blog post introducing projective transformations and the hierarchy of transformation specializations. This post is limited to the case of 2D points and lines, with later posts generalizing to 3D.

A shot from Stanley Kubrick’s “2001: A Space Odyssey”, demonstrating his extensively used one-point perspective filming technique. In this kind of shot, parallel lines in the real world converge towards the centre of the image. This mapping from real-world lines to image lines is explained by projective transformations!

Before diving in, it is worth emphasizing that everything covered in this post is derived from sections 2.3 and 2.4 of the well-known Hartley and Zisserman book, in particular pages 32–44. If you would like a more formal description of any of the topics covered here, this online book could therefore serve as a helpful complementary resource. For a softer entry into the topic, though, I would strongly recommend starting with this blog post!

Right… Still here? So let’s get to it.

In the last post, we covered projective geometry in 2D, and more specifically, we dived pretty deep into what homogeneous co-ordinates are, and why they are useful for representing points and lines!

In this post, having gained a better understanding of the homogeneous vector representations behind projective geometry, we now consider the scenario of applying transformations to these vector representations.

Note, this is very different to performing conversions between vector spaces (as we covered in the previous post). The conversions between Euclidean and Homogeneous vectors from the previous post simply changed our method of representation, and entirely preserved the geometric interpretation of the vector, i.e. the position of the 2D point or 2D line.

In this post, we deal with transforming these points or lines into NEW points or lines, by applying “transformations” to them. So, diving straight into it, what exactly are projective transformations, and more importantly, why should we care?

Well, let’s first refer back to the Wikipedia definition of homogeneous co-ordinates from the previous post. The parts we should already be comfortable with have been emboldened, as was done at the end of the previous post.

In mathematics, homogeneous coordinates or projective coordinates are a system of coordinates used in projective geometry.

They have the advantage that the coordinates of points, including points at infinity, can be represented using finite coordinates.

Formulas involving homogeneous coordinates are often simpler and more symmetric than their Cartesian counterparts.

Homogeneous coordinates have a range of applications, including computer graphics and 3D computer vision, where they allow affine transformations and, in general, projective transformations to be easily represented by a matrix.

Okay… so you might think, we’ve already done most of the hard work! Now we just have to learn about transforming these homogeneous co-ordinates via matrix multiplications, right…?

Right!

But don’t use that as an opportunity to breeze through without paying attention. These transformations really are very useful, and are absolutely essential background knowledge for multiple view geometry in computer vision. As a final note, for the remainder of this post we will use the terms projective transformation and projectivity interchangeably, since they have identical meanings, with the latter being more succinct.

Right, so let’s get to it!

Projectivities

Strictly speaking, projectivities can be defined with no reference to linear algebra at all, by using the language of co-ordinate-free geometric concepts.

However, as computer vision scientists, we find the algebraic definition more concrete, tangible, and relevant to our relatively applied approach. So, the definition of a projectivity is as follows.

A projectivity is a linear transformation on homogeneous 3-vectors represented by a non-singular 3×3 matrix

Wow, that certainly seems simple. So a projective transformation (projectivity) is basically just the process of multiplying our homogeneous 3-vector by any non-singular 3×3 matrix?

The homogeneous scaling factors kp1 and kp2 have been pulled outside of the homogeneous column vectors to reduce clutter in the above equation. Alternatively, in block form, we can simply express the equation like so:
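The equations in this part of the post were originally images, so here is a hedged reconstruction from the surrounding text. The names kp1, kp2, H and a33 come from the text itself; the remaining entry names a11…a32 and the point symbols (x1, y1), (x2, y2) are assumptions made only for illustration. The left-hand form is the expanded equation, and the right-hand form is the block version:

```latex
k_{p2}\begin{bmatrix} x_2 \\ y_2 \\ 1 \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
k_{p1}\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}
\qquad\Longleftrightarrow\qquad
k_{p2}\,\mathbf{p}_2 = H\,k_{p1}\,\mathbf{p}_1
```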

Great, so we’re done?

Not quite, it’s useful to think a bit deeper about what this actually means. For example, how many degrees of freedom does our 3×3 matrix have?

The immediate answer would seem to be nine, on account of there being nine separate entries in the matrix. However, consider what happens with some simple re-arrangement of the equation presented above:

Given that both kp1 and kp2 are permitted to take ANY non-zero real value, without changing the geometric interpretation of either of the homogeneous vectors, it follows that the scalar term kp1/kp2 in front of our matrix is also permitted to take on any non-zero real value. It follows that we can scale our matrix by any non-zero scalar we want, and it won’t change the interpretation of the homogeneous transformation. For example, dividing the matrix by the a33 term, we get the following expression:

All we have done is scale our matrix by an arbitrary scalar (in this case 1/a33), so it should not change the underlying transformation. Our a33 term is permitted to take on any non-zero value at all, and the underlying transformation remains the same. We are therefore left with only eight unique values in the matrix, instead of nine. Re-writing, with k representing an arbitrary scaling factor, and new notations for the matrix terms based on these eight ratios, we get the following equation:

Note that the bottom right term, a33, was chosen arbitrarily. We could have pulled any term out from the matrix (provided it was not equal to zero). The important point to note is that it is the RATIOS of the nine values in our H matrix that actually matter, and there are eight independent ratios between nine values. As such, it follows that a projective transformation has eight degrees of freedom, not nine.

For the remainder of this post, the arbitrary pre-multiplying k term before transformation matrices will be omitted for brevity. But it is important to understand that ANY homogeneous transformation matrix can be scaled by any arbitrary non-zero value without changing the meaning of the transformation.
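To make the scale invariance concrete, here is a minimal NumPy sketch. The matrix H, the test point, and the helper name apply_homography are all made up for illustration; the only thing being checked is that H, a rescaled copy of H, and H divided by its a33 entry all map a point to the same Euclidean location.

```python
import numpy as np

def apply_homography(H, point_xy):
    """Apply a 3x3 projectivity to a Euclidean 2D point and return a Euclidean 2D point."""
    p = np.array([point_xy[0], point_xy[1], 1.0])  # Euclidean -> homogeneous
    q = H @ p
    return q[:2] / q[2]  # homogeneous -> Euclidean (assumes q[2] != 0)

# Hypothetical non-singular matrix, chosen only for illustration.
H = np.array([[2.0, 0.5, 1.0],
              [0.3, 1.5, -2.0],
              [0.1, 0.2, 4.0]])

point = (3.0, -1.0)
original = apply_homography(H, point)
rescaled = apply_homography(7.3 * H, point)        # arbitrary non-zero scale factor
normalised = apply_homography(H / H[2, 2], point)  # divide through by a33

same_mapping = np.allclose(original, rescaled) and np.allclose(original, normalised)
print(same_mapping)
```

Only the ratios between the nine entries survive the final division by the third co-ordinate, which is why one of the nine numbers carries no information.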

So, now we are happy that a projectivity has 8 DOF. You might be wondering, what is the easiest way to visualize what a projectivity looks like? Well, this would be easy enough for a single homogeneous co-ordinate of interest. We can simply take the initial ray in the 3-vector space, and work out what the new ray looks like, and then plot both of them together. Like so:

Similarly, we could do the same for a line:

In both of these diagrams, the small green directional arrows show the direction of the transformation.

But transformations are not merely valid for individual 2D points or 2D lines. Rather, they describe a general transformation process that can equally be applied to any point or any line. So, do we have a nice way of visualizing the transformation itself, rather than an arbitrarily selected point or line before and after the transformation?

Before trying to answer this question, it is useful to consider a number of simpler and more specific transformations, which form subsets of the more general 8 DOF projective transformation (projectivity) group just described.

A Hierarchy of Transformations

Right, so we have a pretty clear understanding of what a projective transformation (projectivity) is. So can we deduce some more meaningful examples, and different groups of projectivities? We certainly can!

For simplicity, we will actually start here with the simplest kinds of transformations first, and then slowly work our way back up to the general 8 DOF projective transformations that we just introduced.

Isometries

Isometries are transformations in 2D space which preserve Euclidean distance (from iso = same, metric = measure). An isometry is represented as:

where ε = ±1. If ε = 1, then the isometry is orientation preserving and is a Euclidean transformation (a composition of a translation and rotation). On the other hand, if ε = -1, then the isometry reverses orientation. An example is a transformation which performs a simple reflection.

We can write this transformation in block form as follows:

So, how many degrees of freedom does this transformation have? Ignoring the ε term (which can only take on two discrete values, so does not really constitute a DOF), we have three independent terms: θ, tx, and ty, and so three degrees of freedom.

θ is our angle of rotation in the 2D plane, and tx and ty are our x and y translations in the 2D plane.

In order to visualize this, let’s assume we have a large collection of 2D points which constitute the border of a rectangle in 2D space. Alternatively, you can think of us as having four 2D lines which represent the edges of this rectangle. Let’s also assume for now that ε = 1, and so orientation is preserved.

So, looking back to our θ, tx, and ty terms, how can we interpret these with regards to our 3 DOF transformation in 2D space? These three components of our transformation can be visualized as shown in the gif below.

So, we can imagine first rotating by θ in the plane, then translating by tx in the x direction and ty in the y direction, thus giving us our 3 DOF transformation!

As a final point, note that the use of homogeneous 3-vectors has allowed us to represent a rotation and a translation with a SINGLE matrix! This is very useful. If we were representing our co-ordinates in Euclidean R2 space, we would be limited to 2×2 matrix transformations, and we would need to represent a rotation and translation as a 2×2 matrix multiplication followed by a separate vector addition:

But with homogeneous co-ordinates, this is all encapsulated in a single matrix multiplication between the 3×3 transformation matrix and the homogeneous vector representation.
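As a rough sketch of the above, the snippet below builds the 3×3 isometry matrix in the form used by Hartley and Zisserman, applies it to the corners of a rectangle, and checks that side lengths are preserved. The helper names and the numeric values are assumptions chosen for illustration. It also compares the single matrix multiplication against the two-step Euclidean version (rotate, then add the translation).

```python
import numpy as np

def isometry(theta, tx, ty, eps=1.0):
    """3x3 isometry: rotation by theta, translation (tx, ty), eps = +1 or -1."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[eps * c, -s, tx],
                     [eps * s,  c, ty],
                     [0.0,    0.0, 1.0]])

def transform_points(H, pts):
    """Apply H to an (N, 2) array of Euclidean points, returning an (N, 2) array."""
    hom = np.hstack([pts, np.ones((len(pts), 1))])  # append the homogeneous 1
    out = hom @ H.T
    return out[:, :2] / out[:, 2:3]

def side_lengths(pts):
    """Lengths of the edges joining consecutive corners (wrapping around)."""
    return np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)

rectangle = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
theta, tx, ty = np.deg2rad(30.0), 4.0, -1.0  # hypothetical values
H_iso = isometry(theta, tx, ty)
moved = transform_points(H_iso, rectangle)

lengths_preserved = np.allclose(side_lengths(rectangle), side_lengths(moved))

# The Euclidean R2 version needs a matrix product AND a separate vector addition.
R, t = H_iso[:2, :2], H_iso[:2, 2]
two_step = rectangle @ R.T + t
matches_two_step = np.allclose(moved, two_step)
print(lengths_preserved, matches_two_step)
```

Setting eps=-1.0 gives an upper-left 2×2 block with determinant -1, which is the orientation-reversing case.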

Similarities

Similarities are essentially Isometries composed with an additional isotropic scaling (meaning same in all directions). A similarity is represented as:

where again ε = ±1, and the same argument regarding orientation holds true. We can write this transformation in block form as follows:

So, how many degrees of freedom does this transformation have? Again, ignoring the ε term (which can only take on two discrete values), we have four independent terms: θ, tx, ty, and s, and so four degrees of freedom this time.

Again, θ is our angle of rotation, tx and ty are our x and y translations, and our additional term s represents the isotropic scaling factor. Again, we will assume that ε = 1 for the visualization below.

So, in order to visualize this, let’s again assume we have a collection of 2D points which constitute the border of a rectangle in 2D space. Again, you also can think of us as having four 2D lines which represent the edges of this rectangle. These four components of our transformation can be visualized as shown in the gif below.

We can imagine first scaling by s in both the x and y directions, then rotating by θ in the plane, and finally translating by tx in the x direction and ty in the y direction, thus giving us our 4 DOF transformation!
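The scale, rotate, translate ordering can be sketched in NumPy as follows, assuming ε = 1. The function name and the numeric values are hypothetical. The check at the end confirms that chaining the three elementary matrices gives the same 3×3 matrix as the closed-form similarity, and that every side length of the rectangle is multiplied by s.

```python
import numpy as np

def similarity(s, theta, tx, ty):
    """3x3 similarity (eps = 1): isotropic scale s, rotation theta, translation (tx, ty)."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx],
                     [s * sn,  s * c, ty],
                     [0.0,       0.0, 1.0]])

s, theta, tx, ty = 1.5, np.deg2rad(40.0), 3.0, 2.0  # hypothetical values
c, sn = np.cos(theta), np.sin(theta)

scale = np.diag([s, s, 1.0])
rotate = np.array([[c, -sn, 0.0], [sn, c, 0.0], [0.0, 0.0, 1.0]])
translate = np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

# The right-most matrix acts first: scale, then rotate, then translate.
chained = translate @ rotate @ scale
H_sim = similarity(s, theta, tx, ty)
same_matrix = np.allclose(chained, H_sim)

rectangle = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
hom = np.hstack([rectangle, np.ones((4, 1))])
moved = (hom @ H_sim.T)[:, :2]  # last row of H_sim is (0, 0, 1), so no division is needed

def side_lengths(pts):
    return np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)

lengths_scaled_by_s = np.allclose(side_lengths(moved), s * side_lengths(rectangle))
print(same_matrix, lengths_scaled_by_s)
```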

Affinities

Affinities (or affine transformations) are non-singular linear transformations followed by a translation. An affinity has the matrix representation:

We can write this transformation in block form as follows:

An affinity has six degrees of freedom, corresponding to the six unique matrix elements. Two of these clearly come from the x y translation, as with the previous examples, but do we have a better way of visualizing the four degrees of freedom corresponding to the 2×2 non-singular A matrix?

We can in fact perform singular value decomposition (SVD), and re-write the matrix A in the following form:

You may be thinking, this looks far more complicated than the simple a11, a12, a21, a22 terms we started with! Why would we want to represent things like this? Well, each of these four constituent matrices has a very interpretable meaning. The first two and the last one are all simple rotation matrices, like we have already dealt with in isometries and similarities, and the third matrix represents simple scaling by λ1 and λ2 in the x and y directions respectively. Re-writing this in block form:

We can therefore consider the A matrix as first rotating by Φ, and then performing different scalings λ1 and λ2 in the NEW x and y directions after rotation, then rotating back by -Φ to our starting orientation, and finally rotating by θ. This process is entirely defined by the four terms Φ, λ1, λ2, and θ. The Φ term can be thought of as defining the axis of scaling. The λ1 and λ2 terms define the two scaling ratios, and θ defines the rotation angle of this new scaled shape, just as it does in the case of isometries and similarities.

When combined with the final translation terms tx and ty, the complete 6 DOF transformation can be visualized as in the gif shown below:

Re-iterating the process: we first rotate by Φ, then perform different scalings λ1 and λ2 in the new x and y directions, then rotate back by -Φ, then rotate by θ, and finally translate by tx and ty, giving a total 6 DOF transformation.
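Here is a small NumPy sketch of that chain, under the assumption that the decomposition reads A = R(θ) R(-Φ) D R(Φ) with D = diag(λ1, λ2), as in Hartley and Zisserman. The parameter values are made up. It builds A from the four parameters and then uses np.linalg.svd to recover the two scalings and the net rotation θ.

```python
import numpy as np

def rot(angle):
    """2x2 rotation matrix."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

# Hypothetical parameter values (lam1 > lam2 > 0).
theta, phi, lam1, lam2 = 0.4, 1.1, 3.0, 0.5

# The right-most matrix acts first: rotate by phi, scale, rotate back by -phi, rotate by theta.
A = rot(theta) @ rot(-phi) @ np.diag([lam1, lam2]) @ rot(phi)

# The singular values of A are the two scalings (returned in descending order).
U, S, Vt = np.linalg.svd(A)
scalings_recovered = np.allclose(S, [lam1, lam2])

# U @ Vt is the rotational part of A, i.e. R(theta), because the rest is a
# symmetric positive-definite "stretch" matrix.
R_theta = U @ Vt
theta_recovered = np.arctan2(R_theta[1, 0], R_theta[0, 0])

# Full 6 DOF affinity in homogeneous form, with a hypothetical translation.
tx, ty = 2.0, -1.0
H_aff = np.eye(3)
H_aff[:2, :2] = A
H_aff[:2, 2] = [tx, ty]
print(scalings_recovered, np.isclose(theta_recovered, theta))
```

The area of any shape is scaled by det(A) = λ1 · λ2, which is one quick way to sanity-check a decomposition like this.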

Projectivities

It seems like we are back where we started! We are back to our full 8 DOF projective transformations in 2D space. However, having gone through some simpler examples of lower degree of freedom transformation groups, we have now acquired some very useful tools for better interpreting what the eight different degrees of freedom of projectivities actually look like.

But first, returning to our matrix representation, we can represent a projectivity as follows:

Here v = 1 or 0. This links back to the argument about only needing eight unique values in the matrix: whenever the bottom-right entry is non-zero we can rescale the whole matrix so that v = 1, and otherwise v = 0. The general projectivity can also be written in block form as follows:

This usefully highlights that there are 3 separate components to the projectivity: the upper left 4 DOF affine component A, the upper right 2 DOF translation t, and the new lower left 2 DOF “elation” component vT. Don’t worry about what exactly an “elation” is for now. This is a new kind of transformation, and the details will be covered very shortly.

So, can we perform a matrix decomposition, like we did for affinities, in order to gain a better insight into what is going on? Yes, we can!

The projectivity matrix can be expressed as follows:

Okay, so that was rather long! This is a bit more digestible in block form:

So, we clearly have four matrices in our matrix chain. The question is, what is each of them doing?

Well, the first (left-most) term is just a simple 4 DOF similarity matrix, as we have already covered, great. What about the second term? This is in fact a simple 1 DOF shear transformation (see the wiki page for an explanation). Okay, that’s simple enough, what about the third term? As was the case in the SVD of our affinity matrix, this diagonal matrix essentially scales in our x and y directions, here by λ and 1/λ respectively. In this case, however, the second scaling term is the reciprocal of the first. The scaling therefore only has 1 DOF, and this constraint makes the scaling area-preserving, since the determinant λ · 1/λ is always 1.

So there we have it, we have come up with a nice way of interpreting six of our eight degrees of freedom. But what about the last (right-most) term in our matrix chain? This term is a 2 DOF “elation”, and we will now explore exactly what this means.
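As a hedged sketch of the four-matrix chain, the snippet below multiplies a similarity, a shear, an area-preserving scaling, and an elation together. The exact form of the shear matrix in the original figure is not recoverable from the text, so the upper-triangular form [[1, k], [0, 1]] used here is an assumption, as are all the numeric values.

```python
import numpy as np

# Hypothetical parameter values: 4 + 1 + 1 + 2 = 8 DOF in total.
s, theta, tx, ty = 1.5, 0.3, 2.0, -1.0  # similarity (4 DOF)
k = 0.4                                 # shear (1 DOF), assumed form
lam = 2.0                               # area-preserving scaling (1 DOF)
vx, vy, v = 0.1, 0.2, 1.0               # elation (2 DOF, with v fixed to 1)

c, sn = np.cos(theta), np.sin(theta)
H_similarity = np.array([[s * c, -s * sn, tx],
                         [s * sn,  s * c, ty],
                         [0.0,       0.0, 1.0]])
H_shear = np.array([[1.0, k, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
H_scale = np.diag([lam, 1.0 / lam, 1.0])
H_elation = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [vx,  vy,  v]])

# The right-most matrix acts first on a point.
H = H_similarity @ H_shear @ H_scale @ H_elation

bottom_row_is_elation = np.allclose(H[2], [vx, vy, v])
scaling_preserves_area = np.isclose(np.linalg.det(H_scale[:2, :2]), 1.0)
print(bottom_row_is_elation, scaling_preserves_area, np.linalg.det(H))
```

The bottom row of the product is exactly (vx, vy, v), so the elation is the only part of the chain that makes the third homogeneous co-ordinate depend on x and y.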

Elations
We can deduce what an elation is doing by simply examining the matrix, so looking at our elation matrix as a transformation, we get the following:

Evaluating the matrix multiplication:

We can see that, for arbitrary values of vx and vy, we get the following two scalar equations for Euclidean 2D points:

The denominator term in both of these equations arises from the homogeneous to Euclidean conversion, whereby the first and second entries of the homogeneous vector are divided by the third.
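The equations in this section were images in the original post, so here is a hedged reconstruction from the description in the text. It uses the names vx, vy and v from the worked examples in this section, while the primed and "new" symbols are assumptions introduced for illustration:

```latex
\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ v_x & v_y & v \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
=
\begin{bmatrix} x \\ y \\ v_x x + v_y y + v \end{bmatrix}
\quad\Longrightarrow\quad
x_{\text{new}} = \frac{x}{v_x x + v_y y + v},
\qquad
y_{\text{new}} = \frac{y}{v_x x + v_y y + v}
```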

So what exactly are these equations saying? This essentially means that points are scaled directly towards or away from the origin by a scaling factor, with this scaling factor being the reciprocal of a linear function of the starting x and y values of the point. We can deduce that such scalings can only move points along lines directly towards or away from the origin, since both the x and y terms are scaled by the same factor for any given point.

Okay, that seems kind of cool, and definitely a new kind of transformation. So, is there anything else noteworthy about this “elation” operation?
Well, consider the case where vx = 1, vy = 0, and v = 1.

We can then see that the point at infinity along the line y = x, written loosely as [∞,∞], gets mapped to the finite point [1,1]. This is a property unique to elations among the transformation types covered so far. Elations have the power to move 2D Euclidean points from the infinite realm to the finite realm.

Likewise, we can see that the point [-1,-1] gets mapped to a point at infinity, written loosely as [-∞,-∞]. This is again a unique property of elations. Elations have the power to move 2D Euclidean points from the finite realm to the infinite realm.

Of course, points don’t have to be projected to and from infinity. For example, the point [1,1] gets mapped to the point [0.5,0.5].
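These three mappings are easy to check numerically. The sketch below assumes the elation matrix has the identity in its top two rows and (vx, vy, v) as its bottom row. The point "at infinity along y = x" is represented by the homogeneous vector (1, 1, 0), and a result with third co-ordinate 0 means the point has been sent to infinity. The helper name elation is made up.

```python
import numpy as np

def elation(vx, vy, v=1.0):
    """3x3 elation: identity in the top two rows, (vx, vy, v) as the bottom row."""
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [vx,  vy,  v]])

H_P = elation(1.0, 0.0, 1.0)

# Finite point [1, 1] -> homogeneous (1, 1, 2) -> Euclidean [0.5, 0.5].
finite = H_P @ np.array([1.0, 1.0, 1.0])
finite_xy = finite[:2] / finite[2]

# Point at infinity in the direction (1, 1) -> homogeneous (1, 1, 1) -> Euclidean [1, 1].
ideal = H_P @ np.array([1.0, 1.0, 0.0])
ideal_xy = ideal[:2] / ideal[2]

# Finite point [-1, -1] -> homogeneous (-1, -1, 0): third co-ordinate 0, so it lands at infinity.
sent_to_infinity = H_P @ np.array([-1.0, -1.0, 1.0])

print(finite_xy, ideal_xy, sent_to_infinity)
```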

We can consider what this particular elation means for the line y = x, the line gets projected as follows:

Some important points arise from this example. Firstly, although all these points have remained on the same line through the origin (as we already discussed was a necessary condition), the order of these points has changed after the elation. For example, before the elation, the green point existed between the orange and red points. However, after the elation, the green point is now outside of the orange and red points.

This very non-linear behavior is distinctive of elation operations, and is explainable by the fact that the variable scaling parameter is itself a function of our x and y values, and not constant, as was the case in affinities and similarities.

However, things aren’t always so crazy. As an additional example, consider the unit square centered at the origin. The elation described by vx = 0.1, vy = 0.2, and v = 1 gives the following transformation:

Again, you should be able to see that each point on the square has only scaled directly towards or away from the origin.
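A quick numerical check of this claim, as a sketch: apply the elation with vx = 0.1, vy = 0.2, v = 1 to the four corners of the unit square centered at the origin, and confirm that each mapped corner is just the original corner multiplied by a per-point scalar. The variable names are assumptions chosen for illustration.

```python
import numpy as np

vx, vy, v = 0.1, 0.2, 1.0
H_P = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [vx,  vy,  v]])

# Corners of the unit square centered at the origin.
corners = np.array([[-0.5, -0.5], [0.5, -0.5], [0.5, 0.5], [-0.5, 0.5]])
hom = np.hstack([corners, np.ones((4, 1))])
out = hom @ H_P.T
mapped = out[:, :2] / out[:, 2:3]

# Per-point scaling factor: the reciprocal of (vx * x + vy * y + v).
scale = 1.0 / out[:, 2]

# The 2D cross product of a corner with its image is zero exactly when both
# lie on the same line through the origin.
cross = corners[:, 0] * mapped[:, 1] - corners[:, 1] * mapped[:, 0]
stays_on_ray = np.allclose(cross, 0.0)
print(stays_on_ray, scale)
```

Corners on the side where vx·x + vy·y is positive get a factor below 1 and move towards the origin, while those on the opposite side get a factor above 1 and move away from it.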

Projectivity Visualized
So, now that we have a relatively clear understanding of what an elation is, we can visualize what the 8 DOF of our projectivity are actually doing.

This little gif looks nice and all, but this supposedly “intuitive” visualization of the transformation has admittedly become rather long-winded, and to be honest, not all that helpful for facilitating our understanding. What is the use of these 8 DOF projectivities? They very much seem to just kind of randomly stretch, bend, shear, rotate and translate shapes in the 2D plane, in a rather aimless manner.

Well, we will now start to see why these projectivities are actually useful to us. But in order to do so, we need to escape the 2D realm we have been confined to so far, and start thinking about 2D planes located in 3D space.

This is very much like what we did in the last post when visualizing homogeneous to Euclidean conversions, when we talked about the projective plane in homogeneous 3-vector space. Now though, we are not simply using 3D space as a tool to visualize the conversion process, but we instead consider the case where our target application of homogeneous co-ordinates actually involves real 2D planes in real 3D space.

Projecting Between Planes

With regards to the main application of projectivities, the clue is really in the name. These 8 DOF transformations are useful for “projecting” between different 2D planes located in 3D space. Projectivities aren’t applicable to any arbitrary kind of transformation though. They are useful for particular situations in which we project through a single common point. This is referred to as central projection.

Consider the example below, where we project from plane π to plane π’. The transformation which maps 2D co-ordinates of plane π to 2D co-ordinates in π’ could be explained by a general 3×3 projectivity matrix.

It is important to keep this distinct in your mind from the projective plane we talked about in the last post. In that case, we were dealing with the 3D space of our homogeneous 3-vectors, and using a plane in order to convert these back to 2D Euclidean vectors. In this case, however, we are imagining the actual real-world 3D space in which our real-world problem resides. The following examples should help to solidify this.

Shadows of buildings are a very obvious example of projecting from one 2D plane to another 2D plane, with the projection all occurring through a single point. In this case, our first plane is the side of the building, our second plane is the ground, and our single point of projection is the sun:

Another obvious example is taking a photograph of a planar surface. In this case, the first plane is the surface, the second plane is the camera film, and the point of projection is the light-capturing hole in the front of our camera, which (as we will discuss in later blog posts) can be assumed to be a single point when using a pin-hole camera model:

In fact, our projectivity transformations actually form what is known as a “group” in mathematics. Ignoring the rigorous terminology, this essentially means that the inverse of any projectivity is also a projectivity, and the combination of two projectivities is also a projectivity.

As a good illustrative example, we can imagine projecting to a world plane from one camera, and then projecting back to a second camera from this world plane. Given that we now know the combination of any two projectivities gives us a new projectivity, we can infer that projecting from camera 1 to the world plane, and then the world plane to camera 2, is itself a projectivity, which takes us directly from image 1 to image 2. Pretty cool!
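The camera 1 → world plane → camera 2 chain can be sketched with two made-up matrices. The names H_1w (image 1 to world plane) and H_w2 (world plane to image 2), and all of the numbers, are purely hypothetical. The point is that their product is again a non-singular 3×3 matrix which maps image 1 points directly to image 2, and that its inverse maps them back.

```python
import numpy as np

def apply_homography(H, point_xy):
    """Apply a 3x3 projectivity to a Euclidean 2D point (assumes a finite result)."""
    q = H @ np.array([point_xy[0], point_xy[1], 1.0])
    return q[:2] / q[2]

# Hypothetical projectivities, chosen only for illustration.
H_1w = np.array([[1.0,   0.2,   3.0],
                 [-0.1,  0.9,   1.0],
                 [0.001, 0.002, 1.0]])   # image 1 -> world plane
H_w2 = np.array([[0.8,   -0.3,  -2.0],
                 [0.25,   1.1,   0.5],
                 [-0.002, 0.001, 1.0]])  # world plane -> image 2

# Composition: the right-most matrix acts first.
H_12 = H_w2 @ H_1w

p1 = (10.0, 20.0)
via_world = apply_homography(H_w2, apply_homography(H_1w, p1))
direct = apply_homography(H_12, p1)
round_trip = apply_homography(np.linalg.inv(H_12), direct)

composition_agrees = np.allclose(via_world, direct)
inverse_recovers_point = np.allclose(round_trip, p1)
print(composition_agrees, inverse_recovers_point)
```

Since det(H_12) = det(H_w2) · det(H_1w), the product of two non-singular matrices is always non-singular, which is the algebraic reason the composition is itself a projectivity.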

You may notice that this most recent example was rather different from the previous ones. There was an extra intermediate plane for one thing, but was anything else different?

Returning to the original two examples of the cast shadow and the single plane to camera, which were simpler, we can actually intuitively deduce that all the lines of projection only intersected at the central point of projection, and nowhere else. Our only method of projection was to follow infinite rays directly away from the projection point, until the ray intersected both of the planes at some point along its journey, and then these two points were mapped to one-another. Nowhere in that formulation is there any possibility for intersection of the rays as they span outward from the single point of projection. This is illustrated clearly in the 2D diagram below:

However, returning to our third example, with the intermediate projection plane, we can see that the lines joining our two image planes do in fact intersect at multiple points. This isn’t very clear from the 3D diagram above, but the 2D example below should help to illustrate the phenomenon:

A good question you might ask is, do both of these kinds of projectivities actually have the full eight degrees of freedom? The new example with intersecting lines seems far less constrained than the original examples. Well, if you were thinking this, you’re absolutely right! Central projections, where all the lines intersect at a SINGLE point, are a more specialized form of projectivity, referred to as perspectivities. Perspectivities are in fact only 6 DOF transformations, rather than 8.

However, we have refrained from adding a separate section for perspectivities because, unlike all the other types of transformation outlined in this post so far (isometries, similarities, affinities, and projectivities), they do not form a group. As we have just outlined, combining two 6 DOF perspectivities does not in general give a third perspectivity. Rather, we get intersecting lines, and so we know that the result is an 8 DOF projectivity. Perspectivities therefore clearly do not form a group!

In order to gain a better intuition of why all of this bending, stretching, and shearing allows us to represent projections in 3D space, consider the simple animation below. This animation corresponds to a perspective camera, faced down, moving and rotating freely above a (slightly dirty) checkerboard floor. The viewing angle of this camera is 135°, which is considerably larger than the usual range of about 25° to 60° for normal cameras. This is to accentuate the warping phenomenon on the image plane.

Looking at the central light grey square, we can see lots of the warping, bending, and shearing that we saw earlier when explaining affinities and projectivities.

Also, with regards to parallel lines, it is clear that the lines on the checkerboard floor never intersect. This is also the case for our first image frame, when the camera is completely faced down. However, when moving and rotating our camera, we see that these lines then do in fact intersect in the image plane, see the gif below.

As we have discussed in previous sections, this ability to take points from the infinite realm to the finite realm during the transformation is due to an elation operation: projective transformations can be decomposed into a chain of transformations, one of which is an elation.
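Here is a small sketch of parallel lines meeting after a projectivity. It relies on two standard facts about homogeneous co-ordinates: the intersection of two lines is the cross product of their 3-vectors, and if points transform as p' = H p then lines transform as l' = H^{-T} l. The matrix H below is a made-up elation, and the two lines y = 0 and y = 1 are chosen only for illustration.

```python
import numpy as np

# Hypothetical projectivity with a non-zero elation part.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.2, 0.1, 1.0]])

# Two parallel lines in homogeneous form (a, b, c), meaning a*x + b*y + c = 0.
line_a = np.array([0.0, 1.0, 0.0])   # y = 0
line_b = np.array([0.0, 1.0, -1.0])  # y = 1

# Before: the intersection has third co-ordinate 0, i.e. a point at infinity.
meet_before = np.cross(line_a, line_b)

# Lines transform with the inverse transpose of the point transformation.
H_inv_T = np.linalg.inv(H).T
line_a_new = H_inv_T @ line_a
line_b_new = H_inv_T @ line_b

# After: the intersection is a finite point in the image, approximately (5, 0).
meet_after = np.cross(line_a_new, line_b_new)
vanishing_point = meet_after[:2] / meet_after[2]
print(meet_before, vanishing_point)
```

The finite meeting point is exactly where H sends the original point at infinity, which is why parallel floor lines converge in the image as soon as the bottom row of H stops being (0, 0, 1).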

One-Point Projection

Referring back to the Stanley Kubrick image at the top of this post, and having now covered what projective transformations actually are, we can now start to make some sense of it. Firstly, we can see that the image is largely dominated by eight real-world planes, forming the inside of the octagon-shaped tunnel. Imagine performing eight separate transformations, one for each of these eight real-world planar panels, which project from their real-world planes to the camera plane. Each transformation maps the real-world parallel lines, which intersect “at infinity”, to non-parallel lines in the image plane, which all then meet at a finite point near the centre of the image. This general technique in photography is referred to as one-point perspective.

Likewise, we can see the same effect in this famous hallway shot from the movie adaptation of The Shining, also by Stanley Kubrick.

In this case, the shot is dominated by four planes, rather than eight, as is clearly illustrated below.

There are many more examples of one-point perspective photography, both in Stanley Kubrick’s large filmography, and in more general photography, and the effect is rather aesthetically pleasing in my opinion!

Round Up

So, there we have it, we have finally reached the end of our post on projective transformations in 2D! We should now finally be able to understand the final part of that original Wikipedia definition for homogeneous co-ordinates:

Homogeneous coordinates have a range of applications, including computer graphics and 3D computer vision, where they allow affine transformations and, in general, projective transformations to be easily represented by a matrix.

Hopefully, you now understand what this short sentence actually means. If not, and you think things were unclear, please drop a comment below, and tell me how I can explain things better!

Otherwise, in the next post, we will be extending everything we have learnt about homogeneous co-ordinates in 2D space to 3D space. The good news is, most of the core ideas are identical, so we’ve done most of the hard work!

Please find other helpful links below, if you fancy hopping around a bit.

Ciao for now.
Dan.

Links

The next post in the series:

Part III: Projective Geometry in 3D

The series to which this post belongs:

Projective Geometry Series

The master series to which this series belongs:

Multiple-View Geometry Series
