Part I: Projective Geometry in 2D
A short blog post introducing what is meant by projective geometry, and the application of homogeneous co-ordinates. This post will be limited to the case of 2D points and lines, with later posts generalizing to 3D.
Before diving in, it is worth emphasizing that everything covered in this post is derived from chapter 2.2 of the well-known Hartley and Zisserman book, in particular pages 26–29. If you would like a more formal description of any of the topics covered here, this online book could therefore serve as a helpful complementary resource. For a softer entry into the topic, though, I would strongly recommend starting with this blog post!
Right… Still here? So let’s get to it.
One of the most important ideas to grasp when dealing with multiple view geometry in computer vision is the concept of projective geometry, and the associated homogeneous co-ordinates. So, you may ask, what exactly are homogeneous co-ordinates?
Let’s see what our trusted Wikipedia has to say:
In mathematics, homogeneous coordinates or projective coordinates are a system of coordinates used in projective geometry. They have the advantage that the coordinates of points, including points at infinity, can be represented using finite coordinates. Formulas involving homogeneous coordinates are often simpler and more symmetric than their Cartesian counterparts. Homogeneous coordinates have a range of applications, including computer graphics and 3D computer vision, where they allow affine transformations and, in general, projective transformations to be easily represented by a matrix.
Okay… So that sounds interesting (especially the bit about projective transformations being easily represented by a matrix!) but that was all quite confusing, and we still don’t actually know what they are.
Well, before diving into derivations and mathematical complexity, let’s just start by looking at some very simple examples of homogeneous co-ordinates, and then we can maybe start to work out why this representation is actually beneficial afterwards.
Right, so without further ado, let’s get stuck in!
Euclidean Vector Representation
In standard Euclidean space, using column vector notation, we would normally represent a 2D point x as follows:
$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}$$
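However, for homogeneous co-ordinates, we simply add an extra dimension to the bottom of the vector, with an entry of 1. We then multiply the entire new vector by an arbitrary non-zero scaling factor kp, like so:
$$\mathbf{x} = k_p \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} k_p x \\ k_p y \\ k_p \end{pmatrix}$$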
In order to transform back to our original x, y representation, we simply divide the first two vector entries by the third entry (which is always equal to the scaling factor kp), and we arrive back at our original x and y terms.
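If it helps to see this round trip in code, here is a minimal sketch using NumPy. The function names `to_homogeneous` and `from_homogeneous` are my own choices for illustration, not anything standard.

```python
import numpy as np

def to_homogeneous(point_2d, k_p=1.0):
    """Append a 1 to a 2D point, then scale the whole 3-vector by k_p."""
    if k_p == 0:
        raise ValueError("the scaling factor k_p must be non-zero")
    x, y = point_2d
    return k_p * np.array([x, y, 1.0])

def from_homogeneous(point_h):
    """Divide the first two entries by the third to recover (x, y)."""
    point_h = np.asarray(point_h, dtype=float)
    return point_h[:2] / point_h[2]

p = to_homogeneous((3.0, 4.0), k_p=2.5)  # the 3-vector (7.5, 10.0, 2.5)
recovered = from_homogeneous(p)          # back to the 2D point (3.0, 4.0)
```

Whatever non-zero value you pick for `k_p`, the recovered point is the same, which is exactly the scale invariance described above.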
This may seem like a very strange, and seemingly arbitrary conversion to make, but the benefits of this representation will become clearer soon. Ignoring the strangeness, the conversion seemed simple enough!
So, that’s all well and good, but what about for lines?
In standard Euclidean space, we actually have a number of different options for representing a line.
Using vector notation, given 2 different points on the line represented as vectors a and b, we can parameterize any other point x on the line as follows:
$$\mathbf{x} = \mathbf{a} + \lambda (\mathbf{b} - \mathbf{a})$$
In this case, the parameter λ is a scalar, which when varied, essentially moves us along the length of the line, and returns us the vector representation of the point on the line x. As with any vector equation, the equation above essentially encapsulates a number of separate scalar equations, in this case two, one for the x co-ordinate and one for the y co-ordinate:
$$x = a_x + \lambda (b_x - a_x)$$
$$y = a_y + \lambda (b_y - a_y)$$
It is also possible to represent a 2D line with only a single scalar equation. Defining the gradient of the line as:
$$m = \frac{b_y - a_y}{b_x - a_x}$$
We can combine the two scalar equations for x and y defined above (eliminating λ), and arrive at the following:
$$y = m (x - a_x) + a_y$$
By setting x = 0, we can see that the y intercept g is given by:
$$g = a_y - m a_x$$
Re-writing, we arrive at what is arguably the simplest parameterization for a line in 2D space:
$$y = m x + g$$
This representation usefully highlights that only two numbers are required to fully define a line in 2D space. In other words, a line in 2D space has 2 degrees of freedom.
However, if we wanted, we could also multiply both sides of the equation by an arbitrary non-zero scaling factor kl, and the equation would remain valid:
$$k_l y = k_l m x + k_l g$$
Re-arranging, we can write the equation as follows:
$$k_l m x - k_l y + k_l g = 0$$
This takes the same general form as one very often taught in schools:
$$a x + b y + c = 0$$
So, it’s all well and good that we have a number of different ways of representing a line in 2D space, but how about with homogeneous co-ordinates?
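Well, this actually connects directly with the final scalar representation we just discussed. In homogeneous co-ordinates, a line in 2D space is simply represented as follows:
$$\mathbf{l} = \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} k_l m \\ -k_l \\ k_l g \end{pmatrix} = k_l \begin{pmatrix} m \\ -1 \\ g \end{pmatrix}$$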
The representations in this final equation are all equivalent, and we will interchange their usage later on in the post, based on which is most convenient at the time. We will also refer back to this equation later on during some derivations, in order to exchange between a, b, c representations and m, g, kl representations.
Finally, in a similar manner to our homogeneous point representation, in order to transform back to our original m and g representation, we simply divide the first and final vector entries by the negative of the second entry (which in this case is always equal to the scaling factor kl), and we arrive back at our original m and g terms which define our 2D line.
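The same conversion for lines can be sketched as follows. Again, the function names are hypothetical, and the guard for a zero second entry is my own addition: such a vector has no finite gradient, so it cannot be written in m and g form.

```python
import numpy as np

def line_to_homogeneous(m, g, k_l=1.0):
    """Build k_l * (m, -1, g) for the line y = m*x + g."""
    if k_l == 0:
        raise ValueError("the scaling factor k_l must be non-zero")
    return k_l * np.array([m, -1.0, g])

def line_from_homogeneous(line_h):
    """Recover (m, g) by dividing the first and last entries by minus the second."""
    a, b, c = np.asarray(line_h, dtype=float)
    if b == 0:
        raise ValueError("second entry is zero: no finite gradient/intercept form")
    return a / -b, c / -b

l = line_to_homogeneous(2.0, 1.0, k_l=-3.0)  # the 3-vector (-6, 3, -3)
m, g = line_from_homogeneous(l)              # back to m = 2, g = 1
```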
Properties of Homogeneous Points and Lines
So, now we know how to represent points and lines using homogeneous co-ordinates, great!
But you may be wondering, like I did the first time I learnt about homogeneous co-ordinates, why would we ever want to represent things like this?
We know for a fact that both points and lines can be fully defined using only two parameters each. So why would we want to use three for both, by adding an extra dimension and a random scaling factor in both cases?
Well, the benefits start to become clearer when we look at some of the properties and simple equations that arise when using homogeneous representations. We now explore some of these useful properties!
1. Checking if a point lies on a line
$$\mathbf{x} \cdot \mathbf{l} = 0$$
This means, the point x lies on the line l if and only if the dot product of x with l is equal to zero. If we expand this equation further, we can quickly see why this is the case:
$$k_p \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \cdot k_l \begin{pmatrix} m \\ -1 \\ g \end{pmatrix} = 0$$
performing the dot product:
$$k_p k_l m x - k_p k_l y + k_p k_l g = 0$$
Dividing both sides by kp × kl, and re-arranging:
$$y = m x + g$$
This is the scalar equation that we started with, which outlines the required condition in order for a point to be on the line. Any values of x and y which satisfy this equation are, by definition, on the line l.
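As a quick numerical sanity check, here is a small sketch of the test. The function name `point_on_line` and the tolerance handling are my own assumptions: because both vectors carry arbitrary scaling factors, the dot product is compared against a tolerance scaled by the vector lengths rather than against exactly zero.

```python
import numpy as np

def point_on_line(point_h, line_h, tol=1e-9):
    """True if the homogeneous point lies on the homogeneous line."""
    point_h = np.asarray(point_h, dtype=float)
    line_h = np.asarray(line_h, dtype=float)
    # Scale the tolerance so the answer does not depend on k_p or k_l.
    scale = np.linalg.norm(point_h) * np.linalg.norm(line_h)
    return bool(abs(np.dot(point_h, line_h)) <= tol * scale)

line = 3.0 * np.array([2.0, -1.0, 1.0])    # y = 2x + 1, with k_l = 3
on_line = 5.0 * np.array([1.0, 3.0, 1.0])  # the point (1, 3), with k_p = 5
off_line = np.array([1.0, 4.0, 1.0])       # the point (1, 4)

result_on = point_on_line(on_line, line)    # True:  30 - 45 + 15 = 0
result_off = point_on_line(off_line, line)  # False: 6 - 12 + 3 = -3
```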
2. Intersection of lines
$$\mathbf{x} = \mathbf{l} \times \mathbf{l}'$$
This means, the lines l and l’ intersect each other at the point x, given by the cross product between the two lines l and l’. In order to verify that this is the case, let’s first consider how we would normally determine the intersection point of two lines. We will start by using parameters a, b, and c here instead of m, g, and kl, in order to reduce the number of terms in our equations.
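So, we essentially have two linear equations (for the two lines), and two unknowns, x and y, for the point:
$$a x + b y + c = 0$$
$$a' x + b' y + c' = 0$$
We can therefore simply construct the simultaneous equations as a matrix multiplication, and solve the problem via matrix inversion:
$$\begin{pmatrix} a & b \\ a' & b' \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -c \\ -c' \end{pmatrix} \quad \Rightarrow \quad \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ a' & b' \end{pmatrix}^{-1} \begin{pmatrix} -c \\ -c' \end{pmatrix}$$
Performing the inversion:
$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{a b' - a' b} \begin{pmatrix} b' & -b \\ -a' & a \end{pmatrix} \begin{pmatrix} -c \\ -c' \end{pmatrix}$$
Matrix multiplying the right hand side:
$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{a b' - a' b} \begin{pmatrix} b c' - b' c \\ a' c - a c' \end{pmatrix}$$
As scalar equations:
$$x = \frac{b c' - b' c}{a b' - a' b}, \qquad y = \frac{a' c - a c'}{a b' - a' b}$$
Converting from a, b, c to m, g and kl representation (a = kl m, b = -kl, c = kl g, and the primed equivalents for the second line), and cancelling the scaling factors common to the numerator and denominator, we arrive at the following:
$$x = \frac{g - g'}{m' - m}, \qquad y = \frac{m' g - m g'}{m' - m}$$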
So, now we know what we’re looking for! The question is, do we get the same result for our x and y co-ordinates when simply performing the cross product between our homogeneous line representations l and l’ as promised?
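Expanding the cross product:
$$\mathbf{l} \times \mathbf{l}' = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a & b & c \\ a' & b' & c' \end{vmatrix} = (b c' - b' c)\,\mathbf{i} + (a' c - a c')\,\mathbf{j} + (a b' - a' b)\,\mathbf{k}$$
Writing in homogeneous column vector form, and equating to the homogeneous point representation for x:
$$\begin{pmatrix} k_p x \\ k_p y \\ k_p \end{pmatrix} = \begin{pmatrix} b c' - b' c \\ a' c - a c' \\ a b' - a' b \end{pmatrix}$$
Converting back to Euclidean space x, y representation:
$$x = \frac{b c' - b' c}{a b' - a' b}, \qquad y = \frac{a' c - a c'}{a b' - a' b}$$
Converting from a, b, c to m, g and kl representation (a = kl m, b = -kl, c = kl g, and the primed equivalents for the second line), and cancelling the scaling factors common to the numerator and denominator, we arrive at the following:
$$x = \frac{g - g'}{m' - m}, \qquad y = \frac{m' g - m g'}{m' - m}$$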
These are identical to the equations derived from the matrix inversion method. Therefore, the theorem has been proved as expected!
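If you would rather see the agreement numerically than algebraically, the following sketch compares the two routes on one pair of lines. The lines y = x and y = -x + 4, and the scaling factors applied to them, are arbitrary choices of mine for illustration.

```python
import numpy as np

l1 = 2.0 * np.array([1.0, -1.0, 0.0])    # y = x,      stored as (2, -2, 0)
l2 = -1.0 * np.array([-1.0, -1.0, 4.0])  # y = -x + 4, stored as (1, 1, -4)

# Route 1: a single cross product, then divide by the third entry.
x_h = np.cross(l1, l2)        # homogeneous point (8, 8, 4)
x_cross = x_h[:2] / x_h[2]    # Euclidean point (2, 2)

# Route 2: solve the 2x2 system  a*x + b*y = -c  for both lines.
A = np.array([l1[:2], l2[:2]])
rhs = -np.array([l1[2], l2[2]])
x_solve = np.linalg.solve(A, rhs)
```

Both routes give the point (2, 2). The cross product route has the advantage of never needing a matrix inverse, which matters in the parallel-line case where that inverse does not exist.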
3. Line joining two points
$$\mathbf{l} = \mathbf{x} \times \mathbf{x}'$$
This means, the line l joining the points x and x’ is given by the cross product of the two homogeneous point representations x and x’. In order to verify that this is the case, let’s again consider how we would normally determine the line joining the points. This is actually very simple, as we have already shown how to get the gradient m and y intercept g from two points a and b. So, referring to x as point 1, and x’ as point 2, we have the following:
$$m = \frac{y_2 - y_1}{x_2 - x_1}$$
$$g = y_1 - m x_1$$
We can also expand the m term in the bottom equation, so that g is expressed only as a function of point co-ordinates:
$$g = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}$$
And that’s it, these two parameters m and g fully define our line. So let’s see if we arrive at the same gradient m and y intercept g by using the homogeneous representation and performing the cross product!
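Expanding the cross product (with point 1 scaled by kp and point 2 scaled by kp’):
$$\mathbf{x} \times \mathbf{x}' = k_p k_p' \left[ (y_1 - y_2)\,\mathbf{i} + (x_2 - x_1)\,\mathbf{j} + (x_1 y_2 - x_2 y_1)\,\mathbf{k} \right]$$
Writing in homogeneous column vector form, and equating to the homogeneous line representation for l:
$$\begin{pmatrix} k_l m \\ -k_l \\ k_l g \end{pmatrix} = k_p k_p' \begin{pmatrix} y_1 - y_2 \\ x_2 - x_1 \\ x_1 y_2 - x_2 y_1 \end{pmatrix}$$
Converting back to gradient m and intercept g representation (negative signs kept in numerator and denominator in order to arrive at the exact same form for m and g as determined above):
$$m = \frac{y_1 - y_2}{-(x_2 - x_1)} = \frac{-(y_2 - y_1)}{-(x_2 - x_1)} = \frac{y_2 - y_1}{x_2 - x_1}$$
$$g = \frac{x_1 y_2 - x_2 y_1}{-(x_2 - x_1)} = \frac{-(x_2 y_1 - x_1 y_2)}{-(x_2 - x_1)} = \frac{x_2 y_1 - x_1 y_2}{x_2 - x_1}$$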
So there we have it, we have arrived at the same equations for m and g as determined above, by simply performing a single cross product operation between x and x’. That’s another theorem proved!
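Here is the same check done numerically, as a small sketch. The two points (1, 3) and (3, 7), and the scaling factor on the first one, are arbitrary values picked for illustration.

```python
import numpy as np

p1 = 2.0 * np.array([1.0, 3.0, 1.0])  # the point (1, 3), stored as (2, 6, 2)
p2 = np.array([3.0, 7.0, 1.0])        # the point (3, 7)

# Homogeneous route: one cross product, then convert to (m, g).
a, b, c = np.cross(p1, p2)            # homogeneous line (-8, 4, -4)
m_cross, g_cross = a / -b, c / -b

# Conventional route: gradient first, then the y intercept.
m_direct = (7.0 - 3.0) / (3.0 - 1.0)
g_direct = 3.0 - m_direct * 1.0
```

Both routes give m = 2 and g = 1, that is, the line y = 2x + 1.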
So, the use of homogeneous co-ordinates is starting to look a bit less random, and some of these properties are really very simple and useful. Are there any other useful properties of this representation?
In standard Euclidean geometry, if you were tasked to find the point of intersection of the following two lines, how would you do it?
Plotting the graph immediately reveals the difficulty of this task
You would probably tell me one of two things, either
(a) Impossible, the lines do not intersect
(b) The lines intersect at “infinity”
Neither of these responses is particularly satisfying. Do we just have to accept that in Euclidean space, there are “special” scenarios, in which solutions for intersecting lines are impossible, and we have no easy way of representing such infinite points of intersection?
Well, what if we were to convert to homogeneous co-ordinates, and apply Theorem 2 from above? Let’s see what happens.
The lines in homogeneous form are given by:
The point of intersection is given by:
Okay, so we have actually got ourselves a finite set of homogeneous co-ordinates for this infinite point of intersection. The nature of this is made clear when we try to convert back to Euclidean representation, by dividing the first and second terms by the third term:
This is to be expected, as we have already established that there is no such intersection in Euclidean space. Yet despite this, we helpfully now have a finite co-ordinate representation in homogeneous co-ordinate space, which encapsulates this non-defined infinite behavior in Euclidean space.
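To make this concrete, here is a sketch with a pair of parallel lines of my own choosing, y = x + 1 and y = x + 2 (these are illustrative values, not necessarily the lines used in the example above).

```python
import numpy as np

l1 = np.array([1.0, -1.0, 1.0])  # y = x + 1
l2 = np.array([1.0, -1.0, 2.0])  # y = x + 2 (same gradient, so parallel)

x_h = np.cross(l1, l2)           # homogeneous "intersection": (-1, -1, 0)

third_entry_is_zero = bool(x_h[2] == 0)  # True: no Euclidean point exists
direction = x_h[:2]                      # (-1, -1), parallel to the lines
```

The third entry is exactly zero, so the division needed to get back to x and y is undefined. The first two entries are still meaningful, though: they point along the shared direction of the two lines.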
In fact, there is a special name for the set of points in homogeneous space which do not have valid associated points in Euclidean space. These are referred to as Ideal Points. More specifically, ideal points are the set of points for which the third homogeneous term is equal to zero, with t1 and t2 representing the general first and second terms of the vector:
$$\mathbf{x}_{ideal} = \begin{pmatrix} t_1 \\ t_2 \\ 0 \end{pmatrix}$$
The Line at Infinity
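So, can we make any deductions about this collection of ideal points? We know that they all lack any associated Euclidean point representations. We can also actually show that all of these points lie on a single line, which is referred to as the line at infinity. This line is given by:
$$\mathbf{l}_\infty = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$
We can show this using Theorem 1:
$$\begin{pmatrix} t_1 \\ t_2 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = 0$$
Likewise, when trying to convert this line to Euclidean space (dividing the first and final entries by the negative of the second), we get the following, both of which are undefined:
$$m = \frac{0}{0}, \qquad g = \frac{1}{0}$$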
This re-iterates the point that this line is not representable in Euclidean space.
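A tiny sketch of this check, using a handful of ideal points I made up for illustration:

```python
import numpy as np

line_at_infinity = np.array([0.0, 0.0, 1.0])

# A few arbitrary ideal points: any (t1, t2, 0) with t1, t2 not both zero.
ideal_points = np.array([
    [1.0, 0.0, 0.0],
    [-1.0, -1.0, 0.0],
    [3.5, -2.0, 0.0],
])
finite_point = np.array([3.5, -2.0, 1.0])

ideal_dots = ideal_points @ line_at_infinity  # every entry is zero
finite_dot = finite_point @ line_at_infinity  # equals the third entry, 1
```

Every ideal point passes the point-on-line test against the line at infinity, while an ordinary point (with a non-zero third entry) never can.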
This example with parallel lines, and the notion of ideal points and the line at infinity, have all served to highlight an important point regarding our choice of vector space. We will investigate this point soon, but first, we need to formally introduce the notion of vector spaces.
When we have been dealing with simple 2D vector representations, we have been operating in what is called Euclidean vector space. In our 2D case, this is represented by two-dimensional real co-ordinate space, denoted by:
$$\mathbb{R}^2$$
In this vector space, a point is represented as a 2-vector, where the two entries are independently permitted to take any real value:
$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad x, y \in \mathbb{R}$$
Likewise, when we have been dealing with homogeneous co-ordinates, we have been operating in what is called Projective vector space. In our 2D case, this is denoted by:
$$\mathbb{P}^2$$
In this vector space, however, a point is represented as a 3-vector:
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$$
You might ask, why are our homogeneous co-ordinates not simply denoted by R3 space?
Well, if our three homogeneous co-ordinates were independent, we would indeed be operating in R3 space.
However, as proven earlier, our three entries are not independent at all. We are able to transform them to two unique values in Euclidean space, and so we still only have two degrees of freedom in our representation.
This is a very subtle point to grasp, and can be the cause of much confusion. The superscript 2 after the P essentially denotes that the entity that we are using the vector to represent has two degrees of freedom (the entity being either a 2D point or a 2D line).
Having said that, each of the three entries x1, x2, x3, of our homogeneous vector is free to take on any real value, with the single restriction that they cannot all be zero at once. In other words, we can choose x1, x2, and x3 to be any real values we want (other than all zeros), and there will exist an associated point or line. However, this point or line still only has two degrees of freedom, thus the superscript 2.
Now, the interesting point about these two vector spaces is that the Projective space P2 contains points which cannot be represented by R2. Namely, these are the ideal points (bear in mind that infinity is not a real number, and thus is not included in vector space R2).
Therefore, (straying away from rigorous mathematical terminology for the moment), the vector space P2 encapsulates the vector space R2. In other words, regarding representational capacity, we arrive at the projective vector space P2 by augmenting the vector space R2 with the ideal points. This relationship can be shown diagrammatically as follows:
So, what is the point in considering “ideal points” if they have no meaning in Euclidean space? The motivation behind introducing homogeneous co-ordinates in the first place was to make operations simpler for our 2D points and 2D lines in Euclidean space, which is what we actually care about. These strange ideal points and lines at infinity seem to bear no concrete connection to real points and lines at all.
I feel your frustration, but I simply have to ask you to be patient for the time being, and as we progress through the series, the relevance of ideal points and the line at infinity will present themselves. For now, however, it suffices to simply understand what they are.
So far, everything we have been dealing with has revolved around equations and proofs. We haven’t really had the chance to gain a real intuitive understanding for what is happening when we transform between 2D Euclidean representations and their associated homogeneous representations. Likewise, we kind of have to just accept that our three proofs are true, because the algebra says so. But, again, can we do better than this? And better facilitate our understanding?
The projective plane provides a much more visual and intuitive way of thinking about the conversions and the proofs discussed above. Before explaining what the projective plane is, to start things off, let’s think about how we might choose to visualize the 3D homogeneous co-ordinates representing 2D points.
In order to represent homogeneous co-ordinates for 2D points, we could just plot it as a standard 3D vector like so, with x1, x2, and x3 representing the general dimensions of our homogeneous 3-vector space, and t1, t2, and t3 representing the homogeneous co-ordinate terms of our particular 3-vector:
The co-ordinate of the underlying 2D point we are representing is, by definition, independent of the scalar value kp. As such, the length of this 3-vector is irrelevant to our representation of the 2D point. The only meaningful aspect of this 3-vector is its direction. We can therefore actually consider our homogeneous co-ordinate as an infinite ray along the direction of the 3-vector:
Well, what about converting back to our underlying 2D point? We know that in order to do this, we need to divide all three terms by the third entry, making the third entry equal to one. In order to determine the underlying 2D point, we can therefore just find the intersection of this ray with the plane located at x3 = 1. We refer to this plane at x3 = 1 as the projective plane!
So, this provides us with a more intuitive way of thinking about conversions between homogeneous co-ordinates (infinite rays) and their associated 2D Euclidean points (intersections of these rays with the projective plane)
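The ray picture can be sketched in a few lines. The function name `ray_hits_projective_plane` is hypothetical, and I am treating the "ray" as the full line through the origin, so that negative scaling factors land on the same point too.

```python
import numpy as np

def ray_hits_projective_plane(v):
    """Return where the line through the origin along v meets the plane x3 = 1.

    Returns None when v is parallel to that plane (third entry zero),
    which is exactly the ideal point case.
    """
    v = np.asarray(v, dtype=float)
    if v[2] == 0:
        return None
    return v / v[2]

# Three different 3-vectors along the same direction...
hits = [ray_hits_projective_plane(k * np.array([1.0, 2.0, 1.0]))
        for k in (2.0, 0.5, -1.0)]
# ...all meet the plane x3 = 1 at the same place, (1, 2, 1).

ideal_hit = ray_hits_projective_plane([1.0, 2.0, 0.0])  # None
```

This also gives a geometric reading of ideal points: they are the rays lying flat in the x3 = 0 plane, which run parallel to the projective plane and so never touch it.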
We were able to imagine our homogeneous co-ordinates of 2D points as infinite rays, but is there an equivalent helpful way of thinking about homogeneous co-ordinates representing 2D lines? Thankfully, there is! Let’s first explain what that representation is, and then afterwards, we will show why this works.
So, homogeneous co-ordinates of lines can in fact be thought of as infinite planes coincident with the origin, where the homogeneous 3-vector represents the vector normal to the plane, like so:
We can then think of the associated 2D line by intersecting this plane with the projective plane, and then plotting the resultant line of intersection, like so:
So, why does this work? This is actually relatively simple to prove. Firstly, any plane in our x1, x2, x3 space can be described by the following equation:
Our normal vector to the plane n is given by:
$$\mathbf{n} = \begin{pmatrix} t_1 \\ t_2 \\ t_3 \end{pmatrix}$$
It can be shown that, given this vector n normal to the plane, the equation of the plane through the origin is (refer to this Wolfram MathWorld article for an explanation):
$$t_1 x_1 + t_2 x_2 + t_3 x_3 = 0$$
We can see that, by setting x3 = 1 (which represents the intersection of this homogeneous plane with the projective plane), replacing our t1, t2, t3 terms with the associated kl.m, -kl, kl.g terms, dividing both sides by kl, re-arranging, and reverting back to x y notation instead of x1 x2 notation, we arrive back at our original 2D line equation:
$$y = m x + g$$
Given this proof, we can therefore usefully think of homogeneous representations of 2D lines as infinite planes in our homogeneous 3-vector space, with normal vectors to the planes given by the homogeneous co-ordinates. We can then imagine converting to 2D lines as the process of finding the intersection between this plane and the projective plane.
1. Checking if a point lies on a line
So, given our new tools for thinking about 2D points and lines in homogeneous vector space, and using the figure given below, let’s refer to our homogeneous point as an infinite ray through the origin (shown in red), and our homogeneous line as the normal vector (shown in orange) to an infinite plane through the origin (shown in blue). Also note that the dot product of two non-zero vectors is equal to zero if and only if the vectors are perpendicular (90 degrees apart).
It is therefore visually very clear to see that if the dot product between this infinite ray (red) and the plane normal (orange) is zero, then they must be perpendicular, and the ray must lie on the plane (blue). Consequently, the ray (which represents our 2D point) must intersect the projective plane (green) somewhere along the same line that our plane (which represents our 2D line) does. The 2D point therefore lies on the 2D line if and only if the dot product between their homogeneous vector representations is zero.
2. Intersection of lines
Referring to the figure shown below, in this case, we have two 2D lines, represented by plane 1 (red) and plane 2 (blue). It is common knowledge that the cross product between two vectors gives a third vector, which is perpendicular to both of these vectors. Therefore, given the two plane normal vectors (also red and blue, but represented as arrows), the cross product of these two vectors gives a third vector which is perpendicular to both (orange arrow). This new vector represents the infinite ray along which these two planes intersect. This infinite ray of planar intersection is clearly still the point of (red and blue) planar intersection at the projective plane (green). Therefore, when this ray (orange) is projected through the projective plane, it corresponds to the intersection point of the two associated 2D lines. If this is difficult to grasp or visualize, please leave a comment at the bottom, as I realize this can be a bit confusing!
3. Line joining two points
Referring to the figure shown below, in this case, we have two 2D points, represented by ray 1 (red) and ray 2 (blue). As previously stated, it is common knowledge that the cross product between two vectors gives a third vector, which is perpendicular to both of these vectors. Therefore, given the two ray vectors, the cross product of these gives a third vector which is perpendicular to both (orange arrow). Given that this new vector is perpendicular to both, by definition, it must represent the vector normal to the plane formed by both of these rays. We therefore have a 2D line representation from performing this cross product, and the associated plane for the line (shown in orange). Given that we know the two infinite rays (red and blue) must lie on this plane, we can deduce that all three of these must intersect the projective plane across a common line. Therefore, performing the cross product has indeed provided us with the homogeneous representation of the line joining the two 2D points!
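The one geometric fact doing all the work in the last two arguments is that a cross product is perpendicular to both of its inputs. A quick sketch, with two arbitrary 3-vectors of my own choosing:

```python
import numpy as np

u = np.array([1.0, 3.0, 1.0])
v = np.array([3.0, 7.0, 1.0])

w = np.cross(u, v)       # (-4, 2, -2)

dot_wu = np.dot(w, u)    # -4 + 6 - 2 = 0
dot_wv = np.dot(w, v)    # -12 + 14 - 2 = 0
```

Read u and v as two points, and w is a line that passes the point-on-line test for both of them. Read u and v as two lines, and w is a point that lies on both. It is the same calculation either way, which is why the two theorems look like mirror images of each other.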
With these few examples involving the projective plane, we have seen how it can provide us with a very powerful visualization tool, which can greatly aid our understanding and intuition for what is really going on when we deal with homogeneous co-ordinates, and why all of these very helpful and simple equations actually work!
So, there we have it, we have finally reached the end of our post on homogeneous co-ordinates in 2D! Let’s see if we can make any more sense of that Wikipedia definition that we presented right at the beginning.
In mathematics, homogeneous coordinates or projective coordinates are a system of coordinates used in projective geometry.
They have the advantage that the coordinates of points, including points at infinity, can be represented using finite coordinates.
Formulas involving homogeneous coordinates are often simpler and more symmetric than their Cartesian counterparts.
Homogeneous coordinates have a range of applications, including computer graphics and 3D computer vision, where they allow affine transformations and, in general, projective transformations to be easily represented by a matrix.
As I am sure you noticed, the first three sentences from this definition have been separated out from the quote and emboldened. Hopefully, you now have a better understanding of what these sentences actually mean.
If not, and you think things were unclear, please drop a comment below, and tell me how I can explain things better!
Otherwise, in the next post, we will be addressing the final sentence from that Wikipedia definition, and exploring how affine and general projective transformation can be more easily represented by a matrix. Fun fun fun!
Please find other helpful links below, if you fancy hopping around a bit.
Ciao for now.
The next post in the series:
Part II: Projective Transformations in 2D