3D Graphics: The Perspective Projection

Feb 28, 2024

Did you know that at one point in time, painters struggled to realistically portray people sitting at a table?

Take a look at the painting of the Last Supper by Duccio di Buoninsegna.

Now contrast that with the Last Supper by Leonardo da Vinci.

You can see that one seems better at portraying a realistic scene. And it’s not just that one is painted by the great da Vinci.

The difference lies in the proper use of perspective. Perspective allows artists to create convincing illusions of depth and space, enriching the viewer’s experience and enhancing the impact of the artwork.

In a previous article, I gave a brief overview of the graphics pipeline and the different stages a vertex goes through before ending up on the screen. One of those stages is the Vertex Shader. This is where 3D coordinates are transformed into 2D points. This involves mapping points to a flat surface without losing the sense of depth and space.

In this article, I want to take a closer look at the transformations that take place during this stage. There will be some math, but it is optional and only for those who wish to go this route.

Vector Spaces and Coordinate Systems

We saw that vertices live in 3D space and our screen is a 2D space. The objective of this stage in the pipeline is to transform from one space to another. But, what do we mean by space?

When we talk about space, we are referring to the dimensions in which an object exists and can move.

Take a look at the image below. We can see a point, represented by a sphere, in a three-dimensional (3D) space. This point can move horizontally from left to right, vertically up and down, and forwards or backwards in depth. These are the three dimensions in which the point lives.

Now, look at the next image. This is a point that lives in a two-dimensional space. It can move left and right, as well as up and down.

Finally, let’s take a look at the last example. In this case we have a point that only moves in one dimension.

In order to represent points in a particular space, we use a Coordinate System. A coordinate system is a mathematical framework used to describe positions, by assigning numerical values to points in space.

Coordinate systems are made up of two key components:

axes
origin

An axis is a number line that runs along one of the dimensions in the space. A 3-dimensional space has 3 axes (x, y, z). A 2-dimensional space has 2 axes (x, y).

The origin is the point where axes intersect. It serves as a reference point from which all other points are measured.

To recap, points live in spaces, and spaces are defined using coordinate systems. A coordinate system is made up of an origin and some axes.

Now, let’s look at how we go from one space to another.

Transformations

In the context of computer graphics, transformations refer to operations or processes that change an object’s space, position, orientation, and size.

In essence, each transformation is composed of a vector-matrix multiplication. Linear algebra is used widely in the field of computer graphics, as it provides a performant mathematical way of dealing with 3D data.

Don’t worry if you don’t know anything about vectors, matrices, or linear algebra. Basically, some data (like vertices) is made up of more than one value. We group these values into a mathematical structure called a Vector.

Sometimes, we need to do more than one operation on each value in a vector. So, we store all of these operations in one big mathematical structure called a Matrix. Matrices organize values in rows and columns.

To perform vector-matrix multiplication, we take each row of the matrix, multiply its entries by the corresponding elements of the vector, and sum the products. Each row produces one element of the result.

The result of a vector-matrix multiplication is another vector.
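To make the row-by-row rule concrete, here is a minimal Python sketch. The function name `mat_vec_mul` and the example matrix are made up for illustration; real graphics code would normally rely on a math library or the GPU instead.

```python
def mat_vec_mul(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    result = []
    for row in matrix:
        # Multiply each entry in the row by the matching vector element
        # and sum the products. This gives one element of the output.
        result.append(sum(m * v for m, v in zip(row, vector)))
    return result


# A hypothetical 3x3 matrix that doubles x, keeps y, and halves z.
scale = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5],
]
point = [1.0, 2.0, 4.0]

transformed = mat_vec_mul(scale, point)
print(transformed)  # [2.0, 2.0, 2.0]
```

Notice that the output has one element per matrix row, which is why a 3x3 matrix turns a 3D vector into another 3D vector.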

If we wish to transform points from one space to another, we must represent those points as vectors and multiply them by a matrix. The resulting vector will hold the point in the target space.

Perspective Projection

Now we get to the meat and potatoes of the article. Transformations perform operations on vertices. One of these operations is known as perspective projection.

Perspective projection is a mathematical operation that simulates the way a human eye perceives objects in the real world.

In perspective projection, objects that are farther away from us appear smaller than objects that are closer. Parallel lines appear to converge at a vanishing point in the distance.

To mimic real life, a perspective projection matrix will need to scale the vertices according to their distance from our viewer (z-value). But it must take into account the following components:

Aspect Ratio
Field of View
Near Plane
Far Plane

Let’s look into each component.

The Frustum

The way we model perspective is by first placing our viewer at the origin of the space. In front of us will be the 3D scene we wish to project. Our screen is placed between our viewpoint and the scene. You can think of our screen as a window into our 3D scene.

Everything that is visible through this “window” is inside a frustum.

A frustum is just a fancy word for a pyramid with its tip sliced off.

Let’s take a look at this frustum from the side.

We have placed two boxes with exactly the same dimensions, but one is farther away from the point of origin. If we draw lines from the top of each box to the origin, we see that these lines intersect with the near plane (screen) at different heights. This shows how objects that are closer to our viewer will appear larger than objects that are farther.
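The two-box comparison can be checked with similar triangles: a point at height h and distance z from the viewer lands on the near plane at height h * near / z. The sketch below assumes a near plane one unit from the viewer and uses made-up box sizes and distances.

```python
def projected_height(height, distance, near):
    """Height at which a point lands on the near plane, by similar triangles."""
    return height * near / distance


NEAR = 1.0        # assumed distance from the viewer to the near plane
BOX_HEIGHT = 2.0  # both boxes have the same height

close_box = projected_height(BOX_HEIGHT, distance=4.0, near=NEAR)
far_box = projected_height(BOX_HEIGHT, distance=8.0, near=NEAR)

print(close_box)  # 0.5
print(far_box)    # 0.25
```

Doubling the distance halves the projected height, which is exactly the "farther means smaller" effect.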

Here our screen is called the Near Plane. Any object that is closer than this plane will not show up in our final image. Likewise, there is a Far Plane. This plane limits how far we can see into the distance. Objects that are behind this plane will not show up in the final image.

Field of View

An important component of perspective projection is the Field of View. The field of view (FOV) is how much of our scene we can see through the screen or viewport.

Previously, we saw that when viewed from the side, our frustum has two imaginary lines extending from the origin, through the top and bottom edges of the near plane, and on to the far plane. The FOV is defined by the angle between these two imaginary lines.

A wider field of view captures more of the scene and makes everything smaller in order to fit the scene. This is similar to zooming out.

A narrow field of view shows less of the scene and makes everything appear bigger. This has the effect of zooming in.

This means that when transforming points from 3D to 2D, we need to scale them by some value that takes the FOV into account. What should this value be?

Let’s look at our frustum from the side one more time.

Let’s place a point in 3D space. We’ll call this point p. If we draw a line from this point to the origin, we can see that the line intersects with our near plane. Let’s call this intersection point p_i. This is our projected point.

Consider the right triangle that is formed by the viewer (origin), the near plane, and the point of interest in the scene (p). The field of view angle, denoted by the Greek letter alpha (α), is the angle between the lines from the origin to the top and bottom of the near plane.

Using trigonometry, we can see that tan(α/2) is the ratio of half the height of the near plane to its distance from the origin. A wider field of view gives a larger tangent, yet it should make points appear smaller, not larger. So we take the reciprocal instead.

This means that we should multiply our point’s x and y coordinates by 1/tan(α/2), the reciprocal of the tangent of alpha divided by 2 (not the arctangent).

The result is a 2D point that has baked-in depth information to preserve perspective.
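As a quick sanity check, the sketch below computes the scale factor as the reciprocal 1 / tan(α/2) (the reciprocal of the tangent, not the arctangent) for two angles. The function name `fov_scale` and the choice of degrees as the input unit are assumptions made for this example.

```python
import math


def fov_scale(fov_degrees):
    """Scale factor 1 / tan(fov / 2) for a field of view given in degrees."""
    half_angle = math.radians(fov_degrees) / 2.0
    return 1.0 / math.tan(half_angle)


narrow = fov_scale(45.0)  # roughly 2.41
wide = fov_scale(90.0)    # roughly 1.0, since tan(45 degrees) is 1

# The same y coordinate ends up larger under the narrow field of view.
y = 0.5
print(y * narrow > y * wide)  # True
```

This matches the zoom analogy: the narrow angle produces the bigger scale factor, so objects appear larger.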

Aspect Ratio

In perspective projection, taking into account the screen’s aspect ratio is crucial in order to maintain the correct proportions of our scene.

Aspect ratio is the proportional relationship between a screen’s width and height.

Taking aspect ratio into account prevents distortion and keeps the image consistent across devices. Most display devices are rectangular, not square.

To calculate the aspect ratio, we divide the screen width by the screen height.
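Here is a small sketch of that calculation. It also shows one common way to apply the ratio, which is to divide the x coordinate by it. That second step is an assumption for illustration, since conventions differ on whether x or y is corrected. The function names are hypothetical.

```python
def aspect_ratio(width, height):
    """Screen width divided by screen height."""
    return width / height


def correct_x(x, width, height):
    """Divide x by the aspect ratio so a wide screen does not stretch the scene.

    Assumption: the correction is applied to x only; other conventions exist.
    """
    return x / aspect_ratio(width, height)


ratio = aspect_ratio(1920, 1080)
print(round(ratio, 4))                        # 1.7778
print(round(correct_x(1.0, 1920, 1080), 4))   # 0.5625
```

On a square screen the ratio is 1 and the correction leaves x unchanged.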

Depth Normalization

We have already taken into account the field of view, the aspect ratio, and the distance from the origin. Technically, we can take our vertices and project them into screen space. But typically, perspective projection matrices perform one additional operation known as depth normalization.

Depth normalization is not related to how the vertex will be projected on the screen, but it will help us determine the visibility of objects on the screen.

In later stages of the 3D graphics pipeline, pixel fragments with depth values closer to the viewer will replace fragments with greater depth values.

To perform depth normalization, we must multiply the z component of our vector by the following term:

Where Z_far is the distance from the origin to the far plane, and Z_near is the distance from the origin to the near plane.

I’m not going to get into where this term comes from, as that is beyond the scope of this lesson. You really don’t need to know this in order to use a perspective projection matrix.

This maps the area inside the frustum to a range of -1 to 1. But we must remember that our near plane is offset from the origin. To fix this, after multiplying z by the previous term, we add the following term:
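As a rough sketch of what these two terms can look like, the Python snippet below uses one common convention: the viewer looks down the positive z axis, z is scaled by (Z_far + Z_near) / (Z_far - Z_near), offset by -2 * Z_far * Z_near / (Z_far - Z_near), and finally divided by the original z. These formulas are an assumption chosen to match the -1 to 1 range described here, and may differ in sign or layout from the exact terms this article has in mind. The plane distances are made up.

```python
def depth_terms(z_near, z_far):
    """Return the (scale, offset) pair applied to z before the divide.

    Assumed convention: viewer looks down +z, depth ends up in [-1, 1].
    """
    scale = (z_far + z_near) / (z_far - z_near)
    offset = -2.0 * z_far * z_near / (z_far - z_near)
    return scale, offset


def normalized_depth(z, z_near, z_far):
    """Depth after scaling, offsetting, and dividing by the original z."""
    scale, offset = depth_terms(z_near, z_far)
    return (z * scale + offset) / z


Z_NEAR, Z_FAR = 1.0, 9.0  # hypothetical plane distances

print(normalized_depth(Z_NEAR, Z_NEAR, Z_FAR))  # -1.0
print(normalized_depth(Z_FAR, Z_NEAR, Z_FAR))   # 1.0
```

A point on the near plane maps to -1 and a point on the far plane maps to 1, with everything inside the frustum falling in between.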

Now the components of the perspective matrix are complete. Let’s see how we combine them into one big structure.

The Projection Matrix

Here it is in all its glory! Our complete perspective projection matrix.

Notice that this is a 4x4 matrix. If you know anything about vector-matrix multiplication, then you know that we cannot multiply a 3D vector by a 4x4 matrix. So, in order to use this matrix we need to convert our 3D vertices into 4D vectors.

Don’t be spooked by what you just read. All we need to do is add a 1 at the end of our vector. This 1 is known as w.

Due to how vector-matrix multiplication works, after transforming our vertices by this projection matrix, we still need to divide our x and y coordinates by z. But by then, the z component has already been replaced by its normalized value.

By adding a 1 as the w component of our vector, we give the matrix a place to store our original z value, which we can later use to perform the perspective divide.
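Putting the pieces together, the following Python sketch builds one possible 4x4 perspective matrix and applies the perspective divide. The layout is an assumption: it combines the FOV scale, the aspect ratio applied to x, a depth range of -1 to 1 with the viewer looking down positive z, and a last row that copies z into w. The matrix in this article may order or sign its entries differently, and the function names and sample values are made up.

```python
import math


def perspective_matrix(fov_degrees, aspect, z_near, z_far):
    """Build a 4x4 perspective matrix as a list of rows (assumed layout)."""
    f = 1.0 / math.tan(math.radians(fov_degrees) / 2.0)
    depth_range = z_far - z_near
    depth_scale = (z_far + z_near) / depth_range
    depth_offset = -2.0 * z_far * z_near / depth_range
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, depth_scale, depth_offset],
        [0.0, 0.0, 1.0, 0.0],  # copies the original z into w
    ]


def project(point, matrix):
    """Transform a 3D point and apply the perspective divide."""
    x, y, z = point
    vec = [x, y, z, 1.0]  # add w = 1 to get a 4D vector
    cx, cy, cz, cw = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    # After the multiplication, cw holds the original z value.
    return [cx / cw, cy / cw, cz / cw]


matrix = perspective_matrix(fov_degrees=90.0, aspect=2.0, z_near=1.0, z_far=9.0)
near_point = project((1.0, 1.0, 2.0), matrix)
far_point = project((1.0, 1.0, 8.0), matrix)

# The same x and y end up closer to the center when the point is farther away.
print(abs(near_point[1]) > abs(far_point[1]))  # True
```

Because the offset term sits in the fourth column, it only takes effect when w is 1, which is one reason the extra component is needed in the first place.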

Final Words

I am aware that this lesson might be a lot to take in. It’s okay if you don’t have everything clear yet.

The theory and math of perspective projection are not a requirement for you to create 3D graphics. But having an insight into how things work under the hood might help you down the line when you are debugging graphics calculations.

In the next lesson in this series, we will put all of this theoretical math jargon into practice.