Game developer’s guide to graphical projections (with video game examples), Part 2: Multiview
In this series we’re learning how to draw 3D objects onto 2D surfaces, a process known as graphical projection. If you have no idea what I’m talking about, you should read Part 1: Introduction.
Here’s the big picture:
Today we’ll look at the bottom-left corner, top and side view, which form multiview orthographic projections.
The name multiview comes from technical drawing, where we’re describing an object from multiple viewpoints, usually for design or construction/manufacturing purposes.
It’s used in architecture …
… vehicle design…
… manufacturing …
… and even game development.
A multiview drawing includes up to 6 primary views: 2 so-called plans (top and bottom view) and 4 elevations (left, right, front, back).
Plan and elevation are two very common views in video games as well, usually called top-down and side view. They are easy to use because we don’t have to care about the third dimension at all. Everything in the game can be described with 2 coordinates (x and y), simplifying storage and calculations (drawing, physics, artificial intelligence …). For example, collisions in 2D are computationally much easier to deal with than in 3D.
At first, games looked like the developers literally gave up the third dimension as we stared into the black void in the background.
Eventually we got to see interior details, indicating we’re still part of the 3D world, we’re just looking at it from the side. Doors could even let us travel through the third dimension.
We soon saw outside scenes as well, with lush backgrounds faking perspective with more or less clever tricks.
Side-view games started looking less flat, except for their gameplay and physics that happened almost exclusively in 2 dimensions.
Multiview projections are parallel projections, so objects drawn with them retain their size no matter where in space they are positioned.
This means that when a character moves around the scene, we draw the same image to a new location without any need to scale the sprite.
The same applies when creating artworks. Once we draw an object, we can freely move it around the canvas to fit our composition.
Sticking to multiview allows us to measure things in real life and transfer their dimensions directly to the image using our scale.
Let’s say we choose 1 meter = 25 pixels; a 5×4 m room then translates into a rectangle of 125×100 pixels.
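In code this scale conversion is a one-liner. A minimal sketch (the constant and function names are mine):

```python
# Fixed scale matching the example in the text: 25 pixels per meter.
PIXELS_PER_METER = 25

def to_pixels(meters):
    """Map a real-world length in meters to a length in pixels."""
    return round(meters * PIXELS_PER_METER)

# A 5 x 4 m room becomes a 125 x 100 px rectangle.
print(to_pixels(5), to_pixels(4))  # 125 100
```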
For most technical things you can find blueprints on the Internet. Drawing cars, for example, is a piece of cake because you can easily find a reference, scale it to the desired dimensions, and draw directly over it.
You can also shoot your own references if you can get perpendicular to the main plane of your object. Everything that is not in that plane will get distorted, so you’ll have to account for that. The narrower the lens angle, the better (the same goes for using optical zoom, which narrows the angle of view as well).
Just like sizes and distances, all the angles you measure in real life directly transfer to the image (again assuming you’re measuring them in the main plane).
All surfaces that go directly towards/away from the image degenerate into a line. For example, floors and side walls in front view are just lines.
3D volumes also simplify. Boxes become rectangles, spheres become circles, cylinders become rectangles or circles, depending on which side you see.
Mario pipes are just rectangles with shading that indicates they’re circular.
Same with barrels and pipes in Batman.
Cones become triangles, which is exactly how most of us draw pine trees.
Don’t forget that you can also have portions of volumes. Half a sphere becomes half a circle, and a shaved-off cylinder becomes a shaved-off rectangle (a trapezoid).
Illusion of depth
It’s important to realize that even pure 2D games try to represent the real, 3D world (unless we’re talking about some abstract/board game). Even though we discard the third dimension, we still communicate depth in multiple ways.
One simple cue that things have a different Z coordinate is overlap.
We’re used to seeing closer things up front, covering things further away in the back.
There’s no doubt that Link is in front of these steps. For things that aren’t overlapping, it could go either way. Are the torches closer than sleeping Zelda? Are the columns all at the same depth? We will never know.
One thing to be cautious about is making two objects touch each other. This makes the scene look particularly flat because we assume they are touching in 3D as well (that’s the most common reason why they’d be aligned in our view).
A general rule of thumb is to avoid such ambiguous compositions, either by separating the two objects or by pushing them into a clear overlap to enhance depth, unless they are actually supposed to be touching.
Don’t forget you can put things in front of the main character as well. The game Flashback, although not completely in side-view, uses overlap to great advantage by placing some of the scenery in front and some behind the character movement plane.
One more thing you can notice in the screenshot above is how things lose saturation and fade towards blue the farther away they are. In art we call this atmospheric or aerial perspective (although it technically has nothing to do with perspective projection—the perspective part merely describes that it is used to communicate depth, distinguishing the background from the foreground).
Atmospheric perspective occurs because air and other particles scatter (diffuse) light as light passes through them. The more air between the viewer and the object, the more scattering and loss of contrast.
The form of scattering we observe on a daily basis is Rayleigh scattering. It scatters shorter wavelengths (violet and blue light) more strongly than longer ones (yellow and red). This makes the sky blue, as we get bombarded by the blue part of sunlight from all angles. You might not think of it this way, but when you’re looking at the sky, you’re actually looking at the sun—the blue parts of sunlight that took a curveball to reach your eyes.
The further things are, the more air (atmosphere) separates you and the object, allowing more of the sky color to scatter in from surrounding angles.
Fog works on a similar principle, except it’s caused by water droplets instead of oxygen and nitrogen molecules. Droplets in fog and clouds are much bigger than air molecules (around 10–15 µm compared to 0.4 nm), so instead of Rayleigh scattering we get non-selective scattering that affects all colors equally and fades things in general. Humid jungles are a good candidate for this.
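The wavelength dependence is easy to check yourself: Rayleigh scattering strength goes with 1/λ⁴. A quick sketch (the wavelength values are common approximations):

```python
# Rayleigh scattering strength is proportional to 1 / wavelength^4,
# which is why blue light scatters much more than red.
def rayleigh_strength(wavelength_nm):
    return 1.0 / wavelength_nm ** 4

blue = rayleigh_strength(450)  # blue light, ~450 nm
red = rayleigh_strength(650)   # red light, ~650 nm
print(blue / red)  # ~4.35: blue scatters roughly 4x more than red
```

That factor of roughly 4 is why the light scattered towards us from the sky is dominated by blue.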
As artists we can easily go beyond realism and pronounce this effect to make our scenes more readable. See how the background railing in the bottom-right of the screenshot uses discoloration and lack of details to achieve this? The two railings are so close together that you’d need some crazy dense fog to have such a difference in real life. The artist decided it’s more important to make it clear what is in the back than to blindly follow realism.
Mind that what we’re learning in this section applies to other projections as well. By placing overlapping objects in our composition we explicitly communicate the depth of the scene in any projection. The same goes for atmospheric perspective.
It might not be the most common thing in old racing games, but fading the racetrack in the distance comes from the same reasoning.
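If you want to apply this in a game, one common approach (my own sketch, not from any particular engine) is to blend each layer towards a fog color based on its depth, with an exponential falloff:

```python
import math

# Aerial-perspective fade: blend an object's color towards the sky/fog
# color based on distance. The exponential falloff and the density
# value are arbitrary choices for illustration.
def fade_to_fog(color, fog_color, distance, density=0.02):
    t = 1.0 - math.exp(-density * distance)  # 0 up close, -> 1 far away
    return tuple(c + (f - c) * t for c, f in zip(color, fog_color))

green = (60, 120, 40)    # foreground foliage
sky = (160, 190, 230)    # pale blue fog color

print(fade_to_fog(green, sky, 0))    # unchanged up close
print(fade_to_fog(green, sky, 200))  # almost sky-colored far away
```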
Atmospheric perspective is not the only effect that weakens with distance. All light sources fade over their reach, and they do so inversely with the square of the distance.
In the absence of other light sources, an object twice as far away will appear a quarter as bright.
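A quick sanity check of the inverse-square law:

```python
# Light intensity falls off with the inverse square of distance:
# doubling the distance leaves a quarter of the intensity.
def intensity(base, distance):
    return base / distance ** 2

print(intensity(100, 1))  # 100.0
print(intensity(100, 2))  # 25.0 -- twice as far, a quarter as bright
print(intensity(100, 4))  # 6.25
```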
While we see this mostly in dungeon crawlers in 1-point perspective, the same principle can be applied to side view. If the scene is lit by torches, objects far away—including in the depth dimension—should appear darker.
Our lighting setup and composition can be much more complex, with varying values at different distances.
See how the grass at the very bottom of the screenshot (complete foreground) is black, the middle plane with the characters is light, the grass behind them is again dark as it fades to self-shadow, and the hills in complete background are the brightest. It makes for a very pleasant composition with characters that clearly stand out.
Two obvious variants for good contrast are to have dark backgrounds with lighter foreground …
… or the other way around, with dark items close to camera and light backgrounds shining through. The latter has become a popular aesthetic in modern games.
It’s also popular to black out inner parts of architecture, since they wouldn’t see any light at all—a technique which, despite being very logical, was barely seen in old games.
A compromise between old and new is to fade out the lighting towards the bottom of the platform.
If we return to the previous century, Sonic gives us another example of how lighting affects shade.
Notice how the bottom part of the loop that goes behind the front is darker. This works well to communicate depth, but also has an underlying physical reasoning.
For the exit of the loop to end up in the background, the road needs to curve slightly. This means that the bottom-left part of the loop isn’t parallel to the screen, but angled slightly. In particular, it’s turned a bit away from the light source thus appearing darker. By the way, unlike the inverse-square law for light falloff, the math behind shading of surfaces at different angles is a sine function of the angle at which the light hits the surface (usually expressed as a cosine of the angle between the light ray and the surface normal, a.k.a. Lambert’s cosine law).
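Lambert’s cosine law boils down to a dot product once you work with unit vectors. A sketch in 2D (vector names are mine, and both vectors are assumed to be normalized):

```python
import math

# Lambert's cosine law: diffuse brightness scales with the cosine of
# the angle between the direction towards the light and the surface
# normal, which for unit vectors is just their dot product.
def lambert(to_light, normal):
    dot = sum(l * n for l, n in zip(to_light, normal))
    return max(0.0, dot)  # surfaces facing away get no light

to_light = (0.0, 1.0)  # light shining straight down from above

print(lambert(to_light, (0.0, 1.0)))   # 1.0: surface faces the light
tilted = (math.sin(math.radians(30)), math.cos(math.radians(30)))
print(lambert(to_light, tilted))       # ~0.866: tilted 30 degrees
print(lambert(to_light, (0.0, -1.0)))  # 0.0: facing away entirely
```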
Connected to lighting is the use of shadows to imply depth. If we have a light source with rays coming from front to back, the objects in the foreground cast shadows onto the back walls.
While not super contrasty, you can see the shadow of the ladder two tile blocks to the right. Grey bricks and pipes are also darker under platforms. Notice how there is an offset where the shadow passes the pipes. The pipes are closer to the viewer than the back wall and thus less in shadow. Their shadows also curve down as the pipe goes towards the wall due to their circular shape.
Lighting in Terminator works on the same principle except the rays are angled 2 tiles down–1 across rather than at 45 degrees.
See how in Gods the walls throw a 16 px-wide shadow while the ladder only has 2 darker pixels on the bottom and right side. This implies that the ladder is flush with the wall, unlike in the Batman example where the ladder went in front of the platform. The distance from the object to its shadow implies how far from the back surface it is.
An interesting feature of multiview projections is that for directional light sources, the shadow is the same shape as the object casting it—if we neglect the object’s own depth (volume). For pretty thin objects this is a good approximation, for others it is deceiving because the shadow implies they are flat.
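In code, such a sprite shadow is just the silhouette drawn again at an offset that grows with the object’s distance from the back wall. A sketch; the light slope of 1 across, 2 down is an arbitrary choice for illustration:

```python
# For a directional light, the shadow on the back wall is the object's
# own silhouette shifted in screen space. The farther the object is
# from the wall, the larger the shift. The slope (1 right, 2 down per
# unit of depth) is a made-up light angle, not from the article.
def shadow_position(sprite_pos, depth_from_wall, slope=(1, 2)):
    x, y = sprite_pos
    return (x + depth_from_wall * slope[0], y + depth_from_wall * slope[1])

print(shadow_position((100, 50), 0))  # (100, 50): flush with the wall
print(shadow_position((100, 50), 8))  # (108, 66): clearly in front of it
```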
Quite a few examples we’ve seen so far use another trick to imply depth: the backgrounds are drawn at a smaller scale. This is technically a cheat if we’re talking proper multiview projections, because the scale should be uniform no matter the depth. Cheating makes them fall under my lovingly named Frankenstein projections.
It’s an effective trick though. When we view the world with our eyes, objects further away become smaller due to perspective. Most side-view games use this principle to imply depth.
The buildings in the background are much smaller than the main tower.
It’s the same with clouds in Mario.
If this were proper side view, those clouds would be just a few meters long. Our minds instead interpret them as full-sized clouds that look small because they are far away.
Unlike in Super Mario Bros., where clouds move together with the rest of the scene, the ultimate illusion of depth is to move the background layers slower than the foreground.
This technique, known as parallax scrolling, was an important leap towards real perspective. While each layer is still a 2D plane, the correct slowdown factors can produce geometrically correct results (unlike the example above, where they couldn’t afford more than two layers and so the far city skyline moves together with the closer buildings).
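Those correct factors come straight from the pinhole camera model: apparent motion scales with 1/depth. A sketch (the reference depth of the gameplay plane is a made-up value):

```python
# With a pinhole camera, a layer at depth z appears to move inversely
# proportionally to z. If the gameplay plane at `gameplay_depth`
# scrolls 1:1 with the camera, a layer at depth z should scroll by
# the factor gameplay_depth / z.
def scroll_factor(layer_depth, gameplay_depth=10.0):
    return gameplay_depth / layer_depth

print(scroll_factor(10.0))   # 1.0: the gameplay plane moves 1:1
print(scroll_factor(20.0))   # 0.5: twice as deep, half the speed
print(scroll_factor(100.0))  # 0.1: a distant skyline barely moves
```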
Still, no matter how many layers modern technology allows, it’s hard to shake the feeling of watching a cardboard theater.
Some get away with it more convincingly than others.
Let’s close with process examples for both drawing and coding.
In 2016 I designed a mini course for people who want to get started with sketching in nature. It used drawing in side view so that people didn’t have to worry about perspective.
You can start with my medium article about it if you want to give it a try. The whole process looks like this:
Side view is straightforward because things that are vertical and horizontal in real life simply turn into vertical and horizontal lines on paper. The rest of the volumes simplify as explained in the Properties section. It’s relatively easy to go from real life …
… to a sketch.
And there’s no reason why you shouldn’t do the same with pixel art.
It really isn’t that hard. The dome is a portion of a circle, columns are rectangles, pine trees start as triangles …
Projections that we’ll talk about next time will only be more complex, so side view is a great place to start.
Most game engines will have a way to draw things directly in 2D, so you don’t have to worry about projections at all. But if you ever find yourself trying to render a 3D scene in top-down or side view, you should look for an orthographic camera.
Here’s an example for Unity, but you should be able to find similar options in your 3D engine of choice:
In code this is done by setting an orthographic projection matrix. In Unity we have the Matrix4x4.Ortho method for this purpose:
Or if you want to build the matrix by hand:
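For reference, here is the hand-built matrix sketched in plain Python, following the OpenGL-style convention that Unity’s projection matrices use (camera looking down −z, view volume mapped to the [−1, 1] cube). Treat it as an illustration rather than Unity’s exact implementation:

```python
# Orthographic projection matrix, OpenGL convention: maps the box
# [left, right] x [bottom, top] x [-near, -far] to the [-1, 1] cube.
def ortho(left, right, bottom, top, near, far):
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform(m, point):
    """Apply a 4x4 matrix to a 3D point (w = 1), returning (x, y, z)."""
    x, y, z = point
    v = (x, y, z, 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))

proj = ortho(-5, 5, -5, 5, 1, 1000)
near_pt = transform(proj, (2.0, 3.0, -1.0))     # on the near plane
far_pt = transform(proj, (2.0, 3.0, -1000.0))   # on the far plane

# Same x and y regardless of depth -- that's the whole point of
# orthographic projection: no foreshortening.
print(near_pt[:2], far_pt[:2])
```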
Let’s try it out on a more complex scene.
The image above shows the scene rotating under a general orthographic projection. To get one of the multiview projections, you have to rotate the camera by exact multiples of 90 degrees (zero is fine too). Here are examples for top and side view:
Note that we have to position the camera away from the center of the scene. With the default clipping planes at 1 and 1000, 500 units offset places the scene right in the middle of our view frustum.
I’m not using proper pixel art textures, but if I did I could make one texture pixel coincide with one render pixel. This would allow me to use Unity’s 3D engine with all its fancy lighting and physics to render a seemingly 2D pixel art scene. For more information on such approaches read my article Pixels and voxels, the long answer. Pay special attention to the game Pathway; it uses 3/4 top-down axonometric projection, but the same can be applied to side or top view.
Understanding the rules of multiview projections won’t by itself make you the best artist in the world. But it’s a great place to start, and by mastering this knowledge you’ll be able to create very accurate lineart for the 6 primary views (you’ll essentially learn how to draw blueprints).
It doesn’t say much about light and color though. Going from outlines to accurate shading will take understanding of how light works in 3D. Before we can study this though, it’ll be helpful to learn how to draw with all three dimensions in the first place.
The road ahead is long, but worthwhile.
I honestly thought I’d write a 5–10 min read, but alas I can’t seem to do short things anymore. I hope it was worth the effort and you enjoyed the extra content I put in. Also, another shoutout to Super Adventures in Gaming for providing almost all screenshots of older games.
Next time we’ll go to my favorite topic, axonometric projections, including the fan-favorite isometric view.
Till then, happy pixels!
This article was brought to you by patrons including Reuben Thiessen, Qinapses, Magnus Adamsson, Jeff Chang, … (dot dot dot), CarbonBond, and Robert ‘Pande’ Kapfenberger.