Augmented Reality 911 — Transform Matrix 4x4

Published in

Mac O’Clock

7 min read · Aug 10, 2020

The Eternal Question: “Red or Blue”?… Or “To Know or Not to Know”?

Inexperienced AR developers rarely believe that matrices are an easy topic. But I’m sure this topic is easy. It is also awesome, because a 4x4 transform matrix is an ingenious and concise way to store information about translation, rotation, scale, shear and projection. In this story I will guide you through the pitfalls and show you how to use transform matrices for anchors, models and cameras in ARKit, RealityKit, SceneKit and MetalKit. It is also indispensable info for those who work with ARCore, Unity, Vuforia, Maya, Nuke or Unreal.

You can read the story about Color Matrix 4x4 here.

Let’s start it off.

Identity 4x4 Matrix

Once upon a time there was an Identity 4x4 matrix. In other words, a matrix in its default state.

The most common way to read a 4x4 transform matrix is by columns. There are 4 columns with indices 0, 1, 2 and 3. These columns should be read as the X, Y, Z and W axis labels. The four matrix rows are also marked X, Y, Z and W.

        0  1  2  3
     ┌              ┐ 
     |  1  0  0  0  |   X
     |  0  1  0  0  |   Y
     |  0  0  1  0  |   Z
     |  0  0  0  1  |   W
     └              ┘
        X  Y  Z  W

So the translation elements live in the column with index 3.

     ┌              ┐ 
     |  1  0  0  Tx |
     |  0  1  0  Ty |
     |  0  0  1  Tz |
     |  0  0  0  1  |
     └              ┘

Let’s see how to read the ARCamera’s translation XYZ values in ARKit using Swift. Below we can see that each ARFrame, at 60 frames per second, contains the camera position (the column with index 3).

@IBOutlet weak var arView: ARSCNView!

arView.session.currentFrame?.camera.transform.columns.3.x
arView.session.currentFrame?.camera.transform.columns.3.y
arView.session.currentFrame?.camera.transform.columns.3.z
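If you don't have an AR session at hand, the column-first reading can be tried on a plain simd matrix. This is a minimal sketch: the pose values are made up for illustration, and it assumes only the simd module that ships on Apple platforms.

```swift
import simd

// A hypothetical pose: no rotation, positioned at (0.25, 1.5, -2.0).
var pose = matrix_identity_float4x4
pose.columns.3 = SIMD4<Float>(0.25, 1.5, -2.0, 1)

// Three equivalent ways to read Ty (column 3, row 1).
let ty1 = pose.columns.3.y
let ty2 = pose[3].y       // a single subscript returns a whole column
let ty3 = pose[3, 1]      // a double subscript is [column, row]

print(ty1, ty2, ty3)
```

Note that the subscript order is column first, then row, which is the opposite of the row-first notation used in most math textbooks.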

Projection XYZ channels, however, live in three different columns — 0, 1 and 2.

     ┌              ┐ 
     |  1  0  0  0  |
     |  0  1  0  0  |
     |  0  0  1  0  |
     |  Px Py Pz 1  |
     └              ┘

Scaling up

Uniform scale is the simplest form of transformation in this type of matrix. Try scaling the 3 diagonal values up simultaneously and you’ll see that 3 sides of the model become brighter because they get closer to the light sources.

Scaling down

Then attempt to uniformly (a.k.a. proportionally) scale it down. The sides of the model are now farther from the lights, so they are dimmed.

Non-uniform scale

Non-uniform scale is also very simple. Scale an object along one axis only, or along two axes, globally or locally. The following picture represents a cube stretched along the global X-axis.

Mirroring along Y-axis

Flipping is another extremely popular operation. It can be achieved by negating any scale value.

In other words, mirroring is a negative scale, or -100%
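The three kinds of scale described above can be sketched in a few lines of Swift. The makeScale helper is my own hypothetical name, not an ARKit or SceneKit API, and the sample point is arbitrary.

```swift
import simd

// Hypothetical helper: the scale factors sit on the matrix diagonal.
func makeScale(_ sx: Float, _ sy: Float, _ sz: Float) -> simd_float4x4 {
    return simd_float4x4(diagonal: SIMD4<Float>(sx, sy, sz, 1))
}

// One corner of a cube, as a position (w = 1).
let corner = SIMD4<Float>(1, 1, 1, 1)

let doubled   = makeScale(2,  2, 2) * corner   // uniform scale up
let stretched = makeScale(3,  1, 1) * corner   // non-uniform scale along X
let mirrored  = makeScale(1, -1, 1) * corner   // negative scale flips Y

print(doubled, stretched, mirrored)
```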

Translating X, Y and Z

Let’s move our model 0.8 m right, 0.5 m up and 1.1 m away from the camera. Keep in mind that translating along -Z is not the same as scaling XYZ down or dollying a camera out.

Model is dimmed because it was moved away from lights
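Here is a small sketch of that move, assuming the ARKit convention that "away from the camera" means negative Z. It also shows a detail that is easy to miss: the W component decides whether translation applies at all.

```swift
import simd

// 0.8 m right, 0.5 m up, 1.1 m away (negative Z is assumed to point away).
var translation = matrix_identity_float4x4
translation.columns.3 = SIMD4<Float>(0.8, 0.5, -1.1, 1)

// A position has w = 1, so the translation is added to it.
let origin = SIMD4<Float>(0, 0, 0, 1)
let movedOrigin = translation * origin

// A direction has w = 0, so the translation leaves it untouched.
let forward = SIMD4<Float>(0, 0, -1, 0)
let sameForward = translation * forward

print(movedOrigin, sameForward)
```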

Shearing

When you are intending to apply a shear transform you have six variants to choose from:

shear XY
shear XZ
shear YX
shear YZ
shear ZX
shear ZY

Default state of a **shear** transform (zero values)

Shearing in XY axis

Shear values are calculated here with the sine and cosine trigonometric functions. Do you remember what the hypotenuse and the adjacent and opposite sides of a right triangle are? The value -0.707 is sin(-45°), so it corresponds to a rotation angle of -45 degrees in XY.

Clock-wise rotation about X-axis

Rotation is a combination of shear and scale transforms. It can be accomplished via calculation of trigonometric functions sin(⍺) and cos(⍺).

Clock-wise rotation. PoV is positive X-axis direction

Clock-wise rotation about Y-axis

In a right-handed Cartesian coordinate system a clock-wise rotation is considered a negative rotation around any axis. In this case it’s a rotation of a cube around the Y-axis. The rotation looks clockwise when the positive Y-axis points toward the viewer.

As in previous example 2 shear and 2 scale values are used

Clock-wise rotation about Z-axis

As in the two previous examples, the angle of a clock-wise rotation around the Z-axis takes a negative sign.

When +Z is pointing at us, let’s rotate the cube clock-wise

For a clock-wise rotation around the Z-axis by a positive angle ⍺ you could also apply the following formula, in which the two sine terms change sign compared to the counterclockwise case:

┌                                 ┐ 
|  cos(⍺)  sin(⍺)    0       0    |
| -sin(⍺)  cos(⍺)    0       0    |
|    0       0       1       0    |
|    0       0       0       1    |
└                                 ┘

Counter Clock-Wise rotation about Z-axis

When the positive Z-axis points toward the camera, let's rotate the model counterclockwise.

Counterclockwise rotation occurs with a “+” sign
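The sign rule for both directions can be checked with a short sketch. makeRotationZ is a hypothetical helper of mine; it assumes a right-handed coordinate system with +Z pointing at the viewer, and it lists the matrix one column per entry.

```swift
import Foundation
import simd

// Hypothetical helper: rotation about Z by `angle` radians.
// A positive angle is counterclockwise when +Z points at the viewer.
func makeRotationZ(_ angle: Float) -> simd_float4x4 {
    let c = cos(angle)
    let s = sin(angle)
    return simd_float4x4(columns: (SIMD4<Float>( c, s, 0, 0),   // column 0
                                   SIMD4<Float>(-s, c, 0, 0),   // column 1
                                   SIMD4<Float>( 0, 0, 1, 0),   // column 2
                                   SIMD4<Float>( 0, 0, 0, 1)))  // column 3
}

// A point on the positive X-axis.
let pointOnX = SIMD4<Float>(1, 0, 0, 1)

// +90 degrees carries it to roughly (0, 1, 0): counterclockwise.
let counterClockwise = makeRotationZ(Float.pi / 2) * pointOnX

// -90 degrees carries it to roughly (0, -1, 0): clockwise.
let clockwise = makeRotationZ(-Float.pi / 2) * pointOnX

print(counterClockwise, clockwise)
```

The results are only approximately (0, 1, 0) and (0, -1, 0), because cos(π/2) is not exactly zero in floating point.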

Since rotating an object with a 4x4 transform matrix isn’t as easy as many developers expect, the architects of 3D frameworks give us regular tools for rotation. In SceneKit, for example, these are SCNVector3 (a.k.a. Euler’s rotation) and SCNVector4 (a.k.a. Quaternion Rotation).

Now let’s see how it looks in a SceneKit project. First we need to create a node containing box geometry.

let boxNode = SCNNode(geometry: SCNBox(width: 0.25, 
                                      height: 0.25, 
                                      length: 0.25, 
                               chamferRadius: 0.02))

Euler’s rotation is the node’s orientation, presented as pitch, yaw, and roll angles expressed in radians. Let’s rotate it -45 degrees about X-axis (clock-wise).

boxNode.eulerAngles = SCNVector3(x:-Float.pi/4, y: 0, z: 0)

If Gimbal Lock occurs when rotating objects using Euler’s rotation, it’s time to use a Quaternion Rotation, that is, the node’s orientation expressed as a four-component unit quaternion XYZW. For a rotation by angle ⍺ about a unit axis, XYZ holds the axis multiplied by sin(⍺/2) and W holds cos(⍺/2). In SceneKit SCNQuaternion is a type alias for the SCNVector4 structure.

boxNode.orientation = SCNQuaternion(x: -sin(Float.pi/8), y: 0, z: 0, w: cos(Float.pi/8))

And if you like matrices, use the simdTransform instance property with its 16 values. The default simdTransform is the Identity Matrix.

import Foundation

let a = cos(Float.pi/4)
let b = sin(Float.pi/4)

boxNode.simdTransform = simd_float4x4([1,  0,  0,  0],    // 0
                                      [0,  a, -b,  0],    // 1
                                      [0,  b,  a,  0],    // 2
                                      [0,  0,  0,  1])    // 3

In this example we’ve also rotated our cube 45 degrees about the X-axis, clockwise. Note that every column of this simd_float4x4 is written on one line, not vertically.
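To convince yourself that the matrix and the quaternion describe the same rotation, you can compare them without any SceneKit node. This is a sketch that relies only on simd; the 1e-6 tolerance is my own arbitrary choice.

```swift
import Foundation
import simd

let a = cos(Float.pi / 4)
let b = sin(Float.pi / 4)

// The -45 degree rotation about X, one column per line.
let manual = simd_float4x4(columns: (SIMD4<Float>(1, 0,  0, 0),
                                     SIMD4<Float>(0, a, -b, 0),
                                     SIMD4<Float>(0, b,  a, 0),
                                     SIMD4<Float>(0, 0,  0, 1)))

// The same rotation as a unit quaternion built from an angle and an axis.
let quaternion = simd_quatf(angle: -Float.pi / 4, axis: SIMD3<Float>(1, 0, 0))
let fromQuaternion = simd_float4x4(quaternion)

// Largest element-wise difference between the two matrices.
var maxDifference: Float = 0
for column in 0..<4 {
    for row in 0..<4 {
        let difference = abs(manual[column, row] - fromQuaternion[column, row])
        maxDifference = max(maxDifference, difference)
    }
}
print(maxDifference < 1e-6)
```

The quaternion stores sin(⍺/2) times the axis in its imaginary part and cos(⍺/2) in its real part, which is why its components do not look like the angle itself.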

Multiplication

As a developer, you need some flexibility when working with matrices. For instance, you may want to start with an Identity Matrix, assign a new value to its translate Z element, and then multiply that matrix by the camera transform. Look at the code:

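var translation = matrix_identity_float4x4
translation.columns.3.z = -1.0

// currentFrame is nil until the session delivers its first frame
if let transform = arView.session.currentFrame?.camera.transform {
    // this order offsets the camera pose along the world Z-axis
    let pose = translation * transform
    let _ = SCNVector3(pose.columns.3.x,
                       pose.columns.3.y,
                       pose.columns.3.z)
}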
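Matrix multiplication is not commutative, so the order of the two operands matters. The sketch below uses a made-up camera transform (rotated 90 degrees about Y and placed at x = 2) as a stand-in for the real ARCamera value, just to show the difference.

```swift
import simd

// Hypothetical stand-in for camera.transform:
// rotated +90 degrees about Y, positioned at (2, 0, 0).
let cameraTransform = simd_float4x4(columns: (SIMD4<Float>(0, 0, -1, 0),
                                              SIMD4<Float>(0, 1,  0, 0),
                                              SIMD4<Float>(1, 0,  0, 0),
                                              SIMD4<Float>(2, 0,  0, 1)))

var translation = matrix_identity_float4x4
translation.columns.3.z = -1.0

// Translation on the left: the offset is applied along the world Z-axis.
let worldOffset = translation * cameraTransform

// Translation on the right: the offset is applied along the camera's own Z-axis.
let localOffset = cameraTransform * translation

print(worldOffset.columns.3)   // position (2, 0, -1)
print(localOffset.columns.3)   // position (1, 0, 0)
```

If the goal is to place something one meter in front of the camera, whatever direction it faces, the second order is the one to reach for.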

About Projection

In a minute we’ll explore how to implement 3D projection using the homogeneous coordinate (the 16th matrix element, located at the right end of the bottom row) and the rest of the bottom row of a 4x4 transform matrix. Homogeneous coordinates, also called projective coordinates, are a system of coordinates used in projective geometry.

According to Wikipedia’s definition: “Homogeneous coordinates have the advantage that the coordinates of points, including points at infinity, can be represented using finite coordinates. Formulas involving homogeneous coordinates are often simpler and more symmetric than their Cartesian counterparts. Homogeneous coordinates have a range of applications, including computer graphics, where they allow affine transformations and, in general, projective transformations to be easily represented by a matrix”.

Two types of **Camera Frustum** with **near** and **far** clipping planes

The next image illustrates a very rough approach to creating an orthographic projection matrix.

Ortho projection matrix (very rough approach)

Orthographic Projection

Let’s see how to correctly build an orthographic projection matrix. For this we must create four expressions using the width and height values of a view (cuboid “frustum”) as well as the far and near values of its clipping planes. Here width and height are assumed to be measured from the center of the view to its edge, and the X and Y scale factors are positive, as in the conventional OpenGL-style matrix.

exp01 = 1 / width
exp02 = 1 / height
exp03 = -2 / (far - near)
exp04 = -(far + near) / (far - near)

This is where these expressions go in the matrix.

┌                         ┐ 
| exp01   0     0     0   |
|   0   exp02   0     0   |
|   0     0   exp03 exp04 |
|   0     0     0     1   |
└                         ┘
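As a sanity check, here is a sketch of an orthographic matrix in Swift. It is not taken from any Apple API: makeOrthographic is a hypothetical helper, it follows the conventional OpenGL-style layout with positive X and Y scale, and it assumes width and height are half-extents measured from the view center and that the camera looks down -Z.

```swift
import simd

// Hypothetical helper. `width` and `height` are half-extents of the view volume.
func makeOrthographic(width: Float, height: Float,
                      near: Float, far: Float) -> simd_float4x4 {
    let exp01 = 1 / width
    let exp02 = 1 / height
    let exp03 = -2 / (far - near)
    let exp04 = -(far + near) / (far - near)
    // One column per entry; exp04 lands in column 3, row Z.
    return simd_float4x4(columns: (SIMD4<Float>(exp01, 0, 0, 0),
                                   SIMD4<Float>(0, exp02, 0, 0),
                                   SIMD4<Float>(0, 0, exp03, 0),
                                   SIMD4<Float>(0, 0, exp04, 1)))
}

let ortho = makeOrthographic(width: 2, height: 1, near: 1, far: 3)

// The top-right corner of the near plane maps to (1, 1, -1).
let nearCorner = ortho * SIMD4<Float>(2, 1, -1, 1)

// The center of the far plane maps to (0, 0, 1).
let farCenter = ortho * SIMD4<Float>(0, 0, -3, 1)

print(nearCorner, farCenter)
```

W stays 1 for every point, which is exactly why distant objects do not get smaller in an orthographic view.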

Perspective Projection

If you want to know how to correctly build a perspective projection matrix, follow the same rule but with different values for the four matrix elements. First come the expressions:

exp01 = near / width
exp02 = near / height
exp03 = -(far + near) / (far - near)
exp04 = -(2 * far * near) / (far - near)

Then paste these expressions into the matrix:

┌                         ┐ 
| exp01   0     0     0   |
|   0   exp02   0     0   |
|   0     0   exp03 exp04 |
|   0     0    -1     0   |
└                         ┘
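The -1 in the bottom row is what produces perspective: it copies the negated Z into W, and dividing by W afterwards shrinks distant points. Below is a sketch under my own assumptions: makePerspective is a hypothetical helper, width and height are half-extents of the near plane, and the camera looks down -Z.

```swift
import simd

// Hypothetical helper. `width` and `height` are half-extents of the near plane.
func makePerspective(width: Float, height: Float,
                     near: Float, far: Float) -> simd_float4x4 {
    let exp01 = near / width
    let exp02 = near / height
    let exp03 = -(far + near) / (far - near)
    let exp04 = -(2 * far * near) / (far - near)
    // One column per entry; the -1 sits in column 2, row W.
    return simd_float4x4(columns: (SIMD4<Float>(exp01, 0, 0, 0),
                                   SIMD4<Float>(0, exp02, 0, 0),
                                   SIMD4<Float>(0, 0, exp03, -1),
                                   SIMD4<Float>(0, 0, exp04, 0)))
}

let perspective = makePerspective(width: 1, height: 1, near: 1, far: 3)

// A corner of the far plane, three times farther away than the near plane.
let clip = perspective * SIMD4<Float>(3, 3, -3, 1)

// Perspective divide: clip space to normalized device coordinates.
let ndc = SIMD3<Float>(clip.x, clip.y, clip.z) / clip.w

print(clip, ndc)
```

With these numbers W comes out as 3, so the far corner at x = 3 lands on the same screen edge (x = 1) as a near-plane corner at x = 1.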

That’s all for now.

If this post is useful for you, please press the Clap button and hold it. On Medium you can clap up to 50 times per post.

You can find more info on ARKit, RealityKit and SceneKit in my posts on StackOverflow.

¡Hasta la vista!