Apple ARCamera. Camera parameters explanation for 3D reconstruction pipeline

Vitali Usau
4 min read · Jan 27, 2023


If you start working with ARKit for the first time, you might notice that the Apple documentation doesn’t provide very detailed descriptions or explanations for some properties. I spent quite some time experimenting with ARCamera to fully understand its matrices while building a 3D reconstruction pipeline.

You can consider this post an extension of the Apple ARCamera documentation, with a detailed explanation of everything related to camera position and the extrinsic and intrinsic parameters.

1. Intro

Below is some basic info about matrices (Apple uses a column-first approach) and coordinate systems (world and camera). Feel free to skip it if you are already familiar with those topics.

1.1 Matrices

The important thing to start with is that in classical linear algebra an m x n matrix contains m rows of n values (n columns). Elements are addressed with indexes i, j that refer to the i-th row and j-th column.

ARCamera object’s properties use simd (single-precision floating-point) matrix types, which are column-first. Therefore simd_float2x3 is a matrix with 2 columns and 3 rows, and accessing its elements follows the same rule — the column index goes first.

let value = matrix[1][2] // element at column index 1, row index 2 (zero-based)

More information about the simd types that Apple uses can be found in the official documentation.
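
To make the column-first layout concrete, here is a minimal sketch (the matrix values are arbitrary) that builds a simd_float2x3 from its two columns and reads an element back:

import simd

// A 2-column, 3-row matrix built from its two columns.
let m = simd_float2x3(columns: (SIMD3<Float>(1, 2, 3),   // column 0
                                SIMD3<Float>(4, 5, 6)))  // column 1

let secondColumn = m[1]   // SIMD3<Float>(4.0, 5.0, 6.0)
let element = m[1][2]     // column index 1, row index 2 -> 6.0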

1.2 Coordinate systems

1.2.1 Right-hand coordinate system
Apple uses a right-hand coordinate system, which defines the orientation of the X, Y, Z axes in space. Assume you have a horizontal X axis pointing to the right and a Y axis pointing up. In a right-hand system the Z axis points toward you, as if you were turning a bottle cap counterclockwise so that it moves toward you. In a left-hand system Z points in the opposite direction.

1.2.2 World coordinate system
In a nutshell, the world coordinate system is a zero point (origin) plus the orientation of all axes. It is defined during ARSession initialization and can be changed later. The origin always corresponds to the device location at session init, while the orientation is defined by the worldAlignment property of the ARWorldTrackingConfiguration instance.

Possible values of worldAlignment (a configuration sketch follows the list):
.gravity — the default value. Y is parallel to gravity and points up. Z points toward the camera position
.gravityAndHeading — Y is parallel to gravity and points up. Z points to the south
.camera — at each moment in time the world coordinate system matches the camera coordinate system
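
Below is a minimal sketch of how a session can be started with an explicit alignment; arView is assumed to be an existing ARView (or ARSCNView) whose session is used.

import ARKit

// Configure world tracking and choose how the world coordinate system is aligned.
let configuration = ARWorldTrackingConfiguration()
configuration.worldAlignment = .gravityAndHeading   // or .gravity / .camera

// arView is an assumed, already-created AR view.
arView.session.run(configuration)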

The world coordinate system can be adjusted at any time by using the setWorldOrigin method of the ARSession instance:

arView.session.setWorldOrigin(relativeTransform: transform)

where transform is a 4x4 matrix that is applied to the current world origin.
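
For illustration, here is a small sketch (the 2 m offset is an arbitrary example value) that builds a pure-translation transform and shifts the world origin with it:

import simd

// Start from the identity and put a translation into the 4th column (column-first layout).
var transform = matrix_identity_float4x4
transform.columns.3 = SIMD4<Float>(0, 0, 2, 1)   // move the origin 2 m along world z

arView.session.setWorldOrigin(relativeTransform: transform)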

1.2.3 Camera coordinate system
The camera coordinate system is tied to the orientation of the device’s camera sensor. Relative to an AVCaptureVideoOrientation.landscapeRight-oriented camera image, the x-axis points to the right, the y-axis points up, and the z-axis points out the front of the device (toward the user).

2. Transform matrix

The transform matrix, also known as the camera pose, can be obtained via the ARCamera property transform. It describes the position and orientation of the camera in the world coordinate system. The trick is that the camera’s native orientation is landscape right, which means that if the device is held in portrait mode, the transform matrix will contain a rotation component that accounts for that.
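
A common use of the transform matrix is extracting the camera position and viewing direction in world space. A minimal sketch, assuming frame is an ARFrame delivered by session(_:didUpdate:):

import ARKit

let transform = frame.camera.transform   // camera pose in world space

// The translation lives in the 4th column (column-first layout).
let cameraPosition = SIMD3<Float>(transform.columns.3.x,
                                  transform.columns.3.y,
                                  transform.columns.3.z)

// The camera looks down its local -z axis, so the forward direction
// in world space is the negated 3rd column.
let cameraForward = -SIMD3<Float>(transform.columns.2.x,
                                  transform.columns.2.y,
                                  transform.columns.2.z)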

3. View matrix

The view matrix converts world space to camera space (it is sometimes called the world-to-camera matrix). It can be obtained with the following instance method:

func viewMatrix(for orientation: UIInterfaceOrientation) -> simd_float4x4

In general, the view matrix for the landscape right orientation is the inverse of the transform matrix and vice versa. For other orientations you also need to compensate for the rotation of the camera sensor that is baked into the transform matrix.
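
The relation can be checked directly. A minimal sketch, again assuming frame is a current ARFrame:

import ARKit
import simd

// For .landscapeRight the view matrix should match the inverse of the camera pose
// (up to floating-point error); other orientations add an extra rotation about z.
let viewMatrix = frame.camera.viewMatrix(for: .landscapeRight)
let inverseTransform = simd_inverse(frame.camera.transform)

// Mapping an arbitrary example world-space point into camera space:
let worldPoint = SIMD4<Float>(0, 0, -1, 1)
let cameraSpacePoint = viewMatrix * worldPoint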

4. Intrinsic matrix

Nothing much to add here beyond a link to the official Apple documentation, which has a pretty detailed explanation.
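
Still, since the intrinsic matrix is what ties the 3D pipeline to image pixels, here is a minimal pinhole-projection sketch. It assumes frame is a current ARFrame and that the point is already expressed in a camera space where z points away from the camera and y points down (ARKit’s camera space has +z toward the user and +y up, so a real pipeline would flip those axes first):

import ARKit

let K = frame.camera.intrinsics        // simd_float3x3, column-first

let fx = K[0][0], fy = K[1][1]         // focal lengths in pixels
let ox = K[2][0], oy = K[2][1]         // principal point in pixels

let point = SIMD3<Float>(0.1, 0.2, 1.5)       // arbitrary example point, z > 0
let u = fx * point.x / point.z + ox
let v = fy * point.y / point.z + oy           // (u, v) in image pixel coordinates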

Application

After LiDAR was added to some iOS devices, it became possible to obtain not only depth data but also much more accurate device telemetry. Together, this data makes it possible to run 3D reconstruction. There are multiple pipelines available, but I was working with the Open3D package to generate 3D models.
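
As a closing sketch, this is roughly how the per-frame inputs for such a pipeline can be collected; it assumes a LiDAR-equipped device and that frame comes from session(_:didUpdate:):

import ARKit

// Enable per-frame LiDAR depth.
let configuration = ARWorldTrackingConfiguration()
if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
    configuration.frameSemantics.insert(.sceneDepth)
}
arView.session.run(configuration)

// Each frame then carries everything a reconstruction pipeline needs.
if let depthMap = frame.sceneDepth?.depthMap {   // CVPixelBuffer of depth values in meters
    let colorImage = frame.capturedImage         // CVPixelBuffer of the camera image
    let pose = frame.camera.transform            // camera pose in world space
    let intrinsics = frame.camera.intrinsics     // pinhole intrinsics
    // Export these per frame and feed them to an external pipeline such as Open3D.
    _ = (depthMap, colorImage, pose, intrinsics)
}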

