ARKit 911 — Scene Reconstruction with a LiDAR Scanner

Andy Jazz
Published in Mac O’Clock · 9 min read · Aug 17, 2020

Theory

A breakthrough LiDAR Scanner activates ARKit and RealityKit capabilities that were never possible before on Apple devices. LiDAR, which stands for Light Detection And Ranging, uses a laser to send out pulses of light and a receiver to pick them up, measuring variable distances to surrounding objects up to 5.0 meters away (although in particular cases Apple's LiDAR may cover 7.0+ meters). LiDAR was conceived as a unit for building precise 3D maps. It operates at nanosecond speed, from 0.2 to 5 ns, which means hundreds of millions of pulses per second. With such an exceptional speed and such a dense point-cloud coverage, the Scene Reconstruction operation is almost instant in ARKit and RealityKit.

The LiDAR Scanner is right below the flash

Apple sources the LiDAR's components from four main manufacturers (as of 2020). These suppliers are Sony, with its Near-InfraRed CMOS Image Sensor (in other words, the receptor); Lumentum, with a Vertical Cavity Surface Emitting Laser (put simply, the emitter); Texas Instruments, with Wafer Level Chip Scale Packaging; and Himax, with a Diffractive Optical Element.

Sony’s NIR CMOS image sensor has a resolution of 30,000 pixels.

It's worth mentioning that Apple's LiDAR is basically a direct Time-of-Flight (dToF) sensor. The main difference between direct Time-of-Flight and indirect Time-of-Flight image sensors is that an iToF sensor sends out continuous, modulated light (a sine wave with a frequency of 20 to 100 MHz) and measures the phase of the reflected light to calculate the distance to an object, whereas a dToF sensor sends out short pulses of light lasting just a few nanoseconds and then measures the time it takes for some of the emitted light to come back.
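To make the dToF idea concrete, here is a toy calculation (not a real ARKit API): convert a measured round-trip time into a distance, assuming light travels at c. The roundTripTime value is an illustrative assumption, not real sensor output.

// Toy dToF example: distance from a measured round-trip time.
let speedOfLight = 299_792_458.0          // meters per second
let roundTripTime = 33.3e-9               // seconds (about 33 ns there and back), illustrative value
let distance = speedOfLight * roundTripTime / 2.0
print(distance)                           // ≈ 5 meters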

The guts of a LiDAR Scanner

To put it even more briefly: iToF measures the phase shift, while dToF measures the time of flight directly. As a result, dToF has considerably higher accuracy than iToF. In recent years many Android devices have been equipped with iToF sensors that can be useful in ARCore apps (you need at least ARCore 1.18 to implement the full Depth API, and ARCore 1.24 to enable the Raw Depth API).
What about Apple devices? The latest iPhone models with the A15 chip and iPad Pro models with the M1 chip are the best choice for running ARKit 5.0 apps with the scene reconstruction feature.

On March 28th, 2020, the folks at iFixit shared a 5-minute video called “What does the LiDAR scanner look like”. Watch this clip for a deeper understanding of what a LiDAR is.

Game-changing LiDAR, really?

Yep, it's a really game-changing unit. Here are some obvious advantages that ARKit users and developers get when they use devices equipped with LiDAR:

  • The room's lighting conditions matter to a much lesser degree*
  • Tracking is no longer limited to surfaces with well-perceptible textures
  • The user doesn't need to physically move the device for successful tracking
  • LiDAR works in conjunction with the plane detection feature
  • It doesn't matter if there are pure white walls or white objects in a room
  • We get a mid-poly mesh with a ready-for-collision occlusion material
  • The device can be moved at a higher speed while tracking
  • Improved Ray-Casting, People Occlusion and Motion Capture features
  • ARKit now considers surfaces with very few or even no feature points
  • Reconstructed surfaces are able to catch real-world and virtual light
  • Each poly face can be classified based on ARMeshClassification cases
  • A high-quality depth map running at 60 fps, coming from the new Depth API (see the sketch after this list)
  • Near-instant and considerably more accurate plane detection
  • For a better result, people can be excluded from the reconstructed mesh

*Keep in mind that scene reconstruction in a poorly lit room makes sense only for a mesh created for collisions or for improved plane detection. Saving an ARWorldMap or saving a mesh with textures projected onto it (with the help of LiDAR or without it) is only possible when lighting conditions are good enough.
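Here is a minimal sketch of how the Depth API mentioned in the list above is enabled. It assumes a device that supports the .sceneDepth frame semantic; reading the per-frame depth map is shown as a comment.

import ARKit

// A minimal sketch of enabling the LiDAR-backed Depth API.
let config = ARWorldTrackingConfiguration()

if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
    config.frameSemantics.insert(.sceneDepth)
}

// Later, e.g. inside session(_:didUpdate:), read the per-frame depth map:
// let depthMap: CVPixelBuffer? = frame.sceneDepth?.depthMap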

Coding with Swift

The Scene Reconstruction feature is part of the Scene Understanding stage, and it's available when you're running a World Tracking configuration. At the moment the most robust approach is to use Scene Reconstruction optionally, because only a handful of Apple models currently support it. To check whether the user's device has a LiDAR Scanner on board, implement a regular guard-else statement. It can be done in the AppDelegate.swift file.

import UIKit
import ARKit

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {

        // Scene Reconstruction is available only on devices with a LiDAR Scanner.
        let supportForSceneReconstruction =
            ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh)

        guard supportForSceneReconstruction else {
            fatalError("Scene Reconstruction isn't supported here")
        }
        return true
    }
}

After this, you can write this simple code snippet in ViewController.swift, inside the viewDidLoad or viewWillAppear method.

import ARKit
import RealityKit

class ViewController: UIViewController {

    @IBOutlet var arView: ARView!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Configure the ARSession manually instead of letting RealityKit do it.
        arView.automaticallyConfigureSession = false

        let config = ARWorldTrackingConfiguration()
        config.sceneReconstruction = .meshWithClassification

        // Visualize the reconstructed mesh.
        arView.debugOptions.insert(.showSceneUnderstanding)
        arView.session.run(config, options: [])
    }
}

The RealityKit instance property automaticallyConfigureSession has two states. When it's true (the default state), RealityKit disables classification because it isn't required for occlusion and physics.

arView.automaticallyConfigureSession = true
config.sceneReconstruction = .mesh

When it's false, you can generate a mesh of the real-world objects with a classification for each face.

arView.automaticallyConfigureSession = false
config.sceneReconstruction = .meshWithClassification

Can you believe that this is all the code you need to write to make this functionality work? But it's true!

As you can imagine, both frameworks, ARKit 5.0 and RealityKit 2.0, work in one harness. ARKit is responsible for the world tracking and scene understanding stages (it tracks the real-world environment and then generates triangular mesh faces and corresponding ARMeshAnchors). RealityKit, in its turn, renders the resulting 3D mesh with an OcclusionMaterial applied.
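If you want to observe the moment ARKit delivers those ARMeshAnchors, a sketch like the following works. It assumes the view controller is assigned as the session delegate (arView.session.delegate = self).

import ARKit

// A sketch of observing mesh anchors as ARKit generates and refines them.
extension ViewController: ARSessionDelegate {

    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        let meshAnchors = anchors.compactMap { $0 as? ARMeshAnchor }
        print("ARKit added \(meshAnchors.count) mesh anchor(s)")
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        // Mesh anchors are updated continuously while the scan progresses.
    }
}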

If you need to reset the current session's configuration and start tracking from scratch, employ the following code:

@IBAction func resetConfigurationButton(_ sender: Any) {
    if let config = arView.session.configuration {
        let opts: ARSession.RunOptions = [.resetTracking,
                                          .removeExistingAnchors,
                                          .resetSceneReconstruction]
        arView.session.run(config, options: opts)
    }
}
A room before Scene Reconstruction is applied

If you want to show or hide the reconstructed mesh, follow this approach:

@IBAction func toggleShowHideMeshButton(_ sender: UIButton) {
    let isMeshShowing = arView.debugOptions.contains(.showSceneUnderstanding)

    if isMeshShowing {
        arView.debugOptions.remove(.showSceneUnderstanding)
        sender.setTitle("Show Poly Mesh", for: [])
    } else {
        arView.debugOptions.insert(.showSceneUnderstanding)
        sender.setTitle("Hide Poly Mesh", for: [])
    }
}
A single-color mesh (without classification) when Scene Reconstruction is enabled

If you need a color-coded mesh based on classification cases, just copy and paste this simple snippet:

extension ARMeshClassification {

    var description: String {
        switch self {
        case .ceiling: return "Ceiling"
        case .door: return "Door"
        case .floor: return "Floor"
        case .seat: return "Seat"
        case .table: return "Table"
        case .wall: return "Wall"
        case .window: return "Window"
        case .none: return "None"
        @unknown default: return "Unknown"
        }
    }

    var color: UIColor {
        switch self {
        case .ceiling: return .magenta
        case .door: return .green
        case .floor: return .cyan
        case .seat: return .blue
        case .table: return .yellow
        case .wall: return .red
        case .window: return .black
        case .none: return .systemOrange
        @unknown default: return .white
        }
    }
}
A color-coded mesh (based on classification) when Scene Reconstruction is enabled
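If you need the classification of an individual face rather than the whole anchor, here is a sketch following the approach used in Apple's scene-reconstruction sample code: read the raw value from the classification geometry source. The helper name classificationOf(faceWithIndex:) is illustrative.

import ARKit

// A sketch of reading a single face's classification from ARMeshGeometry.
extension ARMeshGeometry {

    func classificationOf(faceWithIndex index: Int) -> ARMeshClassification {
        // The classification source is nil when .mesh (not .meshWithClassification) is used.
        guard let source = self.classification else { return .none }

        // Each face stores one UInt8 raw value in the classification buffer.
        let address = source.buffer.contents()
            .advanced(by: source.offset + source.stride * index)
        let value = Int(address.assumingMemoryBound(to: UInt8.self).pointee)

        return ARMeshClassification(rawValue: value) ?? .none
    }
}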

To retrieve a non-nil collection of ARMeshAnchors from each frame, apply the higher-order function compactMap:

guard let frame = arView.session.currentFrame else { return }
let meshAnchors = frame.anchors.compactMap { $0 as? ARMeshAnchor }

Now you can process them:

for meshAnchor in meshAnchors {
    print("This is \(meshAnchor.geometry.classification)")
}

I believe you realize that you can export the reconstructed mesh as a 3D model and then share it with other users.
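Here's what such an export might look like. This is a minimal sketch that converts the mesh anchors into a ModelIO asset and writes an .obj file; the vertices are kept in each anchor's local space (a production version would transform them by anchor.transform), and the function name export(meshAnchors:to:) is illustrative.

import ARKit
import MetalKit
import ModelIO

// A sketch: convert ARMeshAnchors into an MDLAsset and export it as an .obj file.
func export(meshAnchors: [ARMeshAnchor], to url: URL) throws {
    guard let device = MTLCreateSystemDefaultDevice() else { return }
    let allocator = MTKMeshBufferAllocator(device: device)
    let asset = MDLAsset(bufferAllocator: allocator)

    for anchor in meshAnchors {
        let geometry = anchor.geometry

        // Wrap the anchor's vertex buffer (float3 positions) in a ModelIO buffer.
        let vertexPointer = geometry.vertices.buffer.contents()
            .advanced(by: geometry.vertices.offset)
        let vertexData = Data(bytes: vertexPointer,
                              count: geometry.vertices.stride * geometry.vertices.count)
        let vertexBuffer = allocator.newBuffer(with: vertexData, type: .vertex)

        // Wrap the anchor's triangle index buffer.
        let indexCount = geometry.faces.count * geometry.faces.indexCountPerPrimitive
        let indexData = Data(bytes: geometry.faces.buffer.contents(),
                             count: geometry.faces.bytesPerIndex * indexCount)
        let indexBuffer = allocator.newBuffer(with: indexData, type: .index)

        let submesh = MDLSubmesh(indexBuffer: indexBuffer,
                                 indexCount: indexCount,
                                 indexType: .uInt32,
                                 geometryType: .triangles,
                                 material: nil)

        // Describe the vertex layout: a single position attribute per vertex.
        let descriptor = MDLVertexDescriptor()
        descriptor.attributes[0] = MDLVertexAttribute(name: MDLVertexAttributePosition,
                                                      format: .float3,
                                                      offset: 0,
                                                      bufferIndex: 0)
        descriptor.layouts[0] = MDLVertexBufferLayout(stride: geometry.vertices.stride)

        let mesh = MDLMesh(vertexBuffer: vertexBuffer,
                           vertexCount: geometry.vertices.count,
                           descriptor: descriptor,
                           submeshes: [submesh])
        asset.add(mesh)
    }

    try asset.export(to: url)   // e.g. a file URL ending in .obj
}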

Fine Tuning for Raycasting

When implementing raycasting and wanting to take the reconstructed mesh into account, you have to pass two non-obvious but important values for the parameters called allowing and alignment.

@IBAction func tapped(_ sender: UITapGestureRecognizer) {
    let tapLocation: CGPoint = sender.location(in: arView)

    let estimatedPlane: ARRaycastQuery.Target = .estimatedPlane
    let alignment: ARRaycastQuery.TargetAlignment = .any

    let result = arView.raycast(from: tapLocation,
                                allowing: estimatedPlane,
                                alignment: alignment)

    guard let raycast: ARRaycastResult = result.first else { return }

    // `model` is a ModelEntity created elsewhere in the view controller.
    let anchor = AnchorEntity(world: raycast.worldTransform)
    anchor.addChild(model)
    arView.scene.anchors.append(anchor)

    print(raycast.worldTransform.columns.3)
}

According to Apple documentation:

ARRaycastQuery.Target.estimatedPlane is a raycast target that specifies nonplanar surfaces, or planes about which ARKit can only estimate. A raycast with this target intersects feature points around the ray that ARKit estimates may be a real-world surface.

ARRaycastQuery.TargetAlignment.any is a case that indicates a target may be aligned in any way with respect to gravity.
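If you also want the placement to keep refining as the reconstructed mesh improves, a tracked raycast can be used instead of a one-off query. This is a sketch meant to live inside the same view controller; the model parameter and the placeTrackedModel(at:model:) name are illustrative.

import ARKit
import RealityKit

// A sketch of a tracked raycast: ARKit keeps calling the update handler
// while it refines its estimate of the tapped surface.
var trackedRaycast: ARTrackedRaycast?

func placeTrackedModel(at tapLocation: CGPoint, model: ModelEntity) {
    guard let query = arView.makeRaycastQuery(from: tapLocation,
                                              allowing: .estimatedPlane,
                                              alignment: .any) else { return }

    let anchor = AnchorEntity(world: SIMD3<Float>(0, 0, 0))
    anchor.addChild(model)
    arView.scene.anchors.append(anchor)

    trackedRaycast = arView.session.trackedRaycast(query) { results in
        guard let result = results.first else { return }
        // Move the anchor every time ARKit refines the surface estimate.
        anchor.transform = Transform(matrix: result.worldTransform)
    }
    // Call trackedRaycast?.stopTracking() once the placement is final.
}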

Summarizing

Let's summarize all the advantages of using devices with a LiDAR Scanner and put them in one pivot table:

| Feature                      | With LiDAR | Without LiDAR |
|------------------------------|------------|---------------|
| Need to move when tracking   | No         | Yes           |
| World lighting is important  | No         | Yes           |
| Discernible world textures   | No         | Yes           |
| Ready-to-be-lit R/W surfaces | Yes*       | No            |
| Tracks white walls           | Yes        | No            |
| Improved People Occlusion    | Yes        | No            |
| High-quality Depth channel   | Yes        | No            |
| Ready-to-use poly mesh       | Yes        | No            |
| Near-instant Detected Plane  | Yes        | No            |

*Real-world surfaces can be lit with virtual lights only if Apple engineers implement texture capturing in the Scene Reconstruction feature.

Disadvantages

First, we need to discuss the quality and resolution of a reconstructed mesh. Even considering all the advantages of the iPad Pro and iPhone Pro with a LiDAR scanner, it should be noted that the quality of the scanned environment is far from ideal. But this isn't surprising, because a compact LiDAR in a $1K device is unlikely to match the quality of full-sized models from Velodyne, Ouster, or DJI that cost several thousands of dollars.

For example, the Velodyne Puck LiDAR is able to measure variable distances to surrounding objects up to 100 meters away, with an accuracy of ±3 cm, using 600,000 data points per second and a horizontal field of view of 360 degrees.

Or, for example, the DJI Zenmuse L1, integrating a Livox LiDAR module and a high-accuracy IMU, is able to measure variable distances to surrounding objects up to 450 meters away, using up to 480,000 data points per second. Watch a short video about it.

An iOS device, in its turn, scans the environment using significantly fewer light points than its "big brothers", so it is not always capable of reconstructing thin objects like floor-lamp legs or a computer cable. And at the moment, when scanning a sculpture with an Apple device, it is almost impossible to capture high-quality facial detail.

The second issue is a dilated mask with soft edges around occluding real-world objects. It's a pity, but we can't erode that mask or make its edges hard.

The third problem is the difficulty of using SceneKit's ARSCNView or SceneView (or, more precisely, some restrictions on using them) and their tools. For the Scene Reconstruction feature we must use RealityKit's iOS ARView or visionOS RealityView.

The fourth: the quality of a scanned object's mesh depends on the distance to the object. RealityKit dynamically updates the mesh of a scanned object, so there is a high probability that the mesh quality will be reduced as you move away. In other words, scene reconstruction uses dynamic tessellation, because RealityKit tries to use hardware resources sparingly.

And the last disappointing thing: a reconstructed mesh has no real-world textures assigned to it. Only RealityKit's OcclusionMaterial is automatically assigned to the mesh.

However, as a developer, I hope that in the near future Apple engineers will add some additional functionality to the Scene Reconstruction feature and fix all the existing bugs.

Final conclusion

Despite its many limitations, the Apple LiDAR scanner is an amazing sensor that is gracefully integrated into the AR ecosystem. In 2024, everyone expects the Apple Vision Pro release, which will feature a more advanced LiDAR and push the boundaries of AR even wider.

RGB image (left) and Dense Point Cloud (right)
Generated Point Cloud with ARKit and iPad Pro

That’s all for now.

If this post is useful for you, please press the Clap button and hold it.

¡Hasta la vista!
