What ARKit 3.5 and the new iPad Pro bring to the table and how you can export its LIDAR scans

Stefan Pfeifer · zeit:raum · Apr 20, 2020

When the iPad Pro (4th Generation) was unveiled by Apple on March 18th 2020, we at zeit:raum were intrigued by its new LIDAR sensor and the possibilities it would open up. All that was missing was a new version of ARKit we could play around with and explore. That followed a week later on March 24th. So the game was on, we sank our teeth into it, and here we present our thoughts on what the new AR hardware and software update from Apple has to offer. As a treat, we even show you how to export the new scene geometry generated by ARKit to an OBJ file, which you can import into any other 3D authoring software.

What it is

The new version 3.5 of ARKit is all about the new LIDAR sensor built into the iPad Pro (and probably future iPhones). It works like this: at the start of any AR session, ARKit begins to generate a 3D mesh out of the LIDAR's depth information within a radius of up to 5 meters (about 16 feet for the metrically impaired 😉). This mesh is continuously updated and even improved by "classical" plane detection as you move, and it can be used in numerous ways in AR. Here are some of them:

Physics

For one thing, you can use it as a physics collision mesh: throw a digital tennis ball into your scene and it will bounce off objects realistically. Another way to leverage it for collision is to raycast and hit-test against it. This way you can measure distances to real-world objects that are not necessarily planar.
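For a rough idea of how that looks in code, here is a minimal sketch (assuming an existing `arView: ARView` inside your view controller; the helper name `distanceToScene` is ours): it opts the reconstructed mesh into RealityKit's physics and collision simulation and measures the distance from the camera to whatever surface a ray through a screen point hits.

```swift
// Let the reconstructed scene mesh take part in physics and collisions.
arView.environment.sceneUnderstanding.options.insert(.physics)
arView.environment.sceneUnderstanding.options.insert(.collision)

// Distance from the camera to the real-world surface hit by a ray through
// the given screen point (nil if nothing is hit).
func distanceToScene(at point: CGPoint) -> Float? {
    guard let result = arView.raycast(from: point, allowing: .estimatedPlane, alignment: .any).first,
          let camera = arView.session.currentFrame?.camera else { return nil }

    // Both positions are the translation columns of their world transforms.
    let hit = SIMD3<Float>(result.worldTransform.columns.3.x,
                           result.worldTransform.columns.3.y,
                           result.worldTransform.columns.3.z)
    let cam = SIMD3<Float>(camera.transform.columns.3.x,
                           camera.transform.columns.3.y,
                           camera.transform.columns.3.z)
    return simd_distance(cam, hit)
}
```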

iPad Pro 2020 LIDAR Review by MobileReviewsEh at time code 4:09 — Measure App

Object Occlusion

That's a big one. The LIDAR with ARKit 3.5 makes complete world occlusion possible, not just the people occlusion ARKit 3 offered before. Now you can place digital objects, move your device behind other real-world objects, and watch your digital content get occluded by them.
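Enabling it in code is a one-liner (again assuming an `arView: ARView`):

```swift
// Have the reconstructed scene mesh occlude virtual content behind real objects.
arView.environment.sceneUnderstanding.options.insert(.occlusion)
```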

AR Occlusion comparison by FavoriteTech

Of course there are limits to that. Occluding real-world objects can't be too small for the LIDAR to pick them up, or move so fast that the mesh generation can't keep up. To account for the relatively low resolution of the LIDAR sensor (more on that later), objects occlude an area slightly bigger than they really are, with a blurry gradient.

“iPad Pro 2020 LiDAR Camera testing Dynamic object occlusion” by FavoriteTech

Face Classification

Like with ARKit's plane classification before, Apple ships a machine learning model that classifies your surrounding world to offer semantic meaning for objects in the AR scene, though this time on the level of each individual face of the scene geometry mesh. In computer graphics, a face is a surface made up by the connections (edges) of at least three (and in most cases exactly three) vertices (coordinates in 3D space); those faces combined make up a 3D mesh. With that out of the way: ARKit 3.5 tries to assign such a classification to each individual face in the scene geometry, which developers can query and work with. The available classifications are:
ceiling, door, floor, none, seat, table, wall & window
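A sketch of how you could read those classifications, loosely following the buffer-reading approach of Apple's sample app (the function name and `faceIndex` parameter are ours); it assumes the session runs with `sceneReconstruction = .meshWithClassification`:

```swift
import ARKit

// Look up the semantic classification of a single face of a mesh anchor.
// The classification geometry source stores one UInt8 per face.
func classification(of faceIndex: Int, in meshAnchor: ARMeshAnchor) -> ARMeshClassification {
    guard let source = meshAnchor.geometry.classification else { return ARMeshClassification.none }
    let pointer = source.buffer.contents()
        .advanced(by: source.offset + source.stride * faceIndex)
    let rawValue = Int(pointer.assumingMemoryBound(to: UInt8.self).pointee)
    return ARMeshClassification(rawValue: rawValue) ?? ARMeshClassification.none
}
```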

Test video by Takashi Yoshinaga of Apple's sample app Visualizing and Interacting with a Reconstructed Scene

Backwards Compatibility?

Even existing AR apps already profit from the ARKit update on the new iPad Pro without any changes to their code. Finding planes in world-tracking mode is sped up considerably, to the point where you almost never have to goofily wave your device around anymore:

Plane Classification scene of Unity's ARFoundation-Samples

How to get that

Let it be said that all of those shiny new AR features require you to use RealityKit's ARView; there is no going back to SceneKit and the ARSCNView. If you want to know how to implement them, have a look at Apple's sample app Visualizing and Interacting with a Reconstructed Scene. It's a great example that covers every new feature in a nicely commented way.
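As a quick orientation, this is roughly what the setup looks like (a sketch assuming a view controller with an `arView: ARView` outlet; the option set at the end is our choice, not a requirement):

```swift
import ARKit
import RealityKit

override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)

    let configuration = ARWorldTrackingConfiguration()
    // Scene reconstruction is only available on LIDAR-equipped devices.
    if ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) {
        configuration.sceneReconstruction = .mesh
    }
    arView.session.run(configuration)

    // Use the reconstructed mesh for occlusion and physics.
    arView.environment.sceneUnderstanding.options = [.occlusion, .physics]
}
```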

What it is NOT

Apple marketed the new LIDAR sensor and ARKit 3.5 as a vast improvement for handheld AR, and nobody can deny that it is. But there are limits to it, some technical and some probably tactical. Depending on the expectations you had when Apple announced it would integrate a 3D-scanning-capable sensor into the back-facing camera of its flagship devices, you might be completely satisfied with what it ended up being, or… slightly disappointed?

Raw Data?

One bummer for some of you could be that there is no access to the raw depth data delivered by the LIDAR: no depth image or point clouds developers could play around with and feed into their own mesh reconstruction or object recognition algorithms. A reason for that could be that Apple does not deem the data to be of high enough quality. Compare the resolution of the LIDAR sensor to that of the FaceID camera:

From "12.9″ iPad Pro 2020 Teardown: What does the LiDAR scanner look like?" by iFixit

As you can see, the LIDAR rays are spaced relatively far apart and therefore result in a fairly low depth resolution. Apple might just not want you to play with that directly. And you can't criticise them for something you can't access.

3D Mesh Reconstruction

The way 3D mesh reconstruction is implemented at the moment is targeted at its current purpose only: enhancing the AR experience. It works like this: the scene geometry is not reconstructed out of the depth data as a whole, but rather assembled through the construction of several smaller submeshes, saved in ARMeshAnchor objects. The mesh data of those objects gets updated over time to reflect changes in the real world. Additionally, objects you're close to get tessellated more finely than objects further away. Both behaviours in turn mean that scene regions your device is no longer facing get less resolved or are completely deconstructed over time, probably to save on memory and performance.
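To make that concrete, here is a small sketch of how those per-anchor updates surface to a developer, assuming the view controller has been set as the session's ARSessionDelegate:

```swift
// Called whenever ARKit refines existing anchors, including the scene mesh anchors.
func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
    for case let meshAnchor as ARMeshAnchor in anchors {
        let geometry = meshAnchor.geometry
        print("Updated mesh anchor \(meshAnchor.identifier): "
              + "\(geometry.vertices.count) vertices, \(geometry.faces.count) faces")
    }
}
```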

iPad Pro 2020 LIDAR scan by zeit:raum — You can see the back faces of the statue get deconstructed

This is fine for the intended use in AR scene understanding, but not for much more. And therein lies the next bummer for "power users": ARKit 3.5 does not provide a configurable API to purposefully scan and export a complete 3D mesh, nor an option to modify the scene reconstruction process in any way. For instance, you can't set a fixed tessellation level, and no smoothing or hole-filling algorithms are provided. It's no replacement for a Structure Sensor just yet, as some might have hoped (we did 😅).

How to export a LIDAR scan

However, you can get a snapshot of the vertices, faces and normals of a single frame. With this data it is possible to export an OBJ file. And we want to show you how.

iPad Pro 2020 LIDAR scan by zeit:raum — Berlin Charm

Our Approach

So we get this raw mesh data: vertices and faces. What we need is a way to convert it into a 3D model file format that can be exported from iOS and imported into other 3D authoring software. This is where Apple's Model I/O framework comes into play. It is capable of assembling a model file out of our data, if we feed it in the correct way, which can be a bit tricky.

As a base to work on top of, we can use Apple's sample app for ARKit 3.5, Visualizing and Interacting with a Reconstructed Scene. For our purposes, some of its features can be deleted, like the face classification, raycasting and spawning of 3D texts; we won't need them here. In the main storyboard file, just add another button called "Save" and hook up a method for it in ViewController.swift. For the sake of simplicity and demonstration, everything will happen in this method and there won't be much consideration for executing things asynchronously.
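The hooked-up action could look like this stub (the method name is ours; the export steps below fill it in):

```swift
@IBAction func saveButtonPressed(_ sender: UIButton) {
    // The export code from the following sections goes here.
}
```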

Adding a save button

Let’s dive into it

First off, we need to import the Model I/O and MetalKit frameworks at the beginning of the ViewController.swift file.
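In addition to the ARKit and RealityKit imports the sample already has, that means:

```swift
import ModelIO
import MetalKit
```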

At the start of our save method, we have to get the current ARFrame, which gives us access to all ARAnchors currently placed in the scene. Of those, we only need the anchors that carry the scene reconstruction meshes, so we filter them down to the ARMeshAnchor objects.
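A sketch of that first step inside the save method:

```swift
// Grab the current frame and keep only the scene-reconstruction mesh anchors.
guard let frame = arView.session.currentFrame else { return }
let meshAnchors = frame.anchors.compactMap { $0 as? ARMeshAnchor }
```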

With Model I/O framework methods as our way to go, we will need to assemble an MDLAsset object, which in the end will contain the complete scene and can be exported to a file. To create one, we initialise it with an instance of an MDLMeshBufferAllocator implementation to handle the memory. We choose MTKMeshBufferAllocator, because RealityKit uses Metal as its renderer. That in turn needs to be initialised with an MTLDevice instance, which is basically a reference to the graphics processor of your device.
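Roughly:

```swift
// A Metal device (the GPU) backs the buffer allocator, which in turn backs the asset.
guard let device = MTLCreateSystemDefaultDevice() else { return }
let allocator = MTKMeshBufferAllocator(device: device)
let asset = MDLAsset(bufferAllocator: allocator)
```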

All of the next steps have to be done for every ARMeshAnchor we gathered earlier. We need to convert the position of every vertex in its mesh from local space to world space and write the result back into its position in the vertex buffer.

That is because the position data of the vertices is stored relative to the transform of its anchor, and OBJ files have no concept of a scene hierarchy. The export method of the MDLAsset doesn't convert it for us, so we have to do it manually. Otherwise, every mesh anchor's vertex positions would start from the world origin after the export and we would end up with something like this:

Scan Export without converting vertex position data from local to world space
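A sketch of the in-place conversion; it reads each vertex as three Floats, matching the float3 layout ARKit uses for the vertex source:

```swift
for meshAnchor in meshAnchors {
    let vertices = meshAnchor.geometry.vertices   // ARGeometrySource backed by an MTLBuffer
    let vertexPointer = vertices.buffer.contents()

    for index in 0..<vertices.count {
        // Locate this vertex's three Float components inside the buffer.
        let offset = vertices.offset + vertices.stride * index
        let pointer = vertexPointer.advanced(by: offset)
            .assumingMemoryBound(to: (Float, Float, Float).self)
        let local = pointer.pointee

        // Transform the position from the anchor's local space into world space.
        let world = meshAnchor.transform * SIMD4<Float>(local.0, local.1, local.2, 1)

        // Write the world-space position back in place.
        pointer.pointee = (world.x, world.y, world.z)
    }
}
```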

We now have everything prepared to actually convert our scene geometry into a format that can be exported. We start by using the mesh buffer allocator to create MDLMeshBuffers out of the vertex and index MTLBuffers, which then feed the mesh creation process. Next, we initialise an MDLSubmesh with the index buffer and describe the type of faces we have, in our case triangles, so the submesh knows that every three indices in the index buffer describe one face. We also need to create an MDLVertexDescriptor, which we feed information about the memory layout of the vertex buffer. With all of that in place, we can finally build the Model I/O representation of our mesh, an MDLMesh. We add it to the MDLAsset and repeat the whole process for the other ARMeshAnchors.
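Put together, the per-anchor assembly could look like this sketch (material handling and error checking are left out):

```swift
for meshAnchor in meshAnchors {
    let vertices = meshAnchor.geometry.vertices
    let faces = meshAnchor.geometry.faces

    // Wrap the Metal buffers in Model I/O mesh buffers.
    let vertexData = Data(bytes: vertices.buffer.contents().advanced(by: vertices.offset),
                          count: vertices.stride * vertices.count)
    let vertexBuffer = allocator.newBuffer(with: vertexData, type: .vertex)

    let indexCount = faces.count * faces.indexCountPerPrimitive
    let indexData = Data(bytes: faces.buffer.contents(),
                         count: faces.bytesPerIndex * indexCount)
    let indexBuffer = allocator.newBuffer(with: indexData, type: .index)

    // One submesh holding all triangles of this anchor; ARKit's face indices are 32-bit.
    let submesh = MDLSubmesh(indexBuffer: indexBuffer,
                             indexCount: indexCount,
                             indexType: .uInt32,
                             geometryType: .triangles,
                             material: nil)

    // Describe the memory layout of the vertex buffer: one float3 position per vertex.
    let vertexDescriptor = MDLVertexDescriptor()
    vertexDescriptor.attributes[0] = MDLVertexAttribute(name: MDLVertexAttributePosition,
                                                        format: .float3,
                                                        offset: 0,
                                                        bufferIndex: 0)
    vertexDescriptor.layouts[0] = MDLVertexBufferLayout(stride: vertices.stride)

    let mesh = MDLMesh(vertexBuffer: vertexBuffer,
                       vertexCount: vertices.count,
                       descriptor: vertexDescriptor,
                       submeshes: [submesh])
    asset.add(mesh)
}
```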

Finally, when the MDLAsset is fed with all meshes of the scene geometry, we can export it to an OBJ file and share it via a UIActivityViewController.
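And the final step, writing the file to the app's temporary directory (the file name is arbitrary):

```swift
// Write the OBJ file and offer it via the share sheet.
let exportURL = FileManager.default.temporaryDirectory.appendingPathComponent("scan.obj")
do {
    try asset.export(to: exportURL)
    let activityController = UIActivityViewController(activityItems: [exportURL],
                                                      applicationActivities: nil)
    // On iPad the share sheet is shown as a popover and needs an anchor.
    activityController.popoverPresentationController?.sourceView = view
    present(activityController, animated: true)
} catch {
    print("Export failed: \(error)")
}
```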

Where to go from here

You're now able to export an OBJ file of the scene geometry within a single frame. If you want to try it out, just grab a copy from our GitHub repository.

It is a good base to build on and experiment with further. For example, you could also export an STL file, or see what happens if you gather the meshes of mesh anchors over several frames and export them all into one OBJ (or export several single-frame OBJs) and create a pipeline to merge and remesh them in a 3D authoring software. That might be a way to circumvent ARKit's tendency to deconstruct the sides of meshes you're no longer facing. Another idea would be to create a VR experience with 360° images and underlying LIDAR scans as collision meshes, so you can seemingly interact with the real world. And of course, if you manage to export a model with textures, let us know. 😉

iPad Pro 2020 LIDAR scan by zeit:raum

What does the Future bring?

In its current implementation, ARKit 3.5 is a huge boon for Apple's augmented reality platform. Older AR apps will benefit from the faster plane detection, and the brand-new features will pick up more steam once Unity releases an integration for their augmented reality SDK ARFoundation. Where it could really shine, though, is when integrated into Unity's upcoming AR development platform MARS. There, ARKit's semantic scene understanding capabilities would bring great benefits for context-aware automated placement of AR objects. Undoubtedly the next iPhone generation will also be fitted with a LIDAR sensor. That will broaden the audience for the new ARKit features and pave the way for Apple's future augmented reality platforms (hint hint 😉). If you look closely, the feature set of the LIDAR sensor integration with its current software implementation has a lot in common with head-mounted AR devices like Microsoft's HoloLens and the Magic Leap 1. Combined with all the signs already pointing in the direction of an Apple AR headset, it seems quite clear to me where this journey will take us.
