Apple’s ARKit does a fantastic job of anchoring virtual objects in the real world. This illusion of a coherent augmented space collapses, though, if a real-world object passes between the device’s camera and the virtual object: the AR object remains fully visible onscreen, with the real-world object appearing to pass beneath it, despite the real object being closer to the user.
Imagine an app that turns any surface into a whiteboard. What would happen if, as you moved through your augmented office, all your jotted-down notes continued to be visible, even if they were behind a wall, or in another room? You would quickly be faced with a confusing jumble of overlapping notes and drawings filling the visual field. How can we make these virtual notes and drawings disappear behind walls and other surfaces just like their real-life counterparts would?
To make it appear as if AR objects are passing behind real-world ones, we need to be able to track the surfaces of those blocking objects, and use those surfaces to mask off sections of the virtual scene. We can call these masking objects “occlusion geometry”.
When ARKit 1.5 arrived in iOS 11.3 earlier this year, Apple gave us a couple of new features that make this easier. First, they expanded ARKit’s plane detection so that it can detect vertical planes in addition to horizontal ones, allowing us to detect walls. Second, ARKit 1.5 gives us more precise information about the shape of those planes. Whereas version 1 of ARKit represented planes as rectangles, ARKit 1.5 can represent them as convex polygons, more closely tracking the shape of surfaces. We also get a new geometry primitive in SceneKit that can be dynamically updated as fresh information about the plane is found. A quick search of Twitter and StackOverflow shows lots of people experimenting with these techniques, so I was keen to see what was possible for myself.
Without any rear-facing depth sensors in current hardware, ARKit relies on there being discernible, relatively high contrast features in the scene that it can lock on to. If your workplace is all smoothly plastered walls, you’re flat out of luck. Point it at a patterned or textured surface such as tiles or exposed brick however, and it’ll start picking up patches of the surface quite quickly, expanding the area of the known surface as the camera pans around.
For rendering virtual content in the AR scene, there are several choices of engine. ARKit bridges to the third-party game engines Unity and Unreal, and also to Apple’s own Metal and SceneKit frameworks. If you wish to avoid third-party dependencies, or prototype rapidly, SceneKit is a great choice.
Xcode 9 provides a handy ARKit/SceneKit template to kick things off. The first step is to enable plane detection in an ARKit world-tracking session:
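The original listing isn’t reproduced here, but a minimal version might look like the following, assuming `sceneView` is the `ARSCNView` outlet provided by the template (`.vertical` requires iOS 11.3):

```swift
import ARKit

// Run a world-tracking session that detects both horizontal and
// vertical planes (.vertical is new in ARKit 1.5 / iOS 11.3).
let configuration = ARWorldTrackingConfiguration()
configuration.planeDetection = [.horizontal, .vertical]
sceneView.session.run(configuration)
```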
What information do we then get back about the surfaces that ARKit detects? ARKit refers to fixed points of interest in the scene as anchors, and when bridging to SceneKit, each anchor can be represented as a node in SceneKit’s object graph.
ARSCNViewDelegate has a set of methods that we can implement to find out whenever an anchor/node pair has been added, updated, or removed from the scene. Let’s look at how we handle nodes being added to the scene in the renderer(_:didAdd:for:) method.
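The original article’s listing is missing here; a sketch of that handler might look like this, folding in the geometry setup that the following paragraphs discuss:

```swift
func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
    // We're only interested in plane anchors.
    guard let planeAnchor = anchor as? ARPlaneAnchor else { return }

    // Build a mesh matching the plane's convex polygon (ARKit 1.5+).
    // ARSCNPlaneGeometry needs a Metal device, retrieved from the renderer.
    guard let device = renderer.device,
          let planeGeometry = ARSCNPlaneGeometry(device: device) else { return }
    planeGeometry.update(from: planeAnchor.geometry)
    node.geometry = planeGeometry
}
```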
Let’s unpack this method line by line. Because we’re only interested in planes, we cast the anchor to ARPlaneAnchor. New in ARKit 1.5, the plane anchor in turn has a geometry property of type ARPlaneGeometry. This progression gradually gives us more information about the plane: ARAnchor gives us a 4x4 transformation matrix that tells us the anchor’s location and orientation; ARPlaneAnchor adds the smallest rectangle that will enclose the plane; and ARPlaneGeometry gives us richer detail, describing the smallest convex polygon that encloses the feature points that make up the plane.
One point of interest is that ARKit defines the plane’s orientation in such a way that the “up” axis of its transformation matrix is perpendicular to the plane (it is the plane’s normal). So if you want to hang a piece of virtual art on a wall using the plane’s transformation matrix, you’ll have to orient the artwork so that it’s lying on its back.
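As a hypothetical example, an SCNPlane lies in its node’s x–y plane, so to sit flush against a detected wall it needs tipping back by 90° about x before being added to the anchor’s node:

```swift
// A 40cm x 30cm virtual "canvas" for the wall.
let artwork = SCNNode(geometry: SCNPlane(width: 0.4, height: 0.3))
// The plane anchor's y axis is the surface normal, so lay the
// artwork on its back to align it with the detected plane.
artwork.eulerAngles.x = -.pi / 2
node.addChildNode(artwork)
```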
A quick coat of invisibility paint
To quickly visualise the surfaces that ARKit has detected, we can use the new ARSCNPlaneGeometry class. In my testing I found that this class crashes unless it has a Metal device set (we’ll file a radar with Apple). We can retrieve the Metal device from the SceneKit renderer. Note that for this line to compile, the build target cannot be a simulator, as the simulator does not yet support Metal; to continue, change your build target to a real device.
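A sketch of that setup, assuming `planeAnchor` and `node` come from the renderer(_:didAdd:for:) callback:

```swift
// ARSCNPlaneGeometry crashes without a Metal device; renderer.device
// is nil in the simulator, so build for a real device.
guard let device = renderer.device,
      let planeGeometry = ARSCNPlaneGeometry(device: device) else { return }
planeGeometry.update(from: planeAnchor.geometry)
node.geometry = planeGeometry
```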
We now have geometries in our scene corresponding to the surfaces that ARKit has detected. How do we then use these geometries to mask areas of our scene? The secret sauce is the SCNColorMask type introduced in iOS 11. If we apply a material with an empty colour mask as its colorBufferWriteMask, the geometry will still write to the view’s depth buffer, masking other geometries, but won’t write any colours, allowing the underlying video stream to show through the scene. As an added bonus, any shadows cast over the occlusion geometry will still render, allowing virtual objects to throw shadows over real surfaces. For more on this technique, see the WWDC 2017 session on integrating SceneKit and ARKit.
We also need to ensure that the masking geometry gets drawn before the objects we want masked. The easiest way to do this is to give the occlusion nodes a negative renderingOrder, so they render before the scene’s other nodes (whose rendering order defaults to 0).
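Putting those two pieces together, the occlusion material might be set up like this (assuming `planeGeometry` and `node` from the didAdd handler):

```swift
let occlusionMaterial = SCNMaterial()
// Write depth but no colour: the camera feed shows through the
// surface while virtual content behind it is masked out.
occlusionMaterial.colorBufferWriteMask = []
occlusionMaterial.isDoubleSided = true
planeGeometry.materials = [occlusionMaterial]

// Draw the occluder before other content (default renderingOrder is 0).
node.renderingOrder = -1
```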
We can use the ARPlaneGeometry object to update the ARSCNPlaneGeometry mesh by calling geometry.update(from: plane.geometry). We can continue to do this as ARKit gleans more information about the scene, dynamically updating our occlusion geometry. To do this we’ll use the renderer(_:didUpdate:for:) delegate method.
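That delegate method might look like:

```swift
func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor,
          let planeGeometry = node.geometry as? ARSCNPlaneGeometry else { return }
    // Refresh the mesh as ARKit refines its estimate of the surface.
    planeGeometry.update(from: planeAnchor.geometry)
}
```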
One final piece of housekeeping. In my testing I found that when dynamically updating a geometry, SceneKit does not update the bounding volume for that geometry. This causes it to occasionally blink in and out of visibility, because it is mistakenly being clipped from the view frustum. We can fix this by manually setting the bounding volume of the node to be equal to the extent of the plane, whenever the geometry gets updated:
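A sketch of that fix, inside the didUpdate callback; the ±0.01 on y is an arbitrary fudge to give the flat plane a non-zero thickness:

```swift
// SceneKit doesn't recompute the bounding volume when the mesh is
// updated in place, so derive it from the plane's bounding rectangle
// to stop the node being wrongly frustum-culled.
let extent = planeAnchor.extent
node.boundingBox = (min: SCNVector3(-extent.x / 2, -0.01, -extent.z / 2),
                    max: SCNVector3(extent.x / 2, 0.01, extent.z / 2))
```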
For the full source code, please see this sample project.
Mapping the built environment
Returning to our augmented office proof of concept, what are the results like? To visualise how the surface detection works, we added an outline around each of the detected planes, with the colour of the outline derived from the normal of the surface. The exposed brickwork of TAB’s HQ in the Spitfire building is the perfect canvas: ARKit detects the blocking wall pretty quickly and masks the scene accordingly.

We shouldn’t overhype the abilities of current technology. Without rear-facing depth sensors, current hardware relies on relatively well-lit, textured surfaces, and surface detection is currently limited to planes that are parallel or perpendicular to the ground. For the foreseeable future, users are generally going to want to prevent real-world objects from obstructing their view of AR scenes. But the potential here for building an intelligent awareness of the built space surrounding the user, and then richly augmenting that space, is already clear.