ARKit: What It Is and How It Changes Things

Dominic Holder · Published in Mac O’Clock · Mar 30, 2020

Being an iOS developer is a unique experience. With access to Apple’s consumer base, an iOS developer can potentially reach over a billion users worldwide. Beyond Apple’s trillion-dollar valuation, engineers have access to developer tools unlike those on other platforms. Thanks to tools like Swift and Xcode, iOS developers enjoy the advantages of Apple’s device standardization, massive earning potential, higher quality assurance, readable code and improved performance. With these improvements, alongside a growing open source community, developers are equipped to build apps on a scale never seen before. Not only is the number of apps increasing, but the quality of these apps is staggering. One feature that stands out from Apple’s extensive product catalogue is ARKit.

What Is ARKit?

ARKit is Apple’s augmented reality platform built for iOS mobile devices. This framework allows developers to build AR experiences for iOS and iPadOS. By anchoring virtual content to real-world objects, ARKit makes virtual objects “appear” in the physical environment. Introduced alongside iOS 11, ARKit delivers vivid AR content in the form of animated 3D virtual text, objects and characters. All of this is accomplished with the help of the iOS device’s built-in camera, internal processors, accelerometer and gyroscope, which together map out the environment as the device moves. After identifying physical planes, users can interact with their own AR scenes in real time. Any tangible surface becomes a platform that can be overlaid with augmented reality experiences. This ability to create personalized experiences is what makes ARKit a worthy competitor in the augmented reality market.
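To make this concrete, here is a minimal sketch of letting the user place a virtual object on a detected plane with a tap. The view controller, the sceneView property and the handleTap(_:) method are illustrative names of my own; the sketch assumes a world-tracking session with plane detection enabled, as shown in the plane detection example (1) below.

import ARKit
import SceneKit
import UIKit

// A minimal sketch, assuming a view controller that owns an ARSCNView named sceneView.
// Tapping the screen drops a small box on a plane ARKit has already detected.
class TapToPlaceViewController: UIViewController {
    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()
        let tap = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))
        sceneView.addGestureRecognizer(tap)
    }

    @objc private func handleTap(_ gesture: UITapGestureRecognizer) {
        let point = gesture.location(in: sceneView)
        // Hit-test against planes ARKit has already detected.
        guard let result = sceneView.hitTest(point, types: .existingPlaneUsingExtent).first else { return }
        // Anchor a 10 cm box where the tap intersects the plane.
        let box = SCNNode(geometry: SCNBox(width: 0.1, height: 0.1, length: 0.1, chamferRadius: 0))
        let position = result.worldTransform.columns.3
        box.position = SCNVector3(position.x, position.y, position.z)
        sceneView.scene.rootNode.addChildNode(box)
    }
}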

Wall Detection:

Currently, wall detection in ARKit only works well on high-contrast, textured backgrounds, because the current hardware lacks rear-facing depth sensors. Instead of a smooth surface like granite, opt for something like a brick wall, so that patches of the surface can be picked up as the camera moves. Luckily, developers have the option of rendering virtual content for the AR scene with Apple’s Metal and SceneKit frameworks or with third-party engines such as Unity and Unreal. ARKit tracks each anchor (a fixed point of interest) and represents it as a node in SceneKit’s scene graph, as examples (1) and (2) below illustrate.

Ex. Plane Detection (1)

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)
    startTracking()
}

private func startTracking() {
    // Visualize the feature points ARKit is detecting.
    sceneView.debugOptions = [ARSCNDebugOptions.showFeaturePoints]
    // Track the world and look for both vertical and horizontal planes.
    let configuration = ARWorldTrackingConfiguration()
    configuration.planeDetection = [.vertical, .horizontal]
    // Start (or restart) the session with a clean slate.
    sceneView.session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
}

ARSCNViewDelegate provides the renderer(_:nodeFor:) method, along with renderer(_:didAdd:for:), renderer(_:didUpdate:for:) and renderer(_:didRemove:for:), which we can implement to find out whenever an anchor/node pair has been added, updated, or removed from the scene. Let’s look at how we handle nodes being added to the scene with renderer(_:nodeFor:).

From here, the ARAnchor is cast to ARPlaneAnchor. ARAnchor gives us a 4x4 transformation matrix that tells us the anchor’s location and orientation. ARPlaneAnchor then gives us the smallest rectangle that encloses the plane. The plane anchor also has a geometry property of type ARPlaneGeometry, which gives us richer detail: it describes the smallest convex polygon enclosing the feature points that make up the plane. To quickly visualize that plane, the ARSCNPlaneGeometry class is used.

We now have geometries in our scene corresponding to the surfaces that ARKit has detected. How do we use these geometries to mask areas of the AR scene? The answer is the SCNColorMask type introduced with iOS 11. If we apply a material whose colorBufferWriteMask is an empty color mask, the geometry still writes to the view’s depth buffer, occluding other geometries, but it won’t write any color, so the underlying video stream shows through the scene. This also means that shadows cast onto the occlusion geometry still render, allowing virtual objects to throw shadows onto real surfaces.

Ex. Adding Plane Nodes (2)

func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
    guard let plane = anchor as? ARPlaneAnchor,
          let device = renderer.device,
          let geometry = ARSCNPlaneGeometry(device: device)
    else { return nil }
    // Shape the SceneKit geometry to match the detected plane.
    geometry.update(from: plane.geometry)
    // An empty color mask writes depth but no color, so the plane occludes
    // virtual content while the camera feed shows through.
    let maskMaterial = SCNMaterial()
    maskMaterial.colorBufferWriteMask = []
    geometry.materials = [maskMaterial]
    let node = SCNNode(geometry: geometry)
    // Draw the occlusion geometry before everything else.
    node.renderingOrder = -1
    return node
}

We also need to ensure that the masking geometry gets drawn before the objects we want masked. We achieve this by giving the occlusion nodes a negative renderingOrder.

Then, we use the ARPlaneGeometry object to update the ARSCNPlaneGeometry by calling geometry.update(from: plane.geometry). We keep doing this as ARKit gleans more information about the scene, dynamically updating the occlusion geometry. To do so, we implement the renderer(_:didUpdate:for:) method:

Ex. Updating Plane Nodes (3)

func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let plane = anchor as? ARPlaneAnchor,
          let geometry = node.geometry as? ARSCNPlaneGeometry
    else { return }
    // Refresh the occlusion geometry as ARKit refines the plane.
    geometry.update(from: plane.geometry)
    // Keep the bounding volume in sync so SceneKit doesn't clip the node (see example 4).
    node.boundingBox = planeBoundary(extent: plane.extent)
}

Last but not least, when a geometry is updated dynamically, SceneKit does not update the bounding volume for that geometry. This causes the node to occasionally blink in and out of visibility, because it is mistakenly clipped out of the view. We can fix this by manually setting the bounding volume of the node to match the extent of the plane whenever the geometry is updated:

Ex. Updating Plane Boundary (4)

private func planeBoundary(extent: float3) -> (min: SCNVector3, max: SCNVector3) {
    // The plane's extent is its full size; the bounding box spans half of it in each direction.
    let radius = extent * 0.5
    return (min: SCNVector3(-radius), max: SCNVector3(radius))
}

This is one example of how wall detection works in ARKit.

Combine Motion Sensing and Scene Analysis:

To create the correspondence between real and virtual spaces, ARKit uses a technique called visual-inertial odometry. This process combines information from the iOS device’s motion-sensing hardware with computer vision analysis of the scene visible to the device’s camera. ARKit recognizes notable features in the scene, tracks the differences in the positions of those features across video frames, and compares that information with motion-sensing data, resulting in a high-precision model of the device’s position and motion.
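That model is exposed to us through ARCamera. As a minimal sketch (the delegate class below is illustrative and assumes it has been set as the running ARSession’s delegate), each ARFrame’s camera.transform carries the device’s current position and orientation in world space:

import ARKit

// A minimal sketch: an ARSessionDelegate that logs the device's position
// from the pose ARKit computes via visual-inertial odometry.
class PoseLogger: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // camera.transform is a 4x4 matrix; its last column holds the translation.
        let t = frame.camera.transform.columns.3
        print(String(format: "Device position: x=%.2f y=%.2f z=%.2f", t.x, t.y, t.z))
    }
}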

While world tracking can produce realistic AR experiences with accuracy and precision, it still relies on details of the device’s physical environment that are sometimes inconsistent and can be difficult to measure in real time without a degree of error. To build high-quality AR experiences, be aware of the following possibilities.

Design your AR experiences around predictable lighting conditions. World tracking involves image analysis, which requires clear imagery. Tracking quality is reduced when the camera can’t see details, such as when the camera is pointed at a blank wall or the scene is too dark.
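One way to react to poor lighting at runtime is ARKit’s built-in light estimation. The sketch below is only an illustration: the class name and the 500-lumen threshold are my own assumptions, and it assumes light estimation is enabled on the configuration (the default).

import ARKit

// A minimal sketch: warn when the scene is probably too dark for good tracking.
class LightingMonitor: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        guard let estimate = frame.lightEstimate else { return }
        // Roughly 1000 lumens is considered neutral lighting; 500 is an illustrative cutoff.
        if estimate.ambientIntensity < 500 {
            print("Scene looks dark; tracking quality may suffer.")
        }
    }
}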

Use tracking quality information to provide better user feedback. World tracking matches image analysis with device motion. ARKit develops a better understanding of the scene if the device is moving, even if the device moves only subtly. Excessive motion — too far, too fast, or shaking too vigorously — results in a blurred image or too much distance for tracking features between video frames, reducing tracking quality. The ARCamera class provides tracking state reason information, which you can use to develop UI that tells a user how to resolve low-quality tracking situations.
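For example, here is a minimal sketch of turning ARCamera’s tracking state into user-facing hints. The class name and message strings are placeholders of my own, and the object is assumed to be set as the session’s delegate.

import ARKit

// A minimal sketch: translate ARCamera's tracking state into feedback for the user.
class TrackingStatusReporter: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, cameraDidChangeTrackingState camera: ARCamera) {
        switch camera.trackingState {
        case .normal:
            print("Tracking is good.")
        case .notAvailable:
            print("Tracking unavailable.")
        case .limited(.excessiveMotion):
            print("Slow down: the device is moving too fast.")
        case .limited(.insufficientFeatures):
            print("Point the camera at a more detailed, well-lit surface.")
        case .limited:
            print("Tracking is limited; keep moving the device slowly.")
        }
    }
}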

Allow time for plane detection to produce clear results, and disable plane detection when you have the results you need. Plane detection results vary over time — when a plane is first detected, its position and extent may be inaccurate. As the plane remains in the scene over time, ARKit refines its estimate of position and extent. When a large flat surface is in the scene, ARKit may continue changing the plane anchor’s position, extent, and transform after you’ve already used the plane to place content.
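As a minimal sketch of that last point, once the planes you need have been detected you can re-run the session with plane detection switched off. The stopPlaneDetection() helper is hypothetical and assumes the same sceneView used in example (1); you would call it yourself, for example after placing your content.

// Assumed to live in the same view controller that owns sceneView (see example 1).
private func stopPlaneDetection() {
    let configuration = ARWorldTrackingConfiguration()
    // An empty option set turns plane detection off; existing anchors are kept
    // because we don't pass .resetTracking or .removeExistingAnchors.
    configuration.planeDetection = []
    sceneView.session.run(configuration)
}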

Conclusion:

The potential of AR cannot be overstated. We are on the precipice of a major disruption in the mobile computing market. Because of these emerging platforms, major industries will look to AR to ‘augment’ how their business is conducted. Healthcare, education, marketing, journalism, real estate, gaming, music and the defense sector are all likely to see a change in their operations. Mobile platforms are increasingly used as a means of bringing these ideas into all aspects of our lives. Consumers and brands alike can now apply the AR features that suit them to their own applications. Any context can be transformed with the AR toolkit; in other words, consumers are responsible for their own experience. The purpose is to empower mobile users and, in turn, produce a virtual playground. The real and virtual worlds will only become more intrinsically linked as AR further penetrates the market.

Augmented reality IS THE FUTURE.
