I was definitely blown away by this year’s updates for Augmented Reality.

WWDC19: Why Apple is still at the top of the AR food chain

Alberto Taiuti
Inborn Experience (UX in AR/VR)
9 min read · Jun 7, 2019


WWDC19 just finished and Apple didn’t go easy on the competition in the Augmented Reality space.

Updates to SceneKit, ARKit, CoreML, iOS 13… And these are only the ones related to Augmented Reality. So many announcements were made that it can be daunting to process them all, especially if your field of expertise is not software engineering.

In this post I aim to get you up to speed with everything that was announced during the opening keynote, without any filler. I’ll give you meaningful summaries of the announcements and, most importantly, fill you in on what each means in the grand scheme of things, drawing on my experience in the space: how each fits into Apple’s vision for Augmented Reality and what each means for the industry as a whole.

Without further ado, let’s get going.

Motion capture

ARKit 3 introduces the ability to track the pose of a human body in real time with just the back-facing camera on the iPhone XR, XS and XS Max.

A quick test I did with 3D body pose tracking, using a YouTube video as input

ARKit gives you what is called a “skeleton” in video games and 3D programming, both in a 3D version (what you see in the video) and in a screen-space (i.e. 2D) version.

The 3D version of the skeleton can be used to animate a 3D character model: this is very much the same technique used in video games, with the (huge) difference that instead of being driven by pre-designed animations, it is driven by the movements of a person.

This enables a huge number of new scenarios: from personalised full-body avatars and filters to games which use the poses someone makes as input. How about jump rope?
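For the developers reading, here’s what this looks like in code: a minimal sketch of running a body tracking session and reading joints off the 3D skeleton. The view controller setup is purely illustrative; the ARKit calls are the point.

```swift
import UIKit
import ARKit

final class BodyTrackingViewController: UIViewController, ARSessionDelegate {
    private let arView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        arView.frame = view.bounds
        view.addSubview(arView)
        arView.session.delegate = self

        // Body tracking requires an A12-class device (iPhone XR/XS and newer).
        guard ARBodyTrackingConfiguration.isSupported else { return }
        arView.session.run(ARBodyTrackingConfiguration())
    }

    // ARKit delivers the tracked person as an ARBodyAnchor carrying a 3D skeleton.
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let bodyAnchor as ARBodyAnchor in anchors {
            let skeleton = bodyAnchor.skeleton
            // Joint transforms are expressed relative to the body anchor's root (hip) joint.
            if let head = skeleton.modelTransform(for: .head) {
                print("Head position in body space:", head.columns.3)
            }
        }
    }
}
```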

People occlusion

Another huge addition is ARKit’s new ability to segment human bodies in screen space and estimate their depth. The main use for this is to correctly show AR 3D objects as being hidden by a person who is logically in front of them in the 3D world.

This means that whereas in previous versions of ARKit we had this:

What a 3D object would look like if a person stood in front of it in previous ARKit versions

we now have:

Thanks to the new Machine Learning approach, content is rendered correctly as occluded

This addition works mostly towards improving the realism of the rendered AR scene, but the depth maps generated for this purpose can be used for other situations and applications.

For example, one could use the segmentation map to apply screen-space effects only where people are not, effectively giving you a green-screen effect ready to use anywhere at any time.
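Opting in is, at least on paper, a one-liner: you add the new frame semantics to your world tracking configuration. A minimal sketch, guarded by the capability check since this needs recent hardware:

```swift
import ARKit

func makePeopleOcclusionConfiguration() -> ARWorldTrackingConfiguration {
    let configuration = ARWorldTrackingConfiguration()

    // Person segmentation with depth is only available on A12-class devices and newer.
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
        configuration.frameSemantics.insert(.personSegmentationWithDepth)
    }
    return configuration
}

// Once running, each ARFrame exposes the buffers behind the effect:
// frame.segmentationBuffer (which pixels belong to people) and
// frame.estimatedDepthData (their estimated depth), which you can reuse
// for green-screen-style effects of your own.
```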

Tim Field has posted a great video showcasing the depth map generation for people occlusion and segmentation in real time, which I think does a great job at showing what’s going on in the background to achieve the occlusion effect:

Better image & object tracking

Apple has made various performance improvements to the existing image and 3D object tracking systems in ARKit.

For 3D objects the system is now more robust and should be able to detect them faster and more accurately. In previous ARKit versions accuracy was indeed an issue, to the point that we didn’t see many apps using this feature at all. Hopefully this will prove to be a big enough improvement that apps using this really interesting feature will start emerging.

For image tracking, ARKit can now track up to 100 different images at the same time, compared to just a few in the previous version. This is such a large number that you will probably run out of space in the camera frame before you can fit 100 images into it.
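For reference, setting this up hasn’t changed much: you load a set of reference images and hand them to the configuration. The asset catalog group name below is a placeholder:

```swift
import ARKit

// Assumes a reference-image group named "TrackedImages" (placeholder) in the asset catalog.
func makeImageTrackingConfiguration() -> ARImageTrackingConfiguration? {
    guard let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "TrackedImages",
                                                                 bundle: .main) else {
        return nil
    }

    let configuration = ARImageTrackingConfiguration()
    configuration.trackingImages = referenceImages
    // ARKit 3 raises the ceiling on how many of these can be tracked simultaneously.
    configuration.maximumNumberOfTrackedImages = referenceImages.count
    return configuration
}
```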

At the moment Apple is the only one offering object and image recognition with this level of performance and accuracy.

Improved surface detection

ARKit 3 also introduces better and more precise surface detection and classification.

Surface detection on featureless surfaces has been improved via Machine Learning-based techniques

ARKit can now detect walls and other surfaces with more accuracy thanks to the introduction of Machine Learning-based surface detection. This means that, for example, the system does a much better job at detecting walls and surfaces which have few features, such as white walls inside a house.

This is particularly relevant for all those ARKit apps out there using AR for interior design, house decoration, surveying, and more. For example, home furniture can now be positioned with more precision and purpose.
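For the developers out there, a minimal sketch of turning on plane detection and reading the classification ARKit attaches to the planes it finds (the helper function is just for illustration):

```swift
import ARKit

func makePlaneDetectionConfiguration() -> ARWorldTrackingConfiguration {
    let configuration = ARWorldTrackingConfiguration()
    // Detect both horizontal and vertical surfaces (floors, tables, walls…).
    configuration.planeDetection = [.horizontal, .vertical]
    return configuration
}

// Detected planes arrive in the session/view delegate callbacks as ARPlaneAnchor,
// which carries a classification on supported devices.
func describe(_ plane: ARPlaneAnchor) -> String {
    guard ARPlaneAnchor.isClassificationSupported else { return "unclassified" }
    switch plane.classification {
    case .wall:  return "wall"
    case .floor: return "floor"
    case .table: return "table"
    default:     return "other surface"
    }
}
```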

I remember seeing a couple of startups working on letting users see, through AR, what their walls would look like with a new paint colour, and I’m sure they’re very happy about this update.

Front & back cameras can now run ARKit at the same time

Whereas before you had to choose between running either a face tracking session or a world tracking session in ARKit, in ARKit 3 you can now run them at the same time:

Apple showcasing both cameras running ARKit at the same time

This means that you can, for example, use your facial expressions as input to control what content is displayed in the 3D world view and how. It’s a feature that many people had been waiting for, and it’s great to see it finally made available to developers.
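In code this boils down to a single new flag on the world tracking configuration, plus a capability check since it needs recent hardware. A minimal sketch:

```swift
import ARKit

// World tracking on the back camera, with the front camera feeding in face
// tracking data at the same time (new in ARKit 3).
func makeCombinedTrackingConfiguration() -> ARWorldTrackingConfiguration? {
    guard ARWorldTrackingConfiguration.supportsUserFaceTracking else { return nil }

    let configuration = ARWorldTrackingConfiguration()
    configuration.userFaceTrackingEnabled = true
    return configuration
}

// Face data then arrives in the world tracking session as ARFaceAnchor updates,
// so blend shapes (a smile, a raised eyebrow) can drive content placed in the world.
```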

RealityKit & Reality Composer

Apple also introduced a whole new set of APIs and apps aimed at making Augmented Reality content creation faster and more accessible to non-technical users.

RealityKit

The first part of this offering is a new API for developers called RealityKit. RealityKit is a completely new system and API which is in many ways similar to Apple’s other 3D scene management API, SceneKit, but which has nothing to do with it. This might seem odd to the many developers like me who have been working with SceneKit and were expecting major overhauls to it, but the main takeaway here is that Apple is trying to make it dead easy for anyone to start building AR apps.

Another thing to note is that RealityKit seems to be missing more advanced features that are available in SceneKit, such as particle effects and the ability to generate a mesh from a set of vertices programmatically, let alone custom shaders.

Although very promising on the surface, RealityKit seems to be missing features that many core users will be disappointed not to find, making it unlikely that it will be adopted in more advanced apps.
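To give you an idea of how lean the API is for simple scenes, here’s a minimal sketch of placing a primitive on a horizontal surface with RealityKit; the size and colour are arbitrary:

```swift
import RealityKit
import UIKit

// Assumes a RealityKit ARView is already on screen (e.g. owned by a view controller).
func placeBox(in arView: ARView) {
    // A 10 cm box with a basic non-metallic material.
    let box = ModelEntity(mesh: .generateBox(size: 0.1),
                          materials: [SimpleMaterial(color: .red, isMetallic: false)])

    // Anchor it to the first horizontal plane ARKit finds.
    let anchor = AnchorEntity(plane: .horizontal)
    anchor.addChild(box)
    arView.scene.addAnchor(anchor)
}
```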

Reality Composer

A standalone app, Reality Composer is an editor built on top of RealityKit which lets anyone with an iOS 13 or macOS 10.15 device create AR experiences and scenes. Think of Spark AR Studio or Lens Studio, but from Apple.

In general it offers the basic functionality you would expect from any modern 3D scene editor, with the addition of being able to anchor your content to different types of AR anchors: faces, surfaces, etc.

The main implication of Reality Composer, which might go unnoticed, is its new file format’s integration with QuickLook.

QuickLook is now able to open RealityKit project files the same way it opens USDZ files and show them to anyone with an iOS device, without the need to install any apps!

I cannot overstate how huge this is: anyone with an iOS 13 (or macOS 10.15) device can create complex, interactive AR 3D scenes and share them with anybody else, without the recipient having to install any additional app. This includes animations and interactions!
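To make the point concrete, here’s a minimal sketch of presenting a 3D file with QuickLook from inside your own app (the bundled file name is a placeholder); sharing one via Messages or Safari needs no code at all:

```swift
import UIKit
import QuickLook

// Presents a bundled USDZ (or Reality) file with AR Quick Look.
final class ModelPreviewer: NSObject, QLPreviewControllerDataSource {
    private let fileURL: URL

    init(fileURL: URL) {
        self.fileURL = fileURL
    }

    func present(from presenter: UIViewController) {
        let controller = QLPreviewController()
        controller.dataSource = self
        presenter.present(controller, animated: true)
    }

    // MARK: QLPreviewControllerDataSource

    func numberOfPreviewItems(in controller: QLPreviewController) -> Int { 1 }

    func previewController(_ controller: QLPreviewController,
                           previewItemAt index: Int) -> QLPreviewItem {
        // NSURL conforms to QLPreviewItem, so the URL itself is the preview item.
        fileURL as NSURL
    }
}

// Usage, assuming a file named "chair.usdz" (placeholder) is bundled with the app:
// let previewer = ModelPreviewer(fileURL: Bundle.main.url(forResource: "chair",
//                                                         withExtension: "usdz")!)
// previewer.present(from: someViewController)
```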

As support and features for RealityKit improve over the next couple of years, expect this paradigm to become more and more prominent.

Better support for USD(Z) content creation

Although not introduced as part of the ARKit 3 offerings, Apple did improve the toolset available for people working with USD content.

One of the main additions is the ability to convert glTF models to USDZ via the pre-built USDPython tools. It’s now possible to locally convert the many glTF models already available on platforms such as Sketchfab to USDZ.

Another much-requested feature was the ability for developers to export SceneKit .scn scenes to USDZ programmatically from inside their apps, and this has been addressed with the release of iOS 13. In previous releases the export was possible, but the resulting USDZ file would be incomplete; now the scene is correctly and fully exported. This lets a variety of content creation applications take advantage of the feature.
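A minimal sketch of what that export looks like in code; the output path is purely illustrative:

```swift
import SceneKit

// Writes an SCNScene out as a USDZ file and returns its URL on success.
func exportToUSDZ(_ scene: SCNScene) -> URL? {
    let outputURL = FileManager.default.temporaryDirectory
        .appendingPathComponent("export.usdz")

    // SceneKit picks the output format from the file extension;
    // on iOS 13 the resulting USDZ is fully exported.
    let success = scene.write(to: outputURL, options: nil,
                              delegate: nil, progressHandler: nil)
    return success ? outputURL : nil
}
```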

I’ve written extensively about USDZ here if you’re interested in finding out more about the nitty-gritty details of the file format: https://medium.com/@alberto.taiuti/usd-z-deep-dive-for-arkit-part-one-11bcf24a3deb

Smaller apps & faster app install times

At first glance you might think this section is unrelated to AR, but in my opinion it very much is.

AR experiences are ephemeral by nature, as they are often tied to and triggered by a specific real-world location, and having to download a different app every time a new location is visited creates a lot of friction for users who might otherwise buy and use your app.

For this reason, Apple making apps smaller and faster to download increases the chance that users will download your location-based app when they visit a new location.

Although this is not the final solution we all want and need (a platform for continuous delivery of AR content), it will help with the adoption of AR amongst iOS users.

Indoor positioning via WiFi fingerprinting

Apple also announced that they are opening a new service for large-scale indoor mapping which makes use of WiFi fingerprinting to localise users.

The process to enable your business to use the Indoor Mapping system by Apple

WiFi fingerprinting means that, in order to find the location of your device in a room, the service triangulates the signals from various WiFi antennas to give you a location accurate to within roughly 3 to 4 metres. If you’re interested in the usage details, you can find out more about the program in the relevant WWDC19 video:

Needless to say, such accuracy is not enough to create persistent AR apps, but it can be used as coarse-grained filtering for a fine-grained relocalisation system such as Image Anchors or ARWorldMaps: one could create pockets of ARWorldMaps (or Image Anchors) all over a venue and then use the WiFi positioning system to select which one to relocalise against (although I would argue that at that point you might as well just use Azure Spatial Anchors, which basically does all this for you and more).
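To illustrate the idea (and only the idea), here’s a hypothetical sketch of using a coarse indoor fix to decide which saved ARWorldMap to relocalise against; savedMaps and its contents are invented for this example:

```swift
import ARKit
import CoreLocation

// Hypothetical: pairs of "where this map was captured" and the ARWorldMap itself,
// loaded from disk or a backend beforehand.
typealias MappedArea = (center: CLLocation, map: ARWorldMap)

func relocalise(_ session: ARSession,
                near coarseFix: CLLocation,
                using savedMaps: [MappedArea]) {
    // Pick the saved map whose capture point is closest to the coarse WiFi-based fix.
    let nearest = savedMaps.min {
        $0.center.distance(from: coarseFix) < $1.center.distance(from: coarseFix)
    }

    let configuration = ARWorldTrackingConfiguration()
    configuration.initialWorldMap = nearest?.map  // fine-grained relocalisation
    session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
}
```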

The lighter blue radius around the blue dot represents the confidence of the location measurement

The main benefit of this localisation solution is that you don’t have to integrate third-party libraries. But not even Apple is marketing it as a solution for indoor AR, since they know that bare WiFi positioning is not (and will never be) precise enough for that.

Conclusions

These upgrades should convince you that Apple is currently at the forefront of public-facing AR offerings, and that it leaves the other tech giants in the dust (for now). Admittedly, some of these techniques are not bleeding edge, but they are all relatively stable and form a solid foundation upon which Apple will surely build in the upcoming years. We should expect to see object occlusion at some point in the future too (whether by screen-space segmentation, by meshing, or by some other technique), and obviously enhancements to the other existing offerings, starting with RealityKit. WWDC20 can’t come soon enough!


If you enjoyed this post, make sure to leave some claps. 👏👏👏

You can clap up to 50 times, so get clicking/tapping!

Please share the post with your iOS designer / iOS developer friends on your social media outlet of choice.

Follow me on Twitter: twitter.com/albtaiuti

Want to hire me for iOS and ARKit work? Click here: albertotaiuti.com
