Apple’s Vision Pro is a Reality Already — an iOS Developer’s Perspective

How the various tools fit together

Anupam Chugh
Better Programming
5 min read · Feb 7, 2023


Last updated: June 6, 2023. Changed the title from "Reality Pro" to "Vision Pro" (the headset is better than I thought).

After writing three predictions in as many years, I had sworn off discussing Apple's rumored AR/VR headset in the hope that it would get launched sooner. But Dave's recent iOS Dev Weekly newsletter and MG's post about Reality Pro got me curious once again. With growing conversations about a potential launch at WWDC this year (still keeping my fingers crossed!), I decided to pause my prompt-based experiments and revisit this topic.

Over the past few years, Apple has released a set of tools for AR apps that seem intended to play a part in mixed-reality applications. The toolkit includes Reality Composer for creating AR experiences, the Background Assets framework for downloading large assets, the Live Text APIs (which work in PDFKit too), MapKit with its 3D experience, Face Painting using PencilKit, SharePlay, and many SwiftUI features, including fine-grained controls like Gauge and the ImageRenderer API.

In the following sections, I will examine these tools and share my thoughts on how they could help solve the puzzle of a mixed-reality headset.

Hand Pose, Face Tracking, and Head Tilts

MG's post "Apple's Reality At Our Finger Tips" was the one that got me thinking. It documents the various discussions around mixed reality, and the mention of eye-tracking systems and hand gestures caught my eye:

If it’s truly a mixture of advanced eye-tracking paired with finger gestures, this could be a legitimate breakthrough in the space. It sounds simple and that’s the point.

Recently, Mark Lucking demonstrated a proof of concept that uses Core ML to classify gazes and interact with components in an ARView (with face tracking enabled, of course).

Most people are already familiar with face tracking. Apple offers it through the Vision framework and ARKit, where it is set up with ARFaceTrackingConfiguration. For example, the Match the Moji case study by Cole Dennis recreates facial expressions from favorite emojis.
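For reference, here's a minimal sketch of what that setup can look like with RealityKit's ARView; the jawOpen read is just an illustrative blend shape, not something from Cole's project:

```swift
import ARKit
import RealityKit

// A minimal face-tracking setup, assuming a device with a TrueDepth camera.
final class FaceTrackingController: NSObject, ARSessionDelegate {

    func start(on arView: ARView) {
        guard ARFaceTrackingConfiguration.isSupported else { return }
        arView.session.delegate = self
        arView.session.run(ARFaceTrackingConfiguration())
    }

    // Blend shapes describe the current facial expression, which is the raw
    // material for a Match the Moji-style app (the jawOpen read is illustrative).
    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        guard let face = anchors.compactMap({ $0 as? ARFaceAnchor }).first else { return }
        let jawOpen = face.blendShapes[.jawOpen]?.floatValue ?? 0
        print("jawOpen:", jawOpen)
    }
}
```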

Although tracking gaze without face detection is not possible on today's devices, the inclusion of dedicated eye-tracking sensors and cameras in a headset should do the job.
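In the meantime, ARKit already exposes a coarse gaze signal on the face anchor. A small sketch, assuming a face-tracking session like the one above is running:

```swift
import ARKit

// ARFaceAnchor's lookAtPoint is an estimated gaze target in the anchor's local
// space; this is a coarse approximation, not dedicated eye tracking.
func gazeTarget(from faceAnchor: ARFaceAnchor) -> SIMD3<Float> {
    let p = faceAnchor.lookAtPoint
    // Convert the point from face-anchor space into world space.
    let world = faceAnchor.transform * SIMD4<Float>(p.x, p.y, p.z, 1)
    return SIMD3<Float>(world.x, world.y, world.z)
}
```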


Moving on to head poses: they can be derived from the transform of an ARFaceAnchor, as Cole Dennis demonstrates in his article "Use Head Tilt To Control a SwiftUI App Using ARKit / RealityKit". While this currently requires looking at your phone screen, you can also use CMHeadphoneMotionManager to let your AirPods' orientation play a role in user input. How cool would it be to make decisions just by nodding our heads?

Left gif from Cole’s article. Middle and right gifs by author.

The middle gif showcases interaction with SwiftUI views through CMHeadphoneMotionManager orientation changes, achieved by modifying a few lines in Cole's code. The AR experiences themselves were created in the Reality Composer app by attaching two USDZ models to the face anchor.
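For context, here's roughly what the headphone-motion side looks like; the pitch threshold for a "nod" is an arbitrary value I picked for the sketch, not something from Cole's code:

```swift
import CoreMotion

// A minimal sketch of reading AirPods orientation via CMHeadphoneMotionManager (iOS 14+).
final class HeadNodDetector {
    private let manager = CMHeadphoneMotionManager()

    func start(onNod: @escaping () -> Void) {
        guard manager.isDeviceMotionAvailable else { return }
        manager.startDeviceMotionUpdates(to: .main) { motion, _ in
            guard let attitude = motion?.attitude else { return }
            // Pitch changes as the head tilts forward and back; 0.35 rad is illustrative.
            if attitude.pitch > 0.35 { onNod() }
        }
    }

    func stop() {
        manager.stopDeviceMotionUpdates()
    }
}
```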

There are many more ways to generate AR experiences, such as converting text to RealityKit model entities or lifting the subject out of an image and placing it on an anchor.
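The text case, for instance, takes only a few lines in RealityKit. A minimal sketch (the font size and extrusion depth are illustrative values):

```swift
import RealityKit
import UIKit

// Turn a string into a ModelEntity that can be attached to any anchor.
func makeTextEntity(_ text: String) -> ModelEntity {
    let mesh = MeshResource.generateText(
        text,
        extrusionDepth: 0.01,
        font: .systemFont(ofSize: 0.1),
        containerFrame: .zero,
        alignment: .center,
        lineBreakMode: .byWordWrapping
    )
    let material = SimpleMaterial(color: .white, isMetallic: false)
    return ModelEntity(mesh: mesh, materials: [material])
}
```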

The third gif demonstrates one of my proofs of concept for swiping cards using finger tracking. Typing in midair with your hands is not new either; it was made possible by the Vision hand pose estimation requests introduced in iOS 14, a feature Apple previewed at WWDC 2020. By adding action classifiers like MLHandActionClassifier, which can be trained with Create ML on iOS, iPadOS, or macOS devices, we can teach an app to recognize hand and finger gestures accurately for virtual-object interaction.
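For the curious, here's a minimal sketch of the underlying Vision request, locating the index fingertip in a camera frame (the confidence threshold is an arbitrary illustrative value):

```swift
import Vision

// Locate the index fingertip with the hand pose request introduced in iOS 14.
// Assumes you already have a CVPixelBuffer from the camera feed.
func indexTipLocation(in pixelBuffer: CVPixelBuffer) -> CGPoint? {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
    try? handler.perform([request])

    guard let observation = request.results?.first,
          let tip = try? observation.recognizedPoint(.indexTip),
          tip.confidence > 0.3 else { return nil }

    // Vision returns normalized coordinates with the origin at the bottom left.
    return CGPoint(x: tip.location.x, y: 1 - tip.location.y)
}
```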

Siri-Driven SwiftUI?

According to an article by 9to5Mac (which quotes The Information), Apple plans to let users create and release AR apps for mixed reality using Siri, going beyond Reality Composer.

Now, from a distance, this might sound like it would need a bunch of advanced new tools, but the blueprint is already in place.

For example, there are already a few ways to convert designs into SwiftUI views. One is the DetailsPro app (which could get Sherlocked by Apple this year).

Another way is to capture screenshots of objects, convert them to SVG format, and then use a design-to-code tool to generate SwiftUI shapes. Artem has a helpful guide on this titled "Creating Complex Shapes in SwiftUI Using Design Tools".

There's also the SuperWidget app, which lets you create Home Screen widgets from Shortcuts without writing code.

While these are just a few no-code ways of building SwiftUI views, the prospect of Siri-powered AR apps could be a game-changer, considering the shift toward AI and the role conversational programming is already playing in building software. The combination of Whisper and ChatGPT is already better than Siri, making it possible to generate class diagrams (read this) and code through voice commands.

However, building AR apps simply by saying what we want is a huge challenge for Siri in its current state. So it remains to be seen how Apple improves it.

I expect that pairing the Speech framework with SwiftUI could bring an experience similar to Whisper + ChatGPT for creating voice-driven SwiftUI views, considering it taps into the same speech recognition technology behind Siri.
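A minimal sketch of the transcription half, assuming the spoken prompt has already been recorded to a file (feeding the text into a code-generation model is left out):

```swift
import Speech

// Transcribe a recorded voice prompt with SFSpeechRecognizer; the resulting
// string could then be handed to whatever generates the SwiftUI code.
func transcribePrompt(at url: URL, completion: @escaping (String?) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized,
              let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
            completion(nil)
            return
        }
        let request = SFSpeechURLRecognitionRequest(url: url)
        recognizer.recognitionTask(with: request) { result, error in
            if error != nil { completion(nil); return }
            guard let result = result, result.isFinal else { return }
            completion(result.bestTranscription.formattedString)
        }
    }
}
```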

A recent blog post titled "Server-driven SwiftUI apps" by Elyes DER shows an approach to building SwiftUI views with the Layout protocol by defining object models that mirror the class diagrams.
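The core idea, stripped down to a hypothetical sketch (the node names and JSON shape below are mine for illustration, not from the article):

```swift
import SwiftUI

// A server-described view tree decoded from JSON, e.g.
// {"type": "vstack", "children": [{"type": "text", "value": "Hello"}]}
struct ViewNode: Decodable {
    let type: String          // "text" or "vstack" in this sketch
    let value: String?        // text content, when relevant
    let children: [ViewNode]? // nested nodes
}

// Maps the decoded tree onto real SwiftUI views.
struct ServerDrivenView: View {
    let node: ViewNode

    var body: some View {
        switch node.type {
        case "text":
            Text(node.value ?? "")
        case "vstack":
            VStack {
                ForEach(Array((node.children ?? []).enumerated()), id: \.offset) { _, child in
                    ServerDrivenView(node: child)
                }
            }
        default:
            EmptyView()
        }
    }
}
```

Decode the payload with JSONDecoder on the client, and the entire screen description can travel over the network.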

Of course, it will take much more work to function within a mixed-reality headset, and a native SwiftUI solution for RealityKit should be on the cards when the curtain rises at WWDC (unless Apple delays its headset plans once again!).

2023 is being deemed the year of AI and voice-powered apps. I’ll make a bold guess and call this the year of Siri.

That ends my musings.
