The Intersection of Spatial Computing and AI

Ivan Campos
Sopmac AI
Feb 7, 2024

A few weeks with visionOS and a few days with Apple Vision Pro are starting to reveal the emerging intersection of AI and this spatial computing platform.

Here are some hands-on AI + AVP experiences that I’ve begun exploring:

Integrating generative AI assets into an immersive space

Here is an attempt to fuse a RunwayML video and audio from Stable Audio with a 360-degree backdrop created in Midjourney and upscaled with Magnific AI and Pixelmator Pro.

(I am also considering adding voice narration from Eleven Labs.)

This demo simply replaced the assets in Apple's Destination Video sample app to reshape the immersive view, and it only begins to scratch the surface of curated AI experiences on the Apple Vision Pro.
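
To give a flavor of the mechanics, here is a minimal sketch of an immersive view that wraps the viewer in a 360-degree backdrop. This is not Apple's Destination Video code, and "Backdrop360" is a hypothetical name for an equirectangular render (e.g. a Midjourney image) bundled with the app:

```swift
import SwiftUI
import RealityKit

// Minimal sketch: surround the viewer with a 360-degree backdrop.
// "Backdrop360" is a hypothetical equirectangular image in the app bundle.
struct GenerativeBackdropView: View {
    var body: some View {
        RealityView { content in
            // A large sphere that encloses the viewer.
            let sphere = MeshResource.generateSphere(radius: 1000)
            var material = UnlitMaterial()
            if let texture = try? TextureResource.load(named: "Backdrop360") {
                material.color = .init(texture: .init(texture))
            }
            let backdrop = ModelEntity(mesh: sphere, materials: [material])
            // Invert the sphere on one axis so the texture faces inward.
            backdrop.scale = .init(x: -1, y: 1, z: 1)
            content.add(backdrop)
        }
    }
}
```

Present this view from an ImmersiveSpace scene and you are standing inside the render.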

Using AI to generate code that creates a Vision Pro app

The best place to start when trying to understand how to develop an app for visionOS is the official Apple Developer documentation. Either visit the website, or download the Developer app from the Vision Pro App Store and say "Hey Siri, open Developer" to get started. There you will find articles, videos, code examples, and even downloadable demo projects.

Next, head over to GitHub, which hosts a few open-source projects that are worth running in the Xcode Simulator. Interrogate these working examples to accelerate your learning.

I highly recommend starting with this excellent repo: https://github.com/satoshi0212/visionOS_30Days

visionOS Dev — Apple Vision Pro App Generator (built on ChatGPT)

Since public code exists and the developer frameworks were available before ChatGPT's current data cutoff (April 2023), you can speed-run the learning process with a custom GPT. Just provide your own instructions and supply your favorite Swift examples to serve as the GPT's knowledge base. This is the rationale behind a GPT that I am feeding, called "visionOS Dev":

https://chat.openai.com/g/g-GbfBtRzZo-visionos-dev

Here, a developer with limited experience in Apple's ecosystem can have simple projects generated from their own requirements and then converse with the GPT to fix defects, adjust features, ask what the code is doing, and quickly bring their imagination to spatial reality.

Apple Vision Pro App Generator

Custom GPT Use Case

Juno, the first app that I purchased with my eyeballs, is a great use case for the visionOS Dev GPT. After interacting with Juno, I wanted to see if the GPT could recreate a basic application shell like the one its developer, Christian Selig, described in his blog. After feeding his approach of using the YouTube embed API to the GPT, an app for visionOS was running on my local machine within minutes.
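
For flavor, the embed approach boils down to something like the sketch below. This is my paraphrase rather than the GPT's actual output, and the video ID is just a placeholder:

```swift
import SwiftUI
import WebKit

// Minimal sketch of the embed approach: wrap WKWebView in SwiftUI
// and point it at YouTube's iframe embed URL.
struct YouTubeEmbedView: UIViewRepresentable {
    let videoID: String

    func makeUIView(context: Context) -> WKWebView {
        let config = WKWebViewConfiguration()
        config.allowsInlineMediaPlayback = true
        return WKWebView(frame: .zero, configuration: config)
    }

    func updateUIView(_ webView: WKWebView, context: Context) {
        guard let url = URL(string: "https://www.youtube.com/embed/\(videoID)") else { return }
        webView.load(URLRequest(url: url))
    }
}

// Usage inside a window, with a placeholder video ID:
// YouTubeEmbedView(videoID: "dQw4w9WgXcQ")
```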

Multi-tasking with custom Hacker News/YouTube apps and visionOS Dev GPT

After several conversational turns and a few sample videos, my GPT pair programmer and I were able to generate a simple app, even though I never reviewed the YouTube API documentation and have an extremely limited Swift background. Here's the end result:

Playing YouTube videos based on code generated from the visionOS Dev GPT

I have also conversed my way into working code for:

  • API Calls (OpenAI, Hacker News, Yahoo! Finance, etc…)
  • Spatial Audio
  • Speech Synthesis
  • UI/UX Elements (e.g. Glowing Neon Effects, Dictate to Text with UISearchBar)
  • Web Sockets (for real-time BTC and ETH prices; a sketch follows below)
Apple Simulator views of one app talking to the Hacker News API and another talking to the OpenAI API
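
As an example of the web sockets item above, here is a minimal sketch of a live price feed. It assumes Coinbase's public ticker channel; any exchange's web socket feed follows the same subscribe-and-receive pattern, and JSON parsing is omitted:

```swift
import Foundation

// Minimal sketch of a real-time price feed over web sockets,
// assuming Coinbase's public ticker channel.
final class PriceFeed {
    private var task: URLSessionWebSocketTask?

    func connect() {
        let url = URL(string: "wss://ws-feed.exchange.coinbase.com")!
        task = URLSession.shared.webSocketTask(with: url)
        task?.resume()

        // Subscribe to ticker updates for BTC and ETH.
        let subscribe = """
        {"type":"subscribe","product_ids":["BTC-USD","ETH-USD"],"channels":["ticker"]}
        """
        task?.send(.string(subscribe)) { error in
            if let error { print("subscribe failed: \(error)") }
        }
        receive()
    }

    private func receive() {
        task?.receive { [weak self] result in
            if case .success(.string(let message)) = result {
                print(message) // parse the price out of the JSON in a real app
            }
            self?.receive() // keep listening for the next message
        }
    }
}
```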

Text-to-3D assets for use within a visionOS app

My most jaw-dropping experiences on Vision Pro have been those where I am standing in immersive space and presented with large scale 3D objects. How can AI help here?

Enter Luma AI's Genie. With this text-to-3D generator, you can prompt-engineer your way to USDZ models. Once you have these models, you can converse some more with a custom GPT and have it write the code to place them in your living room. You can also join the Luma AI Discord and download models from the community. The result is incredible, especially considering that this demo took only minutes from concept to reality.

USDZ models chilling in my living room
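
Once a Genie export is in the app bundle, displaying it takes very little code. Here is a minimal sketch using the Model3D view, where "GenieDragon" is a hypothetical asset name:

```swift
import SwiftUI
import RealityKit

// Minimal sketch: display a Genie-generated USDZ model.
// "GenieDragon" is a hypothetical asset bundled with the app.
struct GeneratedModelView: View {
    var body: some View {
        Model3D(named: "GenieDragon") { model in
            model
                .resizable()
                .aspectRatio(contentMode: .fit)
        } placeholder: {
            ProgressView() // shown while the model loads
        }
    }
}
```

Present it inside a volumetric window (.windowStyle(.volumetric)) and the model lands on your coffee table.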

Next Up: Potential Artificial Intelligence Intersections with Apple Vision Pro

  • Integrating LLM calls (local or remote) into the logic layer of a spatial application (a sketch follows after this list)
  • Augmenting the creative process for AI art generation
  • The release of an Apple LLM (accessible via an “LLMKit”) with multi-modal capabilities
  • Xcode Copilot
  • Voice-based navigation for spatial experiences
  • On the fly rendering of generative AI assets
  • Interacting with AI Agents in 3D space
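
On the first item, the plumbing already works today. Here is a minimal sketch of a remote LLM call that a spatial app's logic layer could await, using OpenAI's chat completions endpoint; the model name is current as of this writing, and error handling is kept deliberately thin:

```swift
import Foundation

// Minimal sketch: ask a remote LLM a question from the app's logic layer.
func askLLM(_ prompt: String, apiKey: String) async throws -> String {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": "gpt-4-turbo-preview",
        "messages": [["role": "user", "content": prompt]]
    ])

    let (data, _) = try await URLSession.shared.data(for: request)

    // Pull the assistant's reply out of the response JSON.
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let choices = json?["choices"] as? [[String: Any]]
    let message = choices?.first?["message"] as? [String: Any]
    return message?["content"] as? String ?? ""
}
```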

Can’t forget to shout out all of the spatialists who keep me up to date: https://twitter.com/i/lists/1749207474983354875

Now back to spatial reality.
