The Spatial Computing Killer App is Finally Here

Why GPT-4o is important for AR/VR

Jack Yang
Antaeus AR
5 min read · May 14, 2024


About GPT-4o

This morning, OpenAI released GPT-4o, the latest upgrade to its renowned series of AI models. The new version introduces significant enhancements, bringing more advanced and versatile capabilities to users and developers.

Here are a few of the key features of GPT-4o:

  1. Enhanced Understanding: GPT-4o offers superior contextual awareness with minimal latency, making AI interactions more natural and coherent.
  2. Multimodal Capabilities: Now supporting text, images, audio, and video, GPT-4o opens up a world of creative and practical applications.
  3. Real-Time Collaboration: New tools for seamless human-AI collaboration boost productivity and innovation.
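To make the multimodal point concrete, here is a minimal sketch of how a developer might pair text with an image in a single GPT-4o request. This only builds the request payload in the shape the OpenAI chat completions API expects; the prompt, image bytes, and helper name are illustrative assumptions, and the actual network call (via the `openai` SDK) is left to the surrounding app.

```python
import base64


def build_multimodal_request(prompt: str, image_bytes: bytes,
                             model: str = "gpt-4o") -> dict:
    """Build a chat-completion payload that pairs text with an image.

    The shape follows the OpenAI chat completions API; a real app would
    pass it on, e.g. client.chat.completions.create(**request).
    """
    # Images are sent inline as a base64 data URL.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    }


# Hypothetical camera frame stand-in; a headset app would pass real JPEG bytes.
request = build_multimodal_request(
    "What appliance is this, and how do I fix it?", b"\xff\xd8fake-jpeg-bytes")
```

The same payload shape extends naturally to audio and video (as sampled frames), which is what makes the world-facing cameras on AR/VR devices such a natural fit.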

In this article, I will explain my thought process on why this might finally be the killer app AR/VR has been waiting for.

Why this is important for AR/VR

A significant trend in recent AR/VR devices is the emphasis on mixed reality experiences. Most of the devices on the market use passthrough (e.g., Meta Quest, Apple Vision Pro, Pico), meaning the outside world is recorded and projected to the eyes. Some are see-through (e.g., Microsoft Hololens, Meta RayBan), meaning the user sees the world directly with virtual overlays.

Despite the different approaches, one common theme among these devices is the focus on world-facing cameras. Currently, these cameras are limited to tasks such as hand tracking and basic world understanding. However, a new level of experience emerges when these cameras can understand context and respond accordingly in real time. For example, a camera could look at your broken appliance and provide repair instructions, or analyze your room and make decoration recommendations, and so much more.
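For real-time responses like these to be practical, an app cannot stream every camera frame to the model; it would only query when the scene actually changes. Below is a deliberately naive sketch of such a gate. The byte-level mean-absolute-difference and the threshold value are illustrative assumptions, not a production heuristic (a real app would compare downsampled luminance images).

```python
def scene_changed(prev_frame: bytes, frame: bytes,
                  threshold: float = 12.0) -> bool:
    """Return True when a new frame differs enough to warrant a model query.

    Compares two same-size raw frames by mean absolute per-byte difference.
    """
    if prev_frame is None or len(prev_frame) != len(frame):
        return True  # no baseline yet, or resolution changed
    diff = sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)
    return diff > threshold


# In an app loop (hypothetical capture/query helpers, not real APIs):
#   frame = headset.capture_frame()
#   if scene_changed(prev, frame):
#       answer = ask_model("What am I looking at?", frame)
#       prev = frame
```

Gating like this is what keeps GPT-4o's low-latency interactions affordable and responsive on battery-powered headsets.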

The potential of these devices is vast. Imagine walking into a new city and your AR wearables instantly providing historical context, directions, and restaurant recommendations in real time. Or consider a medical professional using an AR headset to receive real-time guidance during a complex procedure, enhancing precision and outcomes.

There are already quite a few AI wearables on the market, such as the Rabbit R1 and the AI Pin, but they have not been as positively received as one would hope. One of the biggest reasons is these devices' limited field of view (FOV) and high interaction latency. Most current AR/VR devices can make up for the limited FOV using their existing world-facing cameras. Combined with the low-latency interactions provided by GPT-4o, a world of opportunities truly opens up for developers and users.

How it could finally be the killer app

Combining multimodal AI with AR/VR won’t be an overnight success, given the limited selection of devices and current hardware constraints. Here’s a breakdown of some of the most popular AR/VR headsets on the market and how they could take advantage of this new trend:

  • Apple Vision Pro: Rumors hint at collaboration between Apple and OpenAI. If true, combining the whopping 12 world-facing cameras with the multimodality of GPT-4o can provide the most cutting-edge experiences to users.
  • Meta Quest 3: Quest’s passthrough and world understanding have improved drastically over the past few years. Meta also has a powerful, open-sourced AI model called Llama 3. Combining these two will give mass consumers a sneak peek of what’s to come in the next 5–10 years.
  • Meta RayBan: This has by far the most potential to become the first AR/VR device to benefit from multimodal AI. Given its affordability and form factor, as well as close integration with Llama 3, I believe this will be the most popular AR device for mass consumers over the next 5 years.
  • Microsoft Hololens: Dynamics 365 Copilot in Hololens is a sneak peek of what’s to come.

How I think things will progress

As mentioned previously, the integration of context-aware AI with AR/VR won’t succeed overnight; instead, it will unfold in several steps.

  1. AI + Apps: This is the most natural starting point and where most people currently are. Most people have a phone and are already using it to interact with AI through apps like Bing Copilot and ChatGPT. This stage involves users becoming comfortable with AI-assisted functionalities in everyday applications, enhancing their productivity and convenience.
  2. AI + Phones: This is the next step, as smart assistants in phones actually become smart (yes, I’m talking about you, Siri) thanks to iOS + OpenAI (rumored) or Android + Gemini integration. People will rely on the smart assistant more to complete day-to-day tasks, turning to dedicated AI apps only for more complicated instructions.
  3. AI + Wearables: When people become comfortable with smart assistants, they will move on to wearables such as glasses and watches to take advantage of the multimodality aspect of AI. Meta RayBan would be a good preview of this step. These wearables will enhance personal and professional life by providing real-time data, augmented experiences, and seamless connectivity to other devices and services.
  4. AI + Headsets + Enterprise: Enterprises will take advantage of the multimodality of AI and the computing power of headsets to help workers become more productive. For example, in manufacturing, workers could use AR headsets to receive step-by-step assembly instructions, reducing errors and training time.
  5. AI + Headsets + Consumer: As headsets become better and cheaper, developers will find creative ways to make day-to-day experiences better and take advantage of generative AI to create experiences based on users’ context. This could range from personalized virtual fitness trainers to immersive gaming experiences that adapt in real-time to the user’s actions and environment.

Thank you for reading my article. Follow me for more content like this!
