Generative XR — Space-aware AI bringing a new era of immersion

This article on ‘Generative XR’ was featured in XROM’s Expert Insights.

Kuldeep Singh
XRPractices
5 min read · Jun 28, 2024

--

Artificial Intelligence has made significant strides with the development of Large Language Models (LLMs), or Generative AI, which are trained to understand and respond to human language naturally. Tools like OpenAI’s ChatGPT, Google’s Gemini, and Microsoft’s Copilot have become the go-to resources for “ask me anything” queries. These innovations have been touted as potential “Google Search killers”, disrupting the way we search for information, communicate with machines, and receive machine responses. While AI has become ubiquitous and inevitable across most technology domains, the spotlight on Generative AI has led many to believe that technologies like XR (AR, VR, Metaverse, IoT, Blockchain) have taken a backseat or even become obsolete.

In this article, we will explore how AI is evolving beyond Generative AI to encompass Generative XR, leveraging real-world data to create a new era of immersive experiences faster than anticipated.

First, let’s delve into what XR entails and then examine the interplay between AI and XR.

Extended Reality (XR)

Extended Reality (XR) is a dynamic fusion of technologies that redefine our reality through artificial means, offering innovative digital experiences:

  • Augmented Reality (AR): Through devices like smartphones, smart glasses, and AR headsets, AR enhances our real-world environment by overlaying digital content such as information, graphics, or animations onto physical surroundings.
  • Virtual Reality (VR): VR immerses users in entirely virtual environments using VR headsets. These devices create a fully simulated reality, transporting users to places and scenarios that are entirely computer-generated.
  • Mixed Reality (MR): MR merges the virtual and physical worlds by integrating digital content into the real environment. MR devices enable users to interact with holograms and digital objects within their physical space.

XR also encompasses advancements such as smart mirrors and 3D projection displays. Beyond visuals, XR offers multisensory interactions with devices like smart gloves, bodysuits, and wearables, paving the way for a new era of spatial computing.

XR continues to evolve with the seamless integration of AI within all aspects of XR — devices, services, operating systems, and more. Let’s explore this further in the following sections.

AI powering XR

AI has evolved to provide insights previously inaccessible and has the potential to outperform humans in specific tasks. AI powers XR solutions by extending human senses such as vision, hearing, touch, and feel, enabling the brain to perceive the artificial as real.

As I explain in my book “Exploring the Metaverse: Redefining the Reality in the Digital Age”, AI powers XR in multiple ways:

  • Environment and Motion Tracking: Immersive XR experiences require 6DOF tracking and movement, and for that devices are equipped with numerous sensors, computing capabilities, computer vision algorithms, and AI-backed estimation and learning. Simultaneous Localization and Mapping (SLAM) algorithms enable the generation of a virtual representation of the environment and its ongoing tracking, while AI further detects objects, images, and planar surfaces. Tracking involves cameras, accelerometers, gyroscopes, magnetometers, and lidar. AI can now estimate depth from the camera feed alone, even without lidar data, allowing for lighter and more natural XR devices. Augmenting virtual objects in the real environment, accounting for occlusion, collision, and gravity, relies on AI and physics engines.
  • Face and Body Tracking: Digital identities, such as avatars or digital replicas, are becoming integral to immersive environments. AI powers these by enabling face detection, expression recognition, and body tracking. It senses body expressions and hand movements, making interactions in XR more natural and immersive.
  • User Interactions: XR interactions now incorporate multiple input and output modalities, including head movement, eye tracking, hand tracking, gesture detection, voice inputs, and even brain inputs via thought. AI is key to enabling these advanced interaction methods.
  • Digital Twins and Simulation: Building simulations and digital twins of real environments and humans is an important use case for training and education in XR. Navigating within a simulated environment requires dynamic actions such as walking, running, and driving, and needs randomization to appear realistic. AI can bring this dynamicity, which we’ll discuss further in the next section on scene understanding.
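To make the sensor-fusion idea behind motion tracking concrete, here is a minimal sketch (not from the article or any specific device SDK) of a complementary filter: the classic way to fuse a gyroscope's fast-but-drifting rate readings with an accelerometer's noisy-but-drift-free tilt estimate. Real XR headsets use far more sophisticated estimators for full 6DOF pose, but the blending principle is the same. All function names and readings below are illustrative assumptions.

```python
import math

def complementary_filter(prev_angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Fuse a gyroscope rate (deg/s) with an accelerometer-derived
    tilt angle (deg) into a single orientation estimate."""
    # Integrate the gyro for short-term accuracy; blend in a small
    # fraction of the accelerometer angle to correct long-term drift.
    return alpha * (prev_angle + gyro_rate * dt) + (1 - alpha) * accel_angle

def accel_to_pitch(ax, ay, az):
    """Derive a pitch angle (deg) from raw accelerometer axes (in g)."""
    return math.degrees(math.atan2(ax, math.sqrt(ay * ay + az * az)))

# Simulated readings: the gyro reports a slow 1 deg/s rotation while
# the accelerometer consistently reads a 10-degree tilt.
angle = 0.0
for _ in range(100):
    angle = complementary_filter(angle, gyro_rate=1.0,
                                 accel_angle=10.0, dt=0.01)
# The estimate converges toward the accelerometer's 10-degree reading
# while remaining smooth thanks to the gyro integration.
```

The `alpha` parameter controls the trade-off: closer to 1 trusts the gyro more (smoother, drifts slower to correct), closer to 0 trusts the accelerometer more (noisier but drift-free).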

AI advancements continue to empower XR, enhancing its capabilities. It’s important for businesses to recognize that AI does not diminish the importance of XR but instead enhances it. In the next section, we’ll explore how the hype around Generative AI can further propel XR technology.

Generative XR

Generative AI has already revolutionized text content generation and is now extending its impact to other forms of content. While XR is already used across various industries, its expansion relies on generating relevant, dynamic content. This intersection of XR and Generative AI, which I call Generative XR, is key to creating immersive, realistic experiences.

  • AI for XR Content Generation: To make XR a reality, we need data at scale. This can be achieved by replicating physical environments and creating digital twins of the real world. Tools like RoomPlan and RealityScan, photogrammetry pipelines, and hardware-assisted 3D scanners like Matterport have been successful in this area. Generative AI can introduce dynamism into XR environments through natural language prompting. Beyond gaming and entertainment, AI can generate architectural models with concepts like ArchiGAN, creating detailed representations of rooms, large buildings, roads, and adaptive floor plans, even for fictional cities and malls.
  • Deep Training and Testing on 3D Simulation: XR-based simulations, such as those offered by Carla and NVIDIA’s Isaac Sim, make autonomous vehicle and robotics testing and training more affordable and effective. Machine learning (ML) models need detailed structural data for training, which is facilitated by tools like PointNet, VoxelNet, and ShapeNet. PointNet classifies and segments 3D point clouds, VoxelNet divides point clouds into 3D voxels, and ShapeNet offers a vast library of annotated 3D shapes for training ML models.
  • AI-assisted 3D Modeling: Integrating AI into the XR content pipeline involves handling various inputs (like sketches and photographs) and using advanced processing capabilities (driven by edge, 5G, and cloud technologies) to generate content. This content can be reviewed and optimized by humans or AI systems before final approval and publication. For instance, Convrse.AI focuses on AI-based content optimization.
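The core idea behind PointNet mentioned above, processing each point independently and then applying a symmetric pooling operation so the result does not depend on point order, can be sketched in a few lines of NumPy. This is a toy stand-in, not the real PointNet architecture: the random `weights` matrix plays the role of the learned shared MLP, and the function name is my own.

```python
import numpy as np

def pointnet_features(points, weights):
    """PointNet-style global feature for a point cloud.

    points:  (N, 3) array of XYZ coordinates.
    weights: (3, F) projection standing in for the learned shared MLP.

    Max pooling over the point axis makes the output invariant to
    the order of points in the cloud, the core PointNet insight.
    """
    per_point = np.maximum(points @ weights, 0.0)  # shared "MLP" + ReLU
    return per_point.max(axis=0)                   # symmetric max pool

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))   # a synthetic 1024-point scan
w = rng.normal(size=(3, 16))         # stand-in for learned weights
feat = pointnet_features(cloud, w)

# Shuffling the points changes nothing: same cloud, same feature.
shuffled = cloud[rng.permutation(len(cloud))]
```

Permutation invariance matters because a lidar or photogrammetry scan has no canonical point ordering; any architecture that consumed points sequentially would learn the ordering artifact rather than the geometry.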

For XR and the metaverse to become truly mainstream and deliver significant business value, AI-generated content will play a vital role. As we continue to evolve, we will encounter new opportunities and possibilities. It is up to us in the industry to seize these opportunities and create extraordinary experiences.

While generative AI is addressing some of the key content challenges, there remains a level of skepticism regarding the trustworthiness of GenAI data, even as it begins to reveal more accurate insights. However, when XR becomes the interface for AI, this trust can be significantly enhanced. For instance, there is a profound difference between discussing medical treatment through a textual bot interface versus interacting with a digital doctor’s avatar in a familiar, comfortable environment. Recent advancements, such as OpenAI’s GPT-4o and Google’s Project Astra, are pioneering spatially aware AI, with XR playing a crucial role in establishing trust in these technologies. A similar development is Microsoft’s announcement of bringing volumetric apps to Meta Quest.

Spatial AI — GPT-4o

As we move forward, the synergy between AI and XR will unlock new dimensions of immersive and intuitive experiences. Businesses and individuals alike must embrace these advancements to fully realize their potential. By leveraging this Generative XR, we stand on the cusp of a new era where the boundaries between the digital and physical worlds continue to blur, creating unprecedented opportunities for innovation and connection.

This article was originally published at XROM and thinkuldeep.com, and is republished here with minor updates.


Kuldeep Singh
XRPractices

Engineering Director and Head of XR Practice @ ThoughtWorks India.