Early Explorations of Generative AI in XR (2022)

Brief Overview of Prompt-Driven Techniques in Mixed Reality

Jul 24, 2023

Many GPT wrappers are emerging for 2D SaaS applications, but our focus has been distinctly different: we investigated how this technology can enhance 3D creative capabilities. These explorations center on generative AI and how it can change game development and interactivity within Unity.

Given the close collaboration between Microsoft and OpenAI, our team took the initiative to explore GPT-3 in Unity early.

Introduction

Unity is designed primarily for creating real-time interactive experiences such as games and simulations. Integrating external libraries and APIs such as GPT-3 into Unity’s native C# environment requires a different approach than integrating them into a web app, which can often rely on readily available JavaScript libraries or REST APIs.

Games have dynamic and interactive states that can change rapidly based on player actions and events. GPT-3, being a language model, needs context and understanding of the state to provide meaningful responses. Keeping track of the game’s context and passing it to GPT-3 to generate appropriate responses can be more complex than in a static web app.
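One way to picture the context-passing problem is a small function that serializes a snapshot of the game state and prepends it to the user's query before each request. The sketch below is illustrative only and is written in Python for brevity (inside Unity this would be C#). The `GameState` fields, the `build_prompt` helper, and the prompt wording are all hypothetical and are not taken from the project.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Tuple


@dataclass
class GameState:
    """Hypothetical snapshot of the scene; the field names are illustrative."""
    scene: str
    player_position: Tuple[float, float, float]
    held_object: Optional[str] = None
    recent_events: List[str] = field(default_factory=list)


def build_prompt(state: GameState, user_query: str, max_events: int = 3) -> str:
    """Serialize the current game state and prepend it to the user's query."""
    snapshot = asdict(state)
    # Keep only the most recent events so the prompt stays short.
    snapshot["recent_events"] = (
        state.recent_events[-max_events:] if max_events > 0 else []
    )
    context = json.dumps(snapshot, sort_keys=True)
    return (
        "You are an assistant inside a 3D scene.\n"
        f"Current state: {context}\n"
        f"User: {user_query}\n"
        "Assistant:"
    )
```

Because the state changes quickly, the prompt would be rebuilt from a fresh snapshot on every request rather than cached.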

We used Roslyn for runtime code compilation in Unity. Roslyn is the open-source .NET compiler platform developed by Microsoft. It provides APIs that allow you to programmatically compile C# code at runtime and execute it within your application. Using Roslyn in Unity allows us to dynamically generate and execute C# code.

Here is a quick working example: Clippy GPT on the Meta Quest 2.

DALL·E x Unity Integration

(The integration is available as an open-source repository.)

On November 3, 2022, the DALL·E API was released. To further this research, I immediately built an integration for the Unity Editor. As stated in the repository, there are four main example scenes. Each scene saves its generated images in a folder called img.

Text to Image replicates the standard DALL·E interface and generates 4 images

Text to 3D Material generates images as materials within a 3D environment and applies 3 different materials to cubes

Outpainting uses a Unity-specific implementation of "outpainting"

Text to Skybox applies the result of outpainting to a skybox
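As a rough illustration of what a Text to Image request involves, the sketch below builds (but does not send) a request for the public OpenAI image-generation endpoint, asking for four images as in the first scene. It is written in Python for brevity and is not the repository's code. The `build_image_request` helper is hypothetical, and the size and count limits reflect the DALL·E 2 API as I understand it at launch, so they may differ for newer models.

```python
import json

# Public OpenAI image-generation endpoint.
IMAGES_URL = "https://api.openai.com/v1/images/generations"
# Sizes accepted by the DALL-E 2 API at launch (assumption for newer models).
ALLOWED_SIZES = {"256x256", "512x512", "1024x1024"}


def build_image_request(prompt, api_key, n=4, size="1024x1024"):
    """Return (url, headers, body) for an image-generation request.

    Hypothetical helper: it only assembles the request, so it can be
    inspected or handed to any HTTP client.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt, "n": n, "size": size})
    return IMAGES_URL, headers, body
```

The response to such a request contains the generated images, which a scene could then write to its img folder.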

To create a generative skybox, I used DALL·E’s outpainting capabilities to expand the boundaries of a given image, effectively generating a panoramic view. First, I provided DALL·E with an initial prompt as input and let it extend the content beyond its original dimensions. Then, I stitched together the subsequent outpainting outputs, creating a continuous panoramic view.
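The stitching step can be sketched as a horizontal concatenation of tiles. The article does not say how the tiles were aligned, so the example below assumes that each outpainting step reuses a fixed number of columns (`overlap`) from the previous image as context, and that those repeated columns are dropped when joining. Images are represented as plain nested lists to keep the sketch dependency-free; `stitch_horizontal` is a hypothetical helper, not the project's implementation.

```python
def stitch_horizontal(tiles, overlap=0):
    """Join image tiles left to right into one panorama.

    Each tile is a list of rows, and each row is a list of pixel values.
    The first `overlap` columns of every tile after the first are assumed
    to repeat the previous tile's right edge, so they are skipped.
    """
    if not tiles:
        raise ValueError("need at least one tile")
    height = len(tiles[0])
    if any(len(tile) != height for tile in tiles):
        raise ValueError("all tiles must have the same height")

    # Copy the first tile so the input is not modified.
    panorama = [list(row) for row in tiles[0]]
    for tile in tiles[1:]:
        for y in range(height):
            # Append only the newly generated columns.
            panorama[y].extend(tile[y][overlap:])
    return panorama
```

For a skybox, the resulting wide image would still need to be mapped onto the sky, and the seam where the last tile meets the first would need attention.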

Auto-Scaling

Auto-scaling lets users ask GPT-3 for the size of an object. The application then uses the real-world size information in the response to scale the object automatically within the game environment.

This can save users time, since the object's size no longer has to be adjusted manually. It also adds natural language interaction that can enhance the user experience.

Above is a working application that retrieves an object and adjusts its dimensions based on a user’s query. The scene includes a scale reference based on the average height of a woman (5'4" or 1.625 m).

When the spawned object exceeds the game window size, it is saved in a folder called Resources and can be viewed in the Scene view.
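The core of auto-scaling is a small calculation: extract a real-world height from the model's reply, then divide it by the object's current height to get a uniform scale factor. The sketch below shows one way this could work. It is written in Python for brevity, and both helpers are hypothetical. It assumes the reply states the height in mm, cm, or m using those abbreviations, and that one scene unit equals one meter, which is the usual Unity convention.

```python
import re

# Conversion factors from the supported unit abbreviations to meters.
_UNITS_TO_M = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def parse_height_m(reply: str) -> float:
    """Extract the first height like '90 cm' or '1.2 m' and return meters."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(mm|cm|m)\b", reply.lower())
    if match is None:
        raise ValueError("no height with a supported unit found in reply")
    value, unit = match.groups()
    return float(value) * _UNITS_TO_M[unit]


def uniform_scale(model_height_units: float, real_height_m: float,
                  units_per_meter: float = 1.0) -> float:
    """Scale factor that makes the model match the real-world height."""
    if model_height_units <= 0:
        raise ValueError("model height must be positive")
    return real_height_m * units_per_meter / model_height_units
```

For example, a model that is 2 units tall and should represent a 90 cm chair gets a scale factor of about 0.45, which would be applied equally to all three axes. Free-form replies are unreliable to parse, so asking the model for a fixed format would likely be more robust.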

In conclusion, generative AI has proven transformative for both gaming and AI applications. The applications and prototypes shown throughout this exploration demonstrate the potential of the technology: dynamically varied landscapes, improved natural language capabilities in chat agents, and fast creation of textures for assets. AI-driven auto-scaling of objects further improves the user experience by making the environment more adaptable and immersive.