ChatGPT and Gen AI for 3D Content Creation

Tips, Tricks, and Common Questions

NVIDIA Omniverse
8 min read · Aug 4, 2023

By: Alex Qi, Mario Viviani, and Paul Cutsinger

Demand for 3D worlds and virtual environments is skyrocketing across industries, driving the need for streamlined workflows and creative solutions. However, the tedious and time-intensive nature of 3D design can pose challenges for artists and designers.

Generative AI tools for virtual worlds have emerged as game-changers, liberating creators from repetitive tasks and enabling them to focus on unleashing their creative potential.

Like many curious developers, members of the NVIDIA Omniverse team experimented with ChatGPT and the new GPT-4 large multimodal model to demonstrate the ease with which custom tools can be developed to rapidly generate 3D objects for virtual worlds. Combining GPT-4 with Omniverse DeepSearch, an intelligent AI librarian capable of searching vast databases of untagged 3D assets, they created an extension that allows developers and artists to retrieve 3D objects effortlessly using simple, text-based prompts and seamlessly incorporate them into their 3D scenes.

The extension, which is called AI Room Generator, showcases the immense potential of generative AI in populating realistic environments. With a few text prompts, high-fidelity objects are automatically generated and placed, saving hours of painstaking work. These objects, based on Universal Scene Description (OpenUSD) SimReady assets, are physically accurate and behave realistically in any simulation.

Recently, NVIDIA hosted an AMA event for developers to ask questions about different ways that ChatGPT, generative AI, and Omniverse can be leveraged to build tools for the metaverse.

Questions were asked about different types of generative AI models that can be used, specific use cases for ChatGPT, and tips for adding functionalities to generative AI tools in Omniverse.

Connecting Generative AI Models

Q: Can we mix LLMs with other models like Stable Diffusion (SD) for Gen AI?

LLMs can be used in various ways as a preprocessing step that feeds into SD. For example, you could ask an LLM to stylize a prompt based on plain English, then pass the result to SD. Technically, it’s all about connecting APIs together.
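As a rough illustration of that "connecting APIs" pattern, the sketch below chains a hypothetical LLM client into a hypothetical Stable Diffusion client. Both client functions are stand-ins, not real API signatures; only the chaining is the point.

```python
# Sketch of an LLM -> Stable Diffusion pipeline. The two client
# callables are hypothetical stand-ins for whatever LLM and SD
# APIs you actually use.

def stylize_prompt(llm_client, plain_english: str) -> str:
    """Ask the LLM to rewrite a plain-English request as an SD prompt."""
    instruction = (
        "Rewrite the following description as a detailed, comma-separated "
        "Stable Diffusion prompt with style keywords:\n" + plain_english
    )
    return llm_client(instruction)

def generate_image(sd_client, sd_prompt: str):
    """Pass the stylized prompt to a Stable Diffusion endpoint."""
    return sd_client(sd_prompt)

def text_to_image(llm_client, sd_client, plain_english: str):
    # The LLM's output simply becomes the SD input.
    return generate_image(sd_client, stylize_prompt(llm_client, plain_english))
```

In practice the two callables would wrap HTTP calls to your LLM and SD services; the pipeline itself stays this simple.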

View in Forum

Q: If I don’t want to use ChatGPT, do I have any other options?

Omniverse is an open platform where other LLMs can be integrated. One such alternative is NVIDIA NeMo, which allows you to train on your own data and get responses that speak in your brand’s language and reflect your specific domain.

View in Forum

ChatGPT Metaverse Use Cases and Possibilities

Q: Would it be possible to create entire environments based on GPT prompts within Omniverse Replicator? The idea is to fine-tune computer vision models on synthetic datasets (models + environment).

You could certainly do that. You can even have GPT create USD directly rather than JSON files. However, creating meshes and textures from scratch requires long GPT responses that can use up your GPT tokens quickly.

A different approach would be to compose your scene, keeping in mind those things you would like to randomize. Then, ask GPT to give you random values for only those key items. That way, you’d be able to work with a much larger scene. That said, this is what Omniverse Replicator already does and I might use Replicator for this specific task over GPT.
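The "ask GPT only for the random values" approach can be sketched as follows. The key names, ranges, and reply format here are illustrative assumptions, not taken from the sample extension.

```python
import json

# Sketch: instead of asking GPT to generate a whole scene, ask it only
# for values of a few pre-chosen randomization keys, then apply those
# values to an already-composed scene. Keys and ranges are made up.

RANDOM_KEYS = {
    "sun_angle_deg": (0, 90),
    "crate_count": (1, 20),
    "floor_material": ["Concrete_Polished", "Plywood", "Oak"],
}

def build_prompt(keys: dict) -> str:
    # A compact prompt keeps the response (and token usage) small.
    return (
        "Reply with only a JSON object giving one random value for each "
        "key, within the stated range or list: " + json.dumps(keys)
    )

def apply_reply(reply_text: str) -> dict:
    values = json.loads(reply_text)
    # Keep only the keys we asked about; ignore anything extra.
    return {k: values[k] for k in RANDOM_KEYS if k in values}
```

Because the response carries only a handful of values rather than geometry, this scales to much larger scenes.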

View in Forum

Q: I would like to create an AI agent operating from visual feedback using python libraries such as YOLO or SegmentAnything. Is it possible to feed camera coordinates to a GPT prompt and then update the position based on the reply from GPT, similar to the 3D object placing pipeline?

You can incorporate pretty much any Python package you want into your extension; the Omniverse platform is extremely flexible. You could definitely tap into one of the scene’s update callbacks, get camera coordinates, and feed those into GPT. Just be ready for a really, really slow frame rate, because it takes a few seconds to get a reply back from GPT.
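One way to keep the frame rate usable is to throttle how often the callback actually queries GPT. The sketch below is plain Python so it stands alone; in Omniverse you would register `on_update` with the app's update event stream, and the class and interval here are illustrative.

```python
import time

# Sketch of guarding a per-frame callback against slow GPT round trips.
# Calling a blocking GPT API every frame would drop the frame rate to
# the API's round-trip time, so most frames are skipped.

class CameraToGPT:
    def __init__(self, query_gpt, min_interval_s: float = 5.0):
        self.query_gpt = query_gpt       # blocking call, takes seconds
        self.min_interval_s = min_interval_s
        self._last_call = 0.0
        self.last_reply = None

    def on_update(self, camera_xyz):
        # Reuse the last reply unless enough time has passed.
        now = time.monotonic()
        if now - self._last_call < self.min_interval_s:
            return self.last_reply
        self._last_call = now
        self.last_reply = self.query_gpt(
            f"Camera is at {camera_xyz}. Where should the agent move next?"
        )
        return self.last_reply
```

A further refinement would be to run the GPT call on a background thread or coroutine so even the occasional query never blocks a frame.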

View in Forum

Q: Can we use NVIDIA Omniverse to build human-like Virtual 3D Assistants that talk using ChatGPT API as custom agent?

You can definitely build your own solution in NVIDIA Omniverse to animate virtual assistants, using technologies like Audio2Face in combination with speech synthesis tech like NVIDIA Riva, and then combine that with GPT-4.

There are also third-party solutions in Omniverse, such as Convai, that let you easily create virtual assistants using generative AI. Convai recently released an extension for Omniverse that allows you to create your characters on their website and then interact with them in Omniverse.

View in Forum

Tips for Developing Generative Tools in Omniverse

Q: When creating 3D environments using prompts, are there strict constraints for properties such as the scale of an obj file? Or can we ask the prompt to use a specific obj file? How much control do we have on the domain?

There are two places in the process where you can add constraints. The first is in your prompt engineering and the second is in your post-processing of the prompt results.

When you craft your prompt, you could give GPT a list of acceptable responses and then ask it to only use those responses. You can see an example of this in prompts.py starting on line 32 where it says:

“For each object you need to store:

  • object_name: name of the object
  • X: coordinate of the object on X axis
  • Y: coordinate of the object on Y axis
  • Z: coordinate of the object on Z axis
  • Length: dimension in cm of the object on X axis
  • Width: dimension in cm of the object on Y axis
  • Height: dimension in cm of the object on Z axis
  • Material: a reasonable material of the object using an exact name from the following list: Plywood, Leather_Brown, Leather_Pumpkin, Leather_Black, Aluminum_Cast, Birch, Beadboard, Cardboard, Cloth_Black, Cloth_Gray, Concrete_Polished, Glazed_Glass, CorrugatedMetal, Cork, Linen_Beige, Linen_Blue, Linen_White, Mahogany, MDF, Oak, Plastic_ABS, Steel_Carbon, Steel_Stainless, Veneer_OU_Walnut, Veneer_UX_Walnut_Cherry, Veneer_Z5_Maple.”

You can use this type of pattern to apply other constraints to any of those properties.
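That pattern is easy to generate programmatically. The sketch below builds an "exact name from the following list" constraint from a Python list, in the style of the materials line above; the property names and values are illustrative.

```python
# Sketch: building a "use an exact name from the following list"
# constraint line programmatically, mirroring the materials pattern
# in the sample prompt. The allowed values here are illustrative.

ALLOWED_MATERIALS = ["Plywood", "Oak", "Steel_Carbon", "Glazed_Glass"]

def constrain(prop: str, allowed: list) -> str:
    return (
        f"- {prop}: use an exact name from the following list: "
        + ", ".join(allowed)
    )

def build_object_spec() -> str:
    lines = [
        "For each object you need to store:",
        "- object_name: name of the object",
        constrain("Material", ALLOWED_MATERIALS),
    ]
    return "\n".join(lines)
```

Keeping the allowed values in one Python list means the prompt and any later validation code can share the same source of truth.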

When you get the results back from GPT, you can enforce any other constraints you might want. The limit here is only what you can code in Python.
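For the post-processing side, a small validation pass over each returned object is usually enough. This sketch clamps dimensions and falls back to a default material; the field names follow the object schema quoted above, but the limits and defaults are made up.

```python
# Sketch of enforcing constraints after GPT replies: clamp dimensions
# to a sane range and reject unknown materials. Limits and the default
# material are illustrative assumptions.

ALLOWED = {"Plywood", "Oak", "Steel_Carbon"}
MAX_DIM_CM = 500

def enforce(obj: dict) -> dict:
    fixed = dict(obj)
    for dim in ("Length", "Width", "Height"):
        # The only limit here is what you can code in Python.
        fixed[dim] = max(1, min(MAX_DIM_CM, int(fixed.get(dim, 1))))
    if fixed.get("Material") not in ALLOWED:
        fixed["Material"] = "Plywood"
    return fixed
```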

View in Forum

Q: I was trying out ChatGPT to generate scene assembly based on the NVIDIA sample, but it would often get the height wrong. Are there best practices or a better way to generate a scene description so the geometry of objects put into a scene are taken into account?

There are a few ways to achieve this, including prompt engineering, controlling the results, and running pre-placement checks on objects. In the AI Room Generator example, we just take the XYZ position of objects and don’t provide specific instructions on how the Y axis should behave. However, you can be stricter about this in your prompt engineering. You could try adding specific behavior requirements like “If you are placing a tabletop object, or an object that usually sits on a wall or on top of furniture, make sure the Y axis value is not 0”.

You can also provide a few examples of desired results. For example, provide an example where a TV is placed on top of a TV stand. Once you receive the final results, you can also run your check to make sure that the desired objects are correctly placed with the right offsets ahead of placing them in the scene.
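The final check before placement can be as simple as the sketch below, which bumps elevated objects off the floor. The category set and default height are illustrative assumptions, not values from the AI Room Generator sample.

```python
# Sketch of a pre-placement check: objects that normally sit on
# furniture or walls should not land at Y = 0. Categories and the
# fallback height are made up for illustration.

ELEVATED = {"tv", "lamp", "vase", "painting"}
DEFAULT_SURFACE_Y_CM = 75  # e.g. a typical tabletop height

def corrected_y(name: str, y: float) -> float:
    if name.lower() in ELEVATED and y == 0:
        return DEFAULT_SURFACE_Y_CM
    return y
```

A more thorough version would look up the actual bounding box of the supporting object (e.g. the TV stand) and offset by its real height instead of a constant.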

View in Forum

Q: The objects in the AI Room Generator Extension look perfectly arranged at right angles. How was this done?

GPT-4 has good spatial awareness and can place objects at right angles, and even apply rotations. A great way to gain more control over this is to include the rotation angles on the X, Y, and Z axes in your request to the LLM, for example “include the X, Y, Z rotation for each object”, and have that information stored in the desired result format, such as a JSON field.
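Reading those rotations back is then a simple parse. The reply shape below is an assumption modeled on the extension's per-object schema, not its exact format.

```python
import json

# Sketch: request per-object rotations from the LLM, then read them
# back from the JSON reply. Field names (rot_x, rot_y, rot_z) are
# illustrative, not the sample extension's actual schema.

PROMPT_SUFFIX = "Include the X, Y, Z rotation in degrees for each object."

def read_rotations(reply_text: str) -> dict:
    objects = json.loads(reply_text)
    return {
        o["object_name"]: (o["rot_x"], o["rot_y"], o["rot_z"])
        for o in objects
    }
```

The tuples can then be applied to each prim's rotation attribute when the objects are placed in the scene.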

View in Forum

So, where do I start?

A good place to begin developing AI-powered extensions using generative AI models is with the AI Room Generator Extension sample on GitHub. You can read a detailed walkthrough of the process for developing the extension in the blog, How ChatGPT and GPT-4 Can Be Used for 3D Content Generation.

Join NVIDIA at SIGGRAPH to learn about the latest innovations in graphics, AI and OpenUSD. Save the date for the NVIDIA Keynote with CEO Jensen Huang to see the latest award-winning research and announcements in OpenUSD and AI-powered solutions for 3D content creation.

See the latest Text2Material generative AI from NVIDIA at SIGGRAPH’s Real-Time Live and explore sessions in the NVIDIA Generative AI Theater.

Get started with NVIDIA Omniverse by downloading the standard license free, or learn how Omniverse Enterprise can connect your team. If you are a developer, get started with Omniverse resources. Stay up to date on the platform by subscribing to the newsletter, and following NVIDIA Omniverse on Instagram, Medium, and Twitter. For resources, check out our forums, Discord server, Twitch, and YouTube channels.

About the Authors

Alex Qi is a product manager in the NVIDIA AI Software Group. Her focus is on AI software and applications for a conversational AI framework (Riva) and AI/ML for multimedia streaming (Maxine). Prior to joining NVIDIA, she had extensive experience leading challenging technical projects across technology and engineering organizations in various roles, such as data scientist, computational modeling, and design engineering. Alex holds dual master’s degrees from the Massachusetts Institute of Technology: an MBA from the MIT Sloan School of Management, and a Master of Science in Mechanical Engineering from the School of Engineering, where she focused her studies on robotics and artificial intelligence.

Mario Viviani is Manager of Developer Relations for Omniverse at NVIDIA, based in London, UK. His team focuses on helping developers and partners get familiar with and onboard onto NVIDIA Omniverse. A passionate technologist and hands-on developer, he is ex-Amazon, where he led the global Apps and Games Tech Evangelism team; previously, he co-founded startups and ran his own consulting company in mobile app development. He is always looking ahead to the next “big thing”!

Paul Cutsinger is director of Omniverse Exchange at NVIDIA, where he’s focused on tooling for real-time, true-to-reality simulation. With a career spanning Amazon, Disney, and Microsoft, Paul’s work centers on enabling creators to take their ideas into production.

