Integrating Large Language Models like OpenAI’s GPT with Unity 3D
--
I’ve been building agents with Large Language Models (LLMs) for the past few months. First with Meta’s LLaMA and then with OpenAI’s GPT. The switch to GPT was simply a practical one for me. I wanted to focus more on the agent-building than on the hardware/cloud-resource wrangling you have to deal with when running your own LLaMA 1/2 instance, and nothing I’ve done so far really requires the freedom a hearty LLaMA variant will provide you with. Not yet, anyway. If you happen to have a few A100 cards lying around to run LLaMA on (or even a couple of RTX 3090s on the cheap end), by all means fire up a few dozen billion parameters and have at it. Me personally, I got tired of hearing Jeff Bezos whispering “cha-ching, cha-ching, cha-ching” in my ear every time I logged into AWS. P3 instances are NOT cheap, if you were wondering. OpenAI’s pricing model, on the other hand, is completely reasonable.
The roots of what I’ve built so far come from another project, one that went belly up and was originally intended more for business or corporate purposes than games. But after staring at the carcass on my laptop for a few weeks, I decided I’d waited long enough and that it was time to translate the Python that project had been born with into something else more… fun. Why not a game? Games love agents. They’re full of them.
The problem is I’m not really interested in building a game right now. I have a day job, and three children, and a ton of other stuff to do, and that grind doesn’t really appeal to me as much as it used to. But building some tools for other people to use to build neat stuff still does. So the plan became: why not build a tool for Unity 3D so somebody else, somewhere else, might use it to make a really cool game with LLM-powered AI agents? That sounded like a great idea to me, so it’s exactly what I did. You can find a demo project with all of the relevant code I’ll be discussing on GitHub.
What exactly am I doing with LLMs, anyway? Good question. I am leveraging LLMs to give game-character NPCs (the “agents”) a personality based on parameters given by the developer. Right now, my agents just talk at each other, but more in-depth behavior is something I’m scheming on for the near future. What kinds of parameters am I talking about, though?
For starters, I’ve incorporated the big-five personality traits into the agent system. OCEAN is their acronym: ‘O’ is for Openness, ‘C’ is for Conscientiousness, ‘E’ is for Extroversion, ‘A’ is for Agreeableness, and ‘N’ is for Neuroticism. You can assign values from 0–4 for each of them. Additionally, I’ve provided similar parameters for Anger, Happiness, and Sarcasm. I’ve also put in fields for “Secrets”, a list of things the agent won’t want to talk about directly, and “Shirt Sleeve”, a list of things the agent will want to talk about at every opportunity. I put a video of the system generating dynamic dialogue for two (very simple) game characters on YouTube.
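As a sketch, parameters like those could be exposed to a developer as a serializable Unity component along these lines. The field names here are illustrative, not the demo project’s actual names:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Illustrative sketch of the agent personality parameters described above.
public class AgentPersonality : MonoBehaviour
{
    // Big-five (OCEAN) traits, each on the 0-4 scale.
    [Range(0, 4)] public int Openness;
    [Range(0, 4)] public int Conscientiousness;
    [Range(0, 4)] public int Extroversion;
    [Range(0, 4)] public int Agreeableness;
    [Range(0, 4)] public int Neuroticism;

    // Mood/style parameters, same 0-4 scale.
    [Range(0, 4)] public int Anger;
    [Range(0, 4)] public int Happiness;
    [Range(0, 4)] public int Sarcasm;

    // Topics the agent won't want to talk about directly.
    public List<string> Secrets = new List<string>();

    // Topics the agent will bring up at every opportunity.
    public List<string> ShirtSleeve = new List<string>();
}
```

The `[Range]` attributes give you sliders in the Inspector, so tuning a personality is just dragging handles around.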
How am I getting the LLM to do this? Real simple: my system is dynamically generating text prompts. In the same way the HTML your web browser is rendering right now to show you this website is generated dynamically by a bunch of JavaScript, my system is dynamically generating prompts to elicit responses from GPT with a bunch of Unity 3D C#. Phrases are cut and pasted and stuck together just right to represent the configuration of personality traits you’ve given your agent. It’s a whole lot of string manipulation, and it’s become the second-year CS undergrad project from hell. Luckily for you, I get to worry about that, and you don’t have to. For these reasons, I’m calling my system the LLM Front-End.
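To give you the flavor of that string manipulation, here’s a deliberately tiny sketch of the idea. The real system’s phrasing and structure are far more involved, and none of these names come from the actual project:

```csharp
using System.Collections.Generic;
using System.Text;

// Tiny illustrative sketch: map 0-4 trait values onto phrases and
// glue them into a prompt fragment.
public static class PromptBuilder
{
    static readonly string[] Scale =
        { "not at all", "slightly", "moderately", "very", "extremely" };

    public static string Build(string name, int extraversion, int sarcasm,
                               IEnumerable<string> secrets)
    {
        var sb = new StringBuilder();
        sb.Append($"You are {name}. ");
        sb.Append($"You are {Scale[extraversion]} extraverted ");
        sb.Append($"and {Scale[sarcasm]} sarcastic. ");
        foreach (var secret in secrets)
            sb.Append($"Never directly discuss {secret}. ");
        return sb.ToString().TrimEnd();
    }
}
```

Multiply that by every trait, mood, secret, and observed object, and you can see how it turns into the undergrad project from hell.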
I also implemented some capability to give the agents a sense of “sight”. Essentially it’s a dynamic mesh based on given values for vertical/horizontal field of view and min/max view distance. Anything worth knowing about within the generated mesh will, when queried, return its written description to the agent, letting the agent “know” what it is by incorporating that description into the prompts fed to GPT to generate dialogue. The old man in the demo project I’ve posted doesn’t need to be explicitly told information about the world; he knows about the little boy “staring at the sky, wondering why it’s blue” because the boy is in his field of view and there’s nothing obstructing the line of sight between them. After that, a sense of “hearing” takes over: when the boy talks, the man incorporates the “heard” statements into his own reply requests sent to GPT.
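The per-object version of that test might look something like this sketch, assuming the same ingredients: FOV angles, min/max distance, and a raycast for line of sight. (The demo project builds an actual mesh; this is just the equivalent check for a single target.)

```csharp
using UnityEngine;

// Illustrative sketch of a field-of-view + line-of-sight check.
public class AgentSight : MonoBehaviour
{
    public float horizontalFov = 90f;   // degrees
    public float verticalFov = 60f;     // degrees
    public float minDistance = 0.5f;
    public float maxDistance = 20f;

    public bool CanSee(Transform target)
    {
        Vector3 toTarget = target.position - transform.position;
        float distance = toTarget.magnitude;
        if (distance < minDistance || distance > maxDistance)
            return false;

        // Check the horizontal and vertical angles separately.
        Vector3 flat = Vector3.ProjectOnPlane(toTarget, transform.up);
        if (Vector3.Angle(transform.forward, flat) > horizontalFov * 0.5f)
            return false;
        if (Vector3.Angle(flat, toTarget) > verticalFov * 0.5f)
            return false;

        // Line of sight: the first thing the ray hits must be the target.
        if (Physics.Raycast(transform.position, toTarget.normalized,
                            out RaycastHit hit, maxDistance))
            return hit.transform == target;
        return false;
    }
}
```

If `CanSee` returns true, the target’s written description gets folded into the next prompt.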
So, your game needs a superhero to fight the evil aliens from Planet X? Sounds like a real type-A to me. Probably give him/her high values for Conscientiousness and Agreeableness, but a low value for Neuroticism. Happiness and Anger can change depending on the situation. If the aliens blow up the Eiffel Tower, that will probably make your hero Angry when they “see” it, right? But if the alien mother ship flies into the sun, that might be a Happy moment. You want your hero to crack jokes like a certain friendly neighborhood web-slinger? Give them some Sarcasm. It’s completely up to you, and the values can change whenever you want them to, for whatever reason you want them to.
The prompts are sent off to OpenAI’s GPT API the same way any web call would be. Unity comes with a bunch of built-in functionality to handle all of that for you. Just wrap the prompts up in some JSON the way the OpenAI docs tell you to, send them on their way, and whatever GPT model you’re calling will get back to you as soon as it can. I’ve gotten this working already in my system; you’ll just need to enter your own API key in the relevant spot.
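For reference, here’s roughly what that web call looks like using Unity’s built-in UnityWebRequest. The JSON body is hand-rolled and unescaped for brevity (the real system builds it properly from the generated prompt), and the class name and key placeholder are mine:

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Illustrative sketch of the OpenAI chat-completions call from Unity.
public class GptClient : MonoBehaviour
{
    const string Endpoint = "https://api.openai.com/v1/chat/completions";
    public string apiKey = "YOUR_API_KEY";  // enter your own key here

    public IEnumerator SendPrompt(string prompt, System.Action<string> onResponse)
    {
        // NOTE: a real implementation must JSON-escape the prompt.
        string body = "{\"model\":\"gpt-4\",\"messages\":[{\"role\":\"user\",\"content\":\""
                      + prompt + "\"}]}";

        using (var request = new UnityWebRequest(Endpoint, "POST"))
        {
            request.uploadHandler =
                new UploadHandlerRaw(System.Text.Encoding.UTF8.GetBytes(body));
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            request.SetRequestHeader("Authorization", "Bearer " + apiKey);

            yield return request.SendWebRequest();

            if (request.result == UnityWebRequest.Result.Success)
                onResponse(request.downloadHandler.text);  // raw JSON reply; parse choices[0]
            else
                Debug.LogError(request.error);
        }
    }
}
```

Run it as a coroutine (`StartCoroutine(client.SendPrompt(prompt, HandleReply))`) so the game loop never blocks waiting on the network.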
One problem you’re going to run into, though, is rate limits. Especially if you’re using GPT-4, which unsurprisingly generates the best output. I’ve had to give it somewhere around 10 seconds between requests in order not to hit the limit, making the conversation between even just two NPCs somewhat slow. A crowd of them is pretty much out of the question due to this factor alone, not even accounting for whatever that would cost in real money to run.
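One way to live inside the limit is a simple throttle that queues requests and releases one every N seconds. A minimal sketch — the ~10-second figure is just what has worked for me with GPT-4, so tune it as needed:

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

// Illustrative sketch: queue API requests and release one at a fixed
// interval to stay under the rate limit.
public class RequestThrottle : MonoBehaviour
{
    public float secondsBetweenRequests = 10f;
    readonly Queue<System.Action> pending = new Queue<System.Action>();

    void Start() => StartCoroutine(Drain());

    public void Enqueue(System.Action sendRequest) => pending.Enqueue(sendRequest);

    IEnumerator Drain()
    {
        while (true)
        {
            if (pending.Count > 0)
                pending.Dequeue().Invoke();
            yield return new WaitForSeconds(secondsBetweenRequests);
        }
    }
}
```

Every NPC shares one throttle, which is exactly why a crowd of them grinds to a halt: ten agents means your reply is, at best, a hundred seconds out.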
This issue is motivating me to work in some method of assigning different models to different tasks. For example, eventually I see the system coordinating agent actions in addition to just dialogue. As in: “There’s a zombie coming for you, should you run away? Yes or no only, please.” And then using the output to call a function: Run(where) for yes, and I guess Die() for no? You don’t need GPT-4 for that one. I also think lower-priority NPCs with less complex personalities could probably do with a bit of GPT-3 magic instead.
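That routing could start out as simple as a switch on task type. A sketch, with illustrative model names and the Run/Die dispatch from above (none of this exists in the project yet):

```csharp
// Illustrative sketch: pick a model per task, and turn a yes/no
// reply into an action.
public enum AgentTask { Dialogue, YesNoDecision }

public static class ModelRouter
{
    public static string ModelFor(AgentTask task, bool highPriorityNpc)
    {
        switch (task)
        {
            case AgentTask.YesNoDecision:
                return "gpt-3.5-turbo";  // cheap and fast is fine here
            case AgentTask.Dialogue:
                return highPriorityNpc ? "gpt-4" : "gpt-3.5-turbo";
            default:
                return "gpt-3.5-turbo";
        }
    }

    // Dispatch the model's one-word answer to a function call.
    public static void HandleFleeDecision(string reply,
                                          System.Action runAway,
                                          System.Action die)
    {
        if (reply.Trim().ToLowerInvariant().StartsWith("yes"))
            runAway();  // Run(where)
        else
            die();      // Die()
    }
}
```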
Alternatively, if I wanted to get around something like AI-safety restrictions built into GPT, I could send those requests to a server running a LLaMA variant. The question “Should you shoot the zombie in its stupid face with a gun?” would give GPT a hernia about how “Violence is bad, mmmmk?”, but there’s a good chance a model like Nous-Hermes-Llama2-13b would reply with something like “Just shooting the zombie in its face won’t do, you have to destroy its brain to kill it” (I haven’t tested that, but it’s what I’d expect). However, I’d probably have to host Nous Hermes myself.
Fortunately, most of what I am currently doing with OpenAI’s GPT API can and will be interfaced with some LLaMA variant again in the near future. The chat system I was using for LLaMA earlier (Colossal AI) features a RESTful interface that’s very similar to OpenAI’s API. I imagine building support for it in Unity will happen as soon as I’m fed up enough with the rails OpenAI has put GPT on to motivate me to actually do the work, but that’s another story.
The way I see it, though, coordinating calls to different models is probably going to rely on a behavior tree, or some other high-level logic structure like that. I’d been using PyTrees in Python previously. There are a few behavior tree systems you can find on the Unity Asset Store, and they’re pretty good. I can’t ship those with my own project for you to use, unfortunately. However, I have written my own behavior tree asset for Unity 3D before; it just doesn’t have a fancy UI to use it with. So, maybe I’ll incorporate that in the future instead? We’ll see.
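For the curious, the core of a behavior tree without any fancy UI really is quite small. A minimal sketch of the usual node/selector pattern (not the asset I wrote, just the general idea):

```csharp
using System.Collections.Generic;

// Minimal behavior-tree sketch: a selector ticks its children in
// order until one does not fail.
public enum NodeStatus { Success, Failure, Running }

public abstract class BtNode
{
    public abstract NodeStatus Tick();
}

public class Selector : BtNode
{
    readonly List<BtNode> children;
    public Selector(params BtNode[] nodes) { children = new List<BtNode>(nodes); }

    public override NodeStatus Tick()
    {
        foreach (var child in children)
        {
            var status = child.Tick();
            if (status != NodeStatus.Failure)
                return status;  // Success or Running short-circuits
        }
        return NodeStatus.Failure;
    }
}

// Leaf node wrapping a delegate -- e.g. "try the cheap model first,
// fall through to GPT-4 if it fails".
public class ActionNode : BtNode
{
    readonly System.Func<NodeStatus> action;
    public ActionNode(System.Func<NodeStatus> action) { this.action = action; }
    public override NodeStatus Tick() => action();
}
```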
Originally, I pondered picking Epic’s Unreal Engine over Unity. I’m familiar with both and their APIs, but I picked Unity (for now) for the rapid prototyping it provides over Unreal, despite Unreal being more feature-complete. I can argue about these points until I’m blue in the face, but to hardcore users/devs we’re talking about a matter of religion here. So I’ll just save that for reddit comments or something and say I’m going with Unity for now but may translate what I’m working on (again) to Unreal in the future. The primary motivation in that case would probably be to use Unreal’s behavior tree or some other feature it has out of the box that you have to buy separately for Unity 3D on the asset store.
Again, I’ve put all of the code up on GitHub in the form of a demonstration Unity 3D project. It’s released under the MIT license, so please do something cool with it. Hopefully you make mega $$$ with your awesome game! The project is something of a hack still, though. I love that Reid Hoffman quote about how if you aren’t completely embarrassed by your first release, you’ve released too late. Definitely haven’t released too late here, haha.