AI Powered NPCs — Hype, or Hallucination?

Published in Rabbit Rabbit · 11 min read · Dec 9, 2023

by Reed Berkowitz

A study by Inworld found that out of 1000 gamers:

  • 99% Believe advanced AI NPCs would positively impact gameplay
  • 81% would be willing to pay more for a game with AI-improved NPCs
  • Over half (52%) dislike repetitive NPC dialogue
  • 76% want to see NPCs with better situational awareness
  • 78% would spend more time playing games with advanced AI NPCs

AI models, and large language models (LLMs) in particular, are amazing at chat. It’s one of the things they do best. Some of the most well-funded and most highly valued AI companies are AI chat companies. Character.ai is valued at over $1B. Inworld, which builds AI NPCs, is valued at around $500M. And yet, something has been keeping the multi-billion-dollar AI industry out of AAA games.

Current video games can have hundreds or even thousands of NPCs that players can interact with, and gamers want AI-powered NPCs and are willing to pay extra for them. So why aren’t AI-powered NPCs in every single video game right now? Why aren’t games adopting AI technology? Why are games still relying on random “barks” and stunted dialogue trees?

Yes, it could be expensive. Yes, it’s new technology. Yes, it requires a different workflow and new pricing models. But even if all of that were sorted out (and it largely is), there is a bigger issue.

AI-driven NPC dialogue can completely break games.

Game State

The first problem is connecting what is essentially text to the game state, which I wrote about here:

There, I also give a basic example of how to solve some of those issues.

But that’s not the biggest problem. The biggest problem is that a sentence like this can break a game:

“Hey, you want to go next door and grab a coffee?”

Really?

Yes.

Hallucination

LLMs are creative. If you ask them to create a spell to ward off monsters, they’ll happily create one for you. If you tell them to describe a city made of cheese, they will. The models are built to follow our lead, no matter where we ask them to go. The issue is that often we don’t want them to make things up; we want objective facts about the real world. Here the models struggle more, because they can’t always tell the difference.

The reasons for hallucinations are complex. The model tries to recognize patterns it has learned from its training data in order to provide the most likely completion. For instance, if it was trained on fan fiction scraped from the internet, it may have learned that at the end of a story there is usually an author’s note such as “I hope you enjoyed this story! If you liked it, visit [url] for more or follow me at [social media address].” There may be tags or reviews after it too. So the LLM knows this pattern and fills in a likely-sounding URL and social media address. That is what fits the pattern most completely. It is not concerned with whether that URL is “real”. It is creating what is statistically “expected”, and since the story is also made up, why not the user name, the URL, and everything else that is supposed to be there? To the LLM there is no difference.

I’ve had AI models try to convince me that there are Python libraries that don’t exist, that cats can be prosecuted for crimes, and that the AI has met my wife in a redwood forest (I guess that one could be true), and they have often sent me to websites that don’t exist for services that don’t exist. All because it is what the AI predicts would be the most likely pattern.

The largest AI companies in the world are working hard to figure this out and are only partially succeeding. AIs still hallucinate, and what they produce is often not “factual”.

Stack Overflow temporarily had to ban AI-generated answers because they were too easy to produce and too often wrong. Not just wrong, but convincingly wrong.

“The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce,”

In a recent case, lawyers were fined and sanctioned for using AI to do their research, because the AI made up several cases to “help” with that research!

When filing a response, Mata’s lawyers cited at least six other cases to show precedent, including Varghese v. China Southern Airlines and Shaboon v. Egypt Air — but the court found that the cases didn’t exist and had “bogus judicial decisions with bogus quotes and bogus internal citations,” leading a federal judge to consider sanctions

A survey by Tidio found that:

  • Around 46% of respondents frequently encounter AI hallucinations, and 35% do so occasionally
  • About 77% of users have been deceived by AI hallucinations, while as many as 33% of those who never experienced them believe they would be misled

The New York Times reports that:

“Their research also showed that hallucination rates vary widely among the leading A.I. companies. OpenAI’s technologies had the lowest rate, around 3 percent. Systems from Meta, which owns Facebook and Instagram, hovered around 5 percent. The Claude 2 system offered by Anthropic, an OpenAI rival also based in San Francisco, topped 8 percent. A Google system, Palm chat, had the highest rate at 27 percent.”

Yes, that’s right, the best AI on the planet makes things up 3% of the time.

This is obviously a huge problem, but you might be tempted to think: for games, what does it really matter? We’re talking to NPCs, not practicing law or writing C#.

But in reality, it’s much much worse in games. Let’s find out why.

Game Reality

All of the effort that goes into teaching AI models to tell real patterns from patterns that merely look real doesn’t always translate to games.

A game is already partly an illusion. So game reality does not conform to actual reality.

For instance, in a medieval fantasy game the phrase, “Want to come over to my house and play video games?” would be an obvious error, but NOT in the “real world”.

So the game has its own world model, which may not even be a subset of the real world. It could be completely fictional. For the LLM not to hallucinate in the fictional world, it has to understand the “reality” of the constructed world and stay inside it.

In other words, the game’s world model is not necessarily in the base model’s training data AT ALL. :( Or, more likely, the training data contains conflicting information.

So game creators have to give the models the information that comprises the reality of the game world, and then still deal with hallucinations on top of that.
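To make that concrete, here is a minimal sketch of the most common mitigation: hand the model the game world’s “facts” as an explicit system prompt on every request. The world facts, the NPC name, and the exact wording below are invented for illustration; they are not any particular engine’s or vendor’s API.

WORLD_FACTS = [
    "The setting is a medieval fantasy kingdom called Veldra.",  # hypothetical world
    "There is no coffee, electricity, or modern technology of any kind.",
    "Travel between towns takes days on foot or by horse.",
]

def build_system_prompt(npc_name: str, world_facts: list) -> str:
    """Compose a system prompt that treats the game world, not the real world, as ground truth."""
    facts = "\n".join(f"- {fact}" for fact in world_facts)
    return (
        f"You are {npc_name}, a character in a game world.\n"
        f"Treat ONLY the following statements as true:\n{facts}\n"
        "If the player mentions anything outside this world (coffee, video games, "
        "the internet), respond with in-character confusion instead of inventing "
        "an explanation. Never introduce places, items, or people not listed above."
    )

print(build_system_prompt("Maren the barkeep", WORLD_FACTS))

On top of a prompt like this, a real pipeline would still need conversation history, retrieval, and per-character constraints.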

And we’re just getting started with the challenges.

Character Knowledge

The characters in a game each have a separate reality within that imaginary game world. Each one has to have their own identity, and their story can’t interfere with the main plot or conflict with the reality of that world.

Each character has a subset of knowledge about the world, and a set of knowledge about their own life, and it all has to mesh with everyone else’s knowledge. One character knows only a handful of places in the world and only some of the people.

Why is this important?

Because often, the LLM will try to move the story along by trying to please you. So if you are trying to find Abraxor, Master of the Assassins’ Guild, a friendly innkeeper might mention that Abraxor is his son or best friend and of course he’ll help you! Or that there’s no point going on a quest to find the Wizard of the North because he’s staying right here at the inn!

Characters need to be constrained in what and whom they know, and they can’t say things that break the plot and structure of the game. Everyone has to know their own things, and if the LLM makes things up, it can inadvertently break the continuity and even the gameplay of the game.
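One hedged sketch of how that constraint might look in practice: keep each character’s knowledge as structured data and render only that slice into the prompt. The character, the fields, and the prompt wording below are all hypothetical placeholders.

from dataclasses import dataclass, field

@dataclass
class NPCKnowledge:
    name: str
    backstory: str
    known_people: set = field(default_factory=set)
    known_places: set = field(default_factory=set)

    def knowledge_prompt(self) -> str:
        """Render only this character's slice of the world into the prompt."""
        people = ", ".join(sorted(self.known_people)) or "nobody of note"
        places = ", ".join(sorted(self.known_places)) or "only your immediate surroundings"
        return (
            f"You are {self.name}. {self.backstory}\n"
            f"People you know: {people}.\n"
            f"Places you know: {places}.\n"
            "If asked about anyone or anywhere not listed, say you have only heard "
            "vague rumors or know nothing at all. Do not invent details."
        )

# Hypothetical innkeeper who has never heard of Abraxor or the Assassins' Guild.
innkeeper = NPCKnowledge(
    name="Tomlin the innkeeper",
    backstory="You have run the Gilded Goose inn for twenty years.",
    known_people={"Maren the barkeep", "the town guard captain"},
    known_places={"the Gilded Goose", "the market square"},
)
print(innkeeper.knowledge_prompt())

Prompt instructions alone are leaky, but they sharply cut how often the friendly innkeeper turns out to be Abraxor’s best friend.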

After the characters know what they know, they also have to know what they can actually do.

Game Mechanics

This is maybe the most game-breaking characteristic of AI-driven NPCs.

It is extremely easy for an LLM-driven NPC to say something that creates an expectation that is not programmed into the game.

Let’s take a mild instance.

For example, a barkeep might befriend you and invite you to his house to play a local game of Rutanny and have dinner with his family. This is normal in a chat, but it can ruin immersion in a game. Why? Because after the barkeep says this, he just stands there. The barkeep isn’t programmed to leave his bar. He isn’t programmed to walk around freely. Even if he could, there is no house built for him to go to. There is no family. There is no game called Rutanny. And even if he had said “chess”, chess hasn’t been programmed into the game as a mini-game.

That’s the “Hey, you want to go next door and grab a coffee?” problem. There is no coffee in this world, the character can’t leave, and there is nothing built next door.

Maybe you could dismiss a few of these, but the danger is when the model hallucinates things that are hard to tell apart from the game’s real mechanics. It may even copy game mechanics, especially if it is familiar with them; it might mimic them to get a good completion.

An example might be a character giving you a quest that doesn’t exist. You can spend a long time searching for the “dark swamp to the south inhabited by witchlings and a mysterious magic sword” before you realize the LLM just made it up on the spot! Characters might agree to give you things they don’t have, instruct you to do things the game isn’t built to do, or join you for adventures they can’t actually join.

Any NPC can say something that breaks the game if it has no awareness of what the game mechanics are. In chat, that’s just a re-roll or an edit. In a game, it can be extremely confusing.
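A rough sketch of one mitigation, with invented action names and no real engine API: ask the model for structured output (dialogue plus an optional proposed action), then validate the action against the mechanics and content the game actually has before anything reaches the player.

SUPPORTED_ACTIONS = {"give_item", "start_quest", "follow_player", "trade"}

def validate_npc_turn(turn: dict, available_quests: set) -> dict:
    """Strip anything the game cannot actually deliver and flag it for a re-roll."""
    action = turn.get("action")
    if action is None:
        return turn  # pure dialogue, nothing to verify
    if action["type"] not in SUPPORTED_ACTIONS:
        # The model proposed a mechanic that doesn't exist (a game of Rutanny,
        # coffee next door). Drop the action and re-roll the line upstream.
        return {"dialogue": turn["dialogue"], "action": None, "needs_reroll": True}
    if action["type"] == "start_quest" and action["quest_id"] not in available_quests:
        # Hallucinated quest: sounds plausible, but it was never authored.
        return {"dialogue": turn["dialogue"], "action": None, "needs_reroll": True}
    return turn

# Example: the model invents a quest that isn't in the quest table.
turn = {
    "dialogue": "Seek the dark swamp to the south, where witchlings guard a magic sword.",
    "action": {"type": "start_quest", "quest_id": "dark_swamp_sword"},
}
print(validate_npc_turn(turn, available_quests={"rescue_the_miller", "clear_the_cellar"}))

The same list of supported actions can also be injected into the prompt, so the model is less likely to propose the impossible in the first place.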

Change Is The Only Constant

As mentioned above, LLMs have to be tied into the game state: if a character gives the player an object from their store, that must be recorded and reflected in the game state as inventory. Text has to be converted into math.

But conversations also have to reflect changes in the larger game state. It goes both ways. Everything listed above changes as the game changes, so the knowledge of an LLM-driven character is fluid and has to be constantly updated.

For instance, if the “Imperial Space Lord” is killed, NPCs should know that. If two of your party members are dead, people shouldn’t still talk about them as if they were alive. If you join a secret society, characters in that society have to be updated to treat you as one of their own, and so on.
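Here is a small, invented sketch of what that updating might look like: game events rewrite the shared “facts” that NPC prompts are built from, so the next conversation reflects the new state. The event names and the fact store are illustrative assumptions, not any engine’s real event system.

# Facts that get injected into NPC prompts; rewritten as the game state changes.
world_facts = {"imperial_space_lord": "The Imperial Space Lord rules the sector."}
party_facts = {"kira": "Kira the gunner is alive and travels with the player."}

def on_game_event(event: str, subject: str) -> None:
    """Keep the prompt-facing facts in sync with what just happened in the game."""
    if event == "character_killed" and subject == "imperial_space_lord":
        world_facts[subject] = "The Imperial Space Lord is dead; word of it is spreading."
    elif event == "party_member_died" and subject in party_facts:
        party_facts[subject] = (
            f"{subject.title()} died recently; speak of them only in the past tense."
        )

on_game_event("character_killed", "imperial_space_lord")
on_game_event("party_member_died", "kira")
print(world_facts["imperial_space_lord"])
print(party_facts["kira"])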

Recap

For LLMs to power NPCs without breaking immersion or gameplay, these pieces have to be in place.

The AI NPC system should:

  • Not hallucinate or deviate from the “facts”.
  • Understand the game-world model as a set of “facts” and know which real-world “facts” conflict with the game-world “facts”.
  • Understand what each character knows and cannot know.
  • Update that information as the game progresses and be informed of changes to the game state.
  • Understand which actions can actually be performed by the mechanics of the game.
  • Know what exists as a game object, locally or globally (for instance, game maps and inventory).

Solutions

Basically, we’re working on it. :)

If there were good solutions, you’d already be chatting with your favorite NPCs in all your AAA games.

Right now, anything live is probably going to be a balancing act.

It’s no good having characters so locked down that they are repetitive and boring; in those cases it’s definitely better to have dialogue trees and writers. It’s also no good having bright, creative characters hallucinating and ruining the plot and the gameplay.

There are a lot of tools and techniques emerging that are closing the gap between the two, including vector databases, multi-pass prompts (the AI checking the AI), knowledge graphs, larger context windows, smarter models, better prompting, dynamic and complex prompt cycles, and more.
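As a taste of the multi-pass idea, here is a hedged sketch of “the AI checking the AI”: a second call judges whether the first draft stays inside the supplied fact sheet and triggers a re-roll if it doesn’t. call_llm is a stand-in for whatever chat-completion client a game actually uses; the prompts and the fallback line are assumptions for illustration.

def call_llm(prompt: str) -> str:
    """Placeholder: wire this to your actual model client."""
    raise NotImplementedError

def checked_npc_reply(fact_sheet: str, player_line: str, max_attempts: int = 3) -> str:
    """Draft an in-character reply, then have a second pass veto hallucinations."""
    draft_prompt = (
        f"Facts about the game world and your character:\n{fact_sheet}\n"
        f"The player says: {player_line}\n"
        "Reply in character, in one short paragraph."
    )
    for _ in range(max_attempts):
        draft = call_llm(draft_prompt)
        verdict = call_llm(
            "Answer only YES or NO. Does the reply below contradict the facts, "
            "or promise anything (items, quests, actions) not listed in them?\n"
            f"Facts:\n{fact_sheet}\nReply:\n{draft}"
        )
        if verdict.strip().upper().startswith("NO"):
            return draft
    # Fall back to authored dialogue rather than shipping a hallucination.
    return "Hm. I couldn't say, traveler."

Every extra pass costs latency and money, which is part of why this is still a balancing act.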

Perhaps the most important thing is to create new gameplay that conforms and adapts to what the models are good at, instead of expecting them to perform miracles right out of the box. Gameplay must be novel and must set expectations.

Companies like Convai and Inworld are working on these complicated issues and on connecting LLM-powered NPCs to game environments, but they have a long way to go.

Follow along with Curiouser as we explore and solve these issues! It’s going to be a wild trip :)

Curiouser is a game design and consulting company specializing in AI, new technology, applied gaming, and immersive experiences including ARGs.

https://curiouserinstitute.com/
