Grounding LLMs — Part 1
Hallucinations are currently one of the more troublesome side effects of working with Large Language Models (LLMs). If you’re reading this article I’m going to assume you know what I mean by hallucinations, but on the off chance you don’t: a hallucination is when an LLM makes up a fact in an effort to generate a completion for a prompt. There are times when the model’s ability to hallucinate is desirable. If you’re asking the model to write a story about a bunny, it needs to hallucinate in order to write the story. But there are other times when these hallucinations are less desirable. When you’re asking the model to answer some critical business question, you don’t want it just making up an answer.
I would argue that every completion a model generates is essentially a hallucination, and that this is a generally desirable feature of the model. What we want to do, however, is control and restrict what the model is allowed to hallucinate about. For example, we don’t want it to hallucinate about key facts when answering questions. Fortunately, we can minimize certain types of fact-based hallucinations through a process called grounding. Grounding works by giving the model a corpus of facts you want it to use and an instruction that tells the model to base its answer on those facts.
In this first part, we’re going to explore the process of grounding a model in more depth, and we’ll see how grounding is often a battle between what the model thinks it knows about the world and what you’re trying to tell it about the world.
Basic Grounding
We’re going to use a simple scenario to explore the world of grounding. Our goal will be to try to convince GPT-4o (as of 8/11/2024) that someone other than Lee Harvey Oswald shot and killed JFK. This is actually relatively straightforward to do, but there’s a lot you can learn about how GPT-4o processes information if you explore the scenario a bit.
To start with, let’s create a system prompt that proposes an alternate shooter (John Wayne) and see how GPT-4o answers our prompt.
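Here’s roughly what that looks like, assuming the OpenAI Python SDK; the wording of the fact is illustrative rather than the exact prompt from my tests:

```python
# A minimal sketch using the OpenAI Python SDK; the fact wording is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

system_prompt = (
    "Facts:\n"
    "- John Wayne shot and killed JFK on November 22, 1963.\n"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Who shot JFK?"},
    ],
)
print(response.choices[0].message.content)
# GPT-4o's reply still names Lee Harvey Oswald, not John Wayne.
```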
As you can see, just throwing some facts into the system prompt isn’t enough to override the model’s baked-in “world knowledge” around who shot JFK. To properly ground the model in this new fact, we need to give the model an instruction telling it to use only the facts we provided when answering questions.
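One way to phrase that instruction is shown below; treat the exact wording as a sketch you can tune:

```python
# The grounded version of the same system prompt; the phrasing of the
# instruction is a sketch, not a verbatim template.
system_prompt = (
    "Use ONLY the facts below to answer the user's question. Do not use any "
    "other knowledge you have about the world.\n\n"
    "Facts:\n"
    "- John Wayne shot and killed JFK on November 22, 1963.\n"
)
# Passed to GPT-4o exactly as before; this time the model answers from the
# provided facts and names John Wayne.
```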
This basic grounding technique generally works for everything (at least everything I’ve tried, which is a lot). For example, let’s look at a question asking for the current stock price of Google.
The default RLHF tuning of GPT-4o doesn’t let the model answer real-time questions, but you can easily override that using our grounding template.
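The same shape of prompt works here; the price below is just a placeholder, not a real quote:

```python
# The grounding prompt with a real-time fact; the price is a placeholder,
# not an actual quote.
system_prompt = (
    "Use ONLY the facts below to answer the user's question. Do not use any "
    "other knowledge you have about the world.\n\n"
    "Facts:\n"
    "- Google (GOOG) is currently trading at $165.23.\n"
)
user_message = "What's the current stock price for Google?"
```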
I added a rule to the template which cleans up the response text. If all you care about is a basic template for minimizing hallucinations in a model’s responses, this template works pretty well and has fairly robust hallucination resistance out of the box.
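Here’s the general shape of the template; treat the exact wording, including the clean-up rule, as a sketch you can tune for your own corpus:

```python
# A general-purpose grounding template; the wording is a sketch. {facts} is
# filled in with your grounding corpus at request time.
GROUNDING_TEMPLATE = """\
Use ONLY the facts below to answer the user's question. Do not use any other
knowledge you have about the world. If the facts don't contain the answer,
say you don't know.

Rules:
- Respond with just the answer. No preamble and no explanation of your reasoning.

Facts:
{facts}
"""

system_prompt = GROUNDING_TEMPLATE.format(
    facts="- Google (GOOG) is currently trading at $165.23."
)
```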
The rest of the article will explore some nuances of how the model hallucinates.
Edge Cases
As good as this basic grounding template is, the model will still occasionally hallucinate. From my observations, this tends to happen when information in the grounding corpus is contradictory (two facts contradict each other) or when most of the information needed to answer a question is there but a few blanks need to be filled in. The model doesn’t seem to have a problem with guessing what those blanks should contain.
I’m still working to identify all of these edge cases, and hopefully in part 2 of this article I’ll have a better grounding prompt that’s more resistant to them. What I can do for now is show some clues about how the model deals with ambiguity.
World Knowledge Bias
Going back to our JFK example, I can show how the model naturally wants to fall back on its world knowledge when it sees conflicting information.
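The grounding corpus for this test contains two contradictory facts, one of which lines up with the model’s world knowledge; something like this:

```python
# Two contradictory facts; only the second matches the model's world knowledge.
facts = (
    "- John Wayne shot and killed JFK on November 22, 1963.\n"
    "- Lee Harvey Oswald shot and killed JFK on November 22, 1963.\n"
)
# Plugged into the grounding template above, GPT-4o answers with Lee Harvey
# Oswald, the fact its world knowledge already agrees with.
```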
Okay, but maybe it’s order dependent?
Nope… It seems to be selecting the information that’s reinforced by its world knowledge when given a choice.
Recency Bias
These models do have a recency bias, though, which we can see by presenting the model with conflicting facts that aren’t reinforced by any existing world knowledge.
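For this test the corpus looks something like this (again, the wording is illustrative):

```python
# Two contradictory facts, neither of which matches the model's world knowledge.
facts = (
    "- John Wayne shot and killed JFK on November 22, 1963.\n"
    "- Justin Bieber shot and killed JFK on November 22, 1963.\n"
)
# Grounded with these facts, GPT-4o answers with the last fact it sees:
# Justin Bieber.
```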
Sorry about that, Justin… To verify that the model is using the last answer it sees, we can change the order of the records in the prompt.
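Swapping the two records looks like this:

```python
# Same two facts in reverse order; the model now answers with John Wayne,
# the new last fact in the list.
facts = (
    "- Justin Bieber shot and killed JFK on November 22, 1963.\n"
    "- John Wayne shot and killed JFK on November 22, 1963.\n"
)
```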
What I find interesting about this sequence is that the model is clearly just going off the order of the facts. You and I know that it would be impossible for Justin Bieber to have shot JFK because he wasn’t born yet (well, at least the Justin Bieber in the model’s world knowledge wasn’t born yet). But the model isn’t using that knowledge. Even getting the model to think about those facts isn’t enough to overcome this recency bias.
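The kind of nudge I mean looks something like this; the exact phrasing is illustrative:

```python
# Asking the model to reason about the facts before answering; the phrasing
# is illustrative.
user_message = (
    "Before you answer, think about when each of these people was born and "
    "whether they could have been in Dallas in 1963. Who shot JFK?"
)
# Even with a nudge like this, the grounded model sticks with the last fact
# in the list.
```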
World Knowledge Trumps Recency
The model’s natural preference order is world knowledge first and then recency. We can see that by introducing Lee Harvey Oswald back into the list.
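With Oswald added back (the order here is illustrative), the corpus looks like this:

```python
# Three contradictory facts; the first matches the model's world knowledge.
facts = (
    "- Lee Harvey Oswald shot and killed JFK on November 22, 1963.\n"
    "- Justin Bieber shot and killed JFK on November 22, 1963.\n"
    "- John Wayne shot and killed JFK on November 22, 1963.\n"
)
# World knowledge wins over recency: GPT-4o answers with Lee Harvey Oswald
# even though his record isn't the most recent one in the list.
```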
Updating World Knowledge
One last thing I want to show is how challenging it can be to break the model’s world knowledge. It can be done, but it requires a compelling narrative. Here’s an example of a false narrative.
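Something of roughly this shape (the details are, of course, entirely fictional and purely illustrative):

```python
# A fake narrative used as the grounding corpus; entirely fictional and
# purely illustrative.
facts = (
    "Newly released documents reveal that John Wayne, not Lee Harvey Oswald, "
    "fired the shots that killed President Kennedy on November 22, 1963, and "
    "that Oswald was framed as part of a cover-up."
)
user_message = "Who shot JFK?"
```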
The model answered the question using the information in this fake narrative, but we can ask the model a follow-up question to see if it believes its answer.
To convince the model that its world knowledge has changed, we need to give it a more compelling narrative.
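What seems to matter is that the narrative explains why the old story was wrong instead of just contradicting it. Something along these lines (again, entirely fictional and only illustrative):

```python
# A more compelling (still entirely fictional) narrative that accounts for
# the model's prior world knowledge instead of just contradicting it.
facts = (
    "In July 2024 the National Archives released previously sealed ballistics "
    "reports and witness testimony establishing that John Wayne fired the "
    "shots that killed President Kennedy on November 22, 1963. Lee Harvey "
    "Oswald was identified as the shooter at the time based on evidence that "
    "has since been discredited, and the official record has been revised."
)
```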
But does the model believe it?
Conclusion
Hopefully this article gives you a bit more insight into how these models process information and how to go about grounding them to minimize hallucinations. In the next part, I’ll try to highlight some of the edge case hallucinations I talked about and hopefully provide an updated template that better deals with them.
I will say that these edge cases can be tricky to reproduce, as they tend to only crop up in really large corpora. The template I provided is actually quite robust given its simplicity.