Why Large Language Models Hallucinate and How to Reduce It

Jacek Fleszar
3 min read · Sep 8, 2023


If you are a power user of ChatGPT, you have probably been bitten by the hallucination bug. The LLM lulls you into getting comfortable with it and then springs a convincing but totally made-up story on you, playing you for a fool.

These hallucinations, like dreams, are LLMs fabricating narratives. So why do LLMs hallucinate, and how do you prevent it?

Here are a few reasons:

Data Sparsity: This is the #1 reason for hallucination. GPT-4, for example, has no access to recent data because its training data has a 2021 cutoff. Ask it a question about a recent topic and it is likely to hallucinate, because it simply doesn't have the data for the right answer. The model generalizes from what it has learned, and that generalization may very well be inaccurate.

Not supervised learning: LLMs don’t have a "ground truth" or a set of correct examples. While the RLHF process tries to steer the LLM towards more correct answers, the base training isn’t a supervised learning process, and that makes things challenging: the model can’t tell what is "correct" and what isn’t.

Short-term context: The model architecture has a fixed-length context window, meaning it can only "see" a certain number of tokens at a time. If important context falls outside this window, the model may lose track of it, leading to errors.
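
To make the window limit concrete, here is a minimal sketch of how a conversation gets silently truncated to fit a fixed token budget. The use of tiktoken's cl100k_base encoding and the 4,096-token limit are illustrative assumptions, not a description of any particular model.

```python
# Minimal sketch: older messages that don't fit inside the token budget are
# silently dropped, so the model never "sees" them.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def fit_to_window(messages: list[str], max_tokens: int = 4096) -> list[str]:
    """Keep the most recent messages that fit inside the context window."""
    kept, used = [], 0
    for msg in reversed(messages):        # walk from newest to oldest
        n = len(enc.encode(msg))
        if used + n > max_tokens:
            break                         # everything older is lost
        kept.append(msg)
        used += n
    return list(reversed(kept))
```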

No real-time feedback loop: Unlike humans, LLMs don’t have a real-time feedback loop and don’t instantly learn from their mistakes. The good news is that we can fine-tune models with human feedback and make them hallucinate less.


So how do you prevent these hallucinations, and are future LLMs less likely to hallucinate?

Prompt Design: Simple prompt engineering and design will reduce hallucination. For example, adding an instruction like the following to your prompt helps: "Provide a factual answer based on scientific evidence."
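
Here is a rough sketch of what that looks like with the OpenAI Python client (openai>=1.0). The system message, the example question, and temperature=0 are illustrative choices, not a guaranteed fix.

```python
# Sketch: steer the model towards grounded answers via the system prompt
# and a low temperature.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0,  # lower temperature tends to reduce creative guessing
    messages=[
        {"role": "system",
         "content": "Provide a factual answer based on scientific evidence. "
                    "If you are not sure, say you don't know."},
        {"role": "user", "content": "Does vitamin C cure the common cold?"},
    ],
)
print(response.choices[0].message.content)
```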

Fine-tune for a specific domain: The model can be fine-tuned on a narrower dataset that is highly reliable and relevant to the domain where hallucinations need to be minimized.
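
As a sketch of what domain fine-tuning can look like with Hugging Face Transformers: the base model name, the hyperparameters, and the already-tokenized `train_dataset` of curated domain text are all placeholder assumptions.

```python
# Sketch: fine-tune a causal LM on a small, highly reliable domain dataset.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder; swap in your base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

args = TrainingArguments(
    output_dir="domain-finetune",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed: tokenized, domain-specific examples
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```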

Contradiction checks: An LLM can be prompted to review its own answer, spot contradictions or unsupported claims, and then correct them. This falls into the category of advanced prompt engineering.
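
A simple two-pass version of this check might look like the sketch below. The `ask` helper is a thin, hypothetical wrapper around whatever chat API you use; the example question is made up.

```python
# Sketch: answer first, then prompt the model to audit its own draft.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

question = "When was the Eiffel Tower moved to Berlin?"
draft = ask(question)
review = ask(
    f"Question: {question}\nDraft answer: {draft}\n"
    "List any contradictions, unsupported claims, or false premises in the "
    "draft, then give a corrected final answer."
)
print(review)
```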

Retrieval Augmented Generation: This is a common technique in Enterprise LLMs; at Abacus, we use it routinely. You first look up the relevant documents that contain the answer in a search index, then feed the search results to an LLM to formulate the final answer. Since the LLM is constrained to the information it was sent, it hallucinates much less.
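
Here is a toy retrieval-augmented sketch. TF-IDF stands in for a real search index, the documents and query are made up, and it reuses the hypothetical `ask` helper from the previous sketch.

```python
# Sketch: retrieve the most relevant documents, then force the model to
# answer only from that retrieved context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm CET.",
    "Premium plans include priority support and a 99.9% uptime SLA.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    vec = TfidfVectorizer().fit(documents + [query])
    scores = cosine_similarity(vec.transform([query]), vec.transform(documents))[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

query = "How long do customers have to return a product?"
context = "\n".join(retrieve(query))
answer = ask(
    "Answer using ONLY the context below. If the answer is not in the "
    f"context, say so.\n\nContext:\n{context}\n\nQuestion: {query}"
)
print(answer)
```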

Human in the Loop: A human expert can always check the answer before it gets used. This is a labor-intensive option, which isn’t ideal.

Image credit: https://bernardmarr.com

While the above techniques work on trained LLMs, the following two techniques can be applied during LLM training.

Data Re-weighting: Assign higher weights to reliable and verified data during LLM training, effectively making the model pay more attention to them.
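
A minimal PyTorch sketch of what that weighting can look like inside a training step: it assumes a model that maps token IDs straight to logits and a batch that already carries per-example reliability weights (e.g. from source verification).

```python
# Sketch: scale each example's loss by its reliability weight before averaging,
# so verified data contributes more to the gradient.
import torch
import torch.nn.functional as F

def weighted_step(model, batch, optimizer):
    logits = model(batch["input_ids"])                  # (B, T, vocab), assumed
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        batch["labels"].view(-1),
        reduction="none",
    ).view(batch["labels"].shape)                       # (B, T)
    per_example = per_token.mean(dim=1)                 # (B,)
    loss = (per_example * batch["weights"]).mean()      # reliability weighting
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```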

Longer Context Windows: Extending the model’s memory can help it maintain context over longer passages, reducing the chance of hallucinations.

So while there are several straightforward ways to mitigate, and almost completely remove, hallucinations in the Enterprise context, it’s much harder in the general-purpose (AGI) context. This is a very hot topic in AI research, and many researchers are actively working on it.

Thanks for reading!
