Understanding Hallucinations in Large Language Models: Causes and Solutions
Exploring the causes, implications, and potential fixes for AI-generated falsehoods.
Introduction 📖
In recent years, large language models (LLMs) like GPT have become essential tools in various fields, from customer support to content creation. However, despite their advanced capabilities, these models sometimes produce outputs that are factually incorrect or entirely fabricated — a phenomenon known as “hallucination.”
Hallucinations in LLMs occur when the model generates information that has no basis in the data it was trained on, often leading to misleading or inaccurate responses. This issue raises concerns about the reliability of AI-generated content, especially in critical applications where accuracy is paramount.
Understanding the causes of these hallucinations, and developing methods to mitigate them, is crucial for improving the effectiveness and trustworthiness of AI systems.
What Are Hallucinations in LLMs, and What Are Their Types?
In the context of large language models (LLMs), hallucinations refer to instances where the model generates information that is not grounded in its training data or is entirely fabricated. While LLMs like GPT are designed to predict the most likely word sequences based on patterns in the data they’ve been trained on, they sometimes produce content that seems plausible but is factually incorrect or invented.
These hallucinations stem from the model’s probabilistic nature: it generates coherent-sounding responses token by token, without any built-in mechanism for verifying that what it says is accurate or even exists.
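To make this probabilistic behavior concrete, here is a minimal sketch (plain NumPy, with a toy vocabulary and made-up logits rather than a real model) of how a language model turns scores over candidate tokens into a sampled continuation:

```python
import numpy as np

# Toy vocabulary and logits standing in for a real model's output.
# The model only sees relative scores; nothing here encodes which
# continuation is factually true.
vocab = ["Armstrong", "Gagarin", "Aldrin", "Tereshkova"]
logits = np.array([2.1, 1.9, 0.7, 0.2])  # hypothetical scores

def sample_next_token(logits, temperature=1.0, rng=None):
    """Apply a softmax to the logits, then sample one token index."""
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

idx, probs = sample_next_token(logits)
print(dict(zip(vocab, probs.round(3))), "-> sampled:", vocab[idx])
```

Because a plausible but wrong token (“Gagarin”) can receive nearly as much probability as the correct one, nothing in this step distinguishes truth from fiction; that is the seed of a hallucination.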
Types:
Factuality Hallucination:
LLMs sometimes produce outputs that are inconsistent with real-world facts or are otherwise misleading, which undermines the trustworthiness of AI systems. These factual errors are categorized as factuality hallucinations.
Table 1 categorizes the types of factuality hallucination in LLMs, with examples:

| Type | Example |
| --- | --- |
| Factual Inconsistency | The LLM incorrectly states that Yuri Gagarin was the first person to land on the Moon (the correct answer is Neil Armstrong). |
| Factual Fabrication | The LLM invents a narrative about unicorns in Atlantis, claiming they were documented around 10,000 BC and associated with royalty, despite no real-world evidence for this. |
Faithfulness Hallucination:
These refer to the instances where a large language model (LLM) generates content that deviates from the original source material or fails to stay “faithful” to the information or context provided. In these cases, the model produces outputs that, while potentially coherent and contextually relevant, do not accurately represent the intended or original information.
Faithfulness hallucinations are particularly problematic in tasks where the model is expected to summarize, rephrase, or translate information accurately. When a model strays from the given data or distorts facts, it reduces the reliability and trustworthiness of its outputs.
Table 2 presents examples of faithfulness hallucinations in Large Language Models (LLMs), where the model output deviates from the user’s input or the provided context. It categorizes these hallucinations into three types:

| Type | Example |
| --- | --- |
| Instruction Inconsistency | The LLM ignores the user’s explicit instruction. For example, instead of translating a question into Spanish as instructed, the model answers the question in English. |
| Context Inconsistency | The output contains information not present in, or contradicting, the provided context. For example, the LLM claims the Nile originates in the mountains rather than the Great Lakes region mentioned in the user’s input. |
| Logical Inconsistency | The output contains a logical error despite starting correctly. For example, the LLM performs an arithmetic operation incorrectly in a step-by-step math solution. |
Why Do Hallucinations Happen?
Hallucinations have multifaceted origins, spanning the entire spectrum of an LLM’s capability acquisition process. In this section, we delve into the root causes of hallucinations in LLMs, grouped into three key aspects: data, training, and inference.
Training Data Issues
A key factor contributing to hallucinations in large language models (LLMs) is the quality of their training data. Models like GPT, Falcon, and LLaMA undergo extensive unsupervised training on large, diverse datasets drawn from many sources. However, ensuring that this data is fair, unbiased, and factually accurate is quite challenging.
As these models learn to generate text, they may inadvertently absorb and reproduce inaccuracies present in the training data. This can result in situations where the models struggle to differentiate between truth and fiction, leading to outputs that stray from factual correctness or logical reasoning.
LLMs trained on datasets from the internet are particularly vulnerable to incorporating biased or incorrect information. This misinformation can easily seep into the model’s outputs since the model does not inherently distinguish between accurate and inaccurate data. A notable example of this is Bard’s incorrect assertion about the James Webb Space Telescope, which illustrates how reliance on flawed data can result in confident yet erroneous claims.
Architectural and Training Objectives
Hallucinations can also arise from model architecture flaws or suboptimal training objectives.
For instance, an architectural weakness, or a training objective that rewards fluent continuation over factual accuracy, can steer the model toward outputs that do not match the intended use or expected performance, producing content that is nonsensical or factually incorrect.
Inference Stage Challenges
During the inference stage, several factors can contribute to hallucinations.
These include defective decoding strategies and the inherent randomness in the sampling methods used by the model.
Additionally, issues like insufficient attention to the provided context or the softmax bottleneck in decoding can produce outputs that are not adequately grounded in that context or in the training data.
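As an illustration of how decoding choices affect this, the sketch below (assuming the Hugging Face `transformers` library; `gpt2` is only a small placeholder model) contrasts deterministic greedy decoding with high-temperature sampling, which makes low-probability, more error-prone continuations likelier to be chosen:

```python
# Sketch: contrasting greedy decoding with high-temperature sampling.
# Assumes the Hugging Face `transformers` library; "gpt2" is a small
# placeholder model, and outputs will vary from run to run.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The James Webb Space Telescope was the first telescope to"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: deterministic, always picks the highest-probability token.
greedy = model.generate(**inputs, max_new_tokens=30, do_sample=False)

# High-temperature sampling flattens the distribution, so lower-probability
# (and more error-prone) continuations are chosen more often.
sampled = model.generate(**inputs, max_new_tokens=30, do_sample=True,
                         temperature=1.5, top_p=0.95)

print("greedy :", tokenizer.decode(greedy[0], skip_special_tokens=True))
print("sampled:", tokenizer.decode(sampled[0], skip_special_tokens=True))
```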
Prompt Engineering
The way prompts are engineered can also influence the occurrence of hallucinations.
The LLM might generate an incorrect or unrelated answer if a prompt lacks adequate context or is ambiguously worded.
Effective prompt engineering requires clarity and specificity to guide the model toward generating relevant and accurate responses.
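As a simple, hypothetical illustration (the prompts and the `ask_llm` placeholder below are not from any specific API), compare an ambiguous prompt with one that supplies explicit context and a fallback instruction:

```python
# Hypothetical sketch: the same question asked with and without grounding.
# `ask_llm` is a placeholder for whichever client or API you actually use.

def build_grounded_prompt(context: str, question: str) -> str:
    """Wrap a question with explicit context and a fallback instruction."""
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, reply exactly: I don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

vague_prompt = "Tell me about the telescope discovery."  # ambiguous, no context
grounded_prompt = build_grounded_prompt(
    context="The James Webb Space Telescope (JWST) launched in December 2021.",
    question="Which telescope took the first image of an exoplanet?",
)

print(grounded_prompt)
# answer = ask_llm(grounded_prompt)  # placeholder call, not a real API
```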
Implications of Hallucinations
LLM hallucinations can be dangerous and have serious real-world impacts. A recent example is the Mata v. Avianca case, in which a New York attorney, Steven Schwartz, used ChatGPT for legal research and unknowingly included fabricated citations and quotes generated by the model in a federal court filing.
This case highlights the direct consequences of using AI-generated content without verification and raises broader ethical and professional concerns within the legal field. Such incidents can significantly erode trust in AI technologies.
When LLMs produce hallucinations — fabricated or inaccurate outputs — they risk spreading misinformation. Relying on AI for tasks like legal research assumes that the outputs are reliable, but hallucinations can lead to serious professional and legal repercussions, as seen when the attorneys in Mata v. Avianca faced sanctions for relying on non-existent case law.
Beyond individual cases, the risks include broader societal implications. Misinformation from AI can influence decision-making and potentially lead to cyberattacks. In the legal sphere, it can compromise the integrity of judicial proceedings, where accuracy is crucial. This case underscores the need for rigorous verification of AI-generated content and maintaining ethical standards in professional conduct.
Mitigating Hallucinations
The research paper “A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation” addresses hallucinations in Large Language Models (LLMs) and presents a novel approach to detect and mitigate them.
The authors’ method involves detecting potential hallucinations by analyzing the model’s logit outputs, followed by a validation process to confirm and correct errors. If a hallucination is detected, their mitigation strategy rectifies it without introducing new hallucinations, even in cases of false positives.
The results are promising: the detection technique achieved an 88% recall, identifying most hallucinations, while the mitigation strategy successfully corrected 57.6% of the detected issues. Importantly, no new hallucinations were introduced. Tested on GPT-3.5 in an article generation task, the method reduced hallucination rates from 47.5% to 14.5%, significantly improving output reliability.
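The snippet below is not the authors’ implementation, only a minimal sketch of the detection signal they build on: convert each generated token’s log-probability into a probability and flag low-confidence tokens (the token values and threshold here are invented for illustration). Flagged spans would then go through the validation and correction steps described in the paper.

```python
import math

# Minimal sketch of low-confidence detection (not the paper's code).
# Assume we already have each generated token's log-probability, e.g. as
# reported by an API's logprobs option; the values below are invented.
generated = [("The", -0.1), ("telescope", -0.3), ("was", -0.2),
             ("launched", -0.4), ("in", -0.1), ("2019", -2.9)]

THRESHOLD = 0.30  # probabilities below this are treated as low confidence

def low_confidence_tokens(token_logprobs, threshold=THRESHOLD):
    """Return tokens whose generation probability falls below the threshold."""
    flagged = []
    for token, logprob in token_logprobs:
        prob = math.exp(logprob)
        if prob < threshold:
            flagged.append((token, round(prob, 3)))
    return flagged

# Flagged tokens would then be validated (for example, against retrieved
# evidence) and rewritten if unsupported, before the answer is returned.
print(low_confidence_tokens(generated))  # -> [('2019', 0.055)]
```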
This research highlights the importance of ongoing efforts to ensure the factual accuracy of AI-generated content, setting a foundation for future work to further improve LLM reliability in real-world applications.
Additional Resources
Here is a list of academic papers, technical resources, and real-world case studies on LLMs and AI safety for further reading:
- DelucionQA: Detecting Hallucinations in Domain-specific Question Answering
- Creating Trustworthy LLMs: Dealing with Hallucinations in Healthcare AI
- Knowledge Injection to Counter Large Language Model (LLM) Hallucination
- BERTScore: Evaluating Text Generation with BERT
- List of prior works on LLM hallucination, organized by evaluation, benchmark, enhancement, and survey — Reddit Thread
- Enabling Large Language Models to Generate Text with Citations — Paper
- TruthfulQA: Measuring How Models Mimic Human Falsehoods (OpenAI, University of Oxford): https://arxiv.org/pdf/2109.07958.pdf
- Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data (Google): https://arxiv.org/pdf/2010.05873v1.pdf
These resources provide different perspectives and insights into hallucinations in LLMs and how to mitigate them.
Conclusion
Hallucinations, where Large Language Models (LLMs) generate plausible but incorrect or nonsensical information, pose a major threat to their reliability and safety, especially in critical fields like healthcare and law.
Efforts to mitigate hallucinations are crucial for maintaining the credibility of LLMs. Key approaches involve using linguistic metrics like ROUGE and BLEU, as well as content validity metrics based on information extraction (IE), question answering (QA), and Natural Language Inference (NLI). Metrics like FActScore help verify the accuracy of individual facts.
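As an example of the overlap-based metrics, the sketch below (assuming the `rouge-score` and `nltk` packages, with toy sentences) scores a model output against a reference; note that high lexical overlap alone does not guarantee factual accuracy:

```python
# Sketch: scoring a model output against a reference with ROUGE and BLEU.
# Assumes the `rouge-score` and `nltk` packages; the sentences are toy examples.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "The Nile originates in the Great Lakes region of Africa."
candidate = "The Nile originates in the mountains of Africa."  # unfaithful detail

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)
print("ROUGE-L F1:", round(rouge["rougeL"].fmeasure, 3))

bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)
print("BLEU:", round(bleu, 3))
```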
Looking ahead, LLM development is focusing on improving robustness and safety by grounding responses in verified information. Techniques such as SelfCheckGPT assess consistency across multiple answers, while methods like chain-of-thought prompting and Retrieval-Augmented Generation (RAG) aim to enhance precision.
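The intuition behind SelfCheckGPT-style consistency checking can be sketched as follows; this is a simplified lexical-overlap proxy, not the official implementation, and the sampled answers are hard-coded stand-ins for multiple stochastic generations:

```python
# Simplified sketch of SelfCheckGPT-style consistency checking (not the
# official implementation). The samples are hard-coded stand-ins for
# multiple stochastic generations of the same prompt.
def support_score(sentence: str, samples: list) -> float:
    """Crude word-overlap proxy for how well other samples support a sentence."""
    words = set(sentence.lower().split())
    overlaps = []
    for sample in samples:
        sample_words = set(sample.lower().split())
        overlaps.append(len(words & sample_words) / max(len(words), 1))
    return sum(overlaps) / len(overlaps)

candidate = "The telescope was launched in 2019 by the European Space Agency."
samples = [
    "The telescope was launched in December 2021 by NASA and partners.",
    "NASA launched the telescope in late 2021.",
    "It was launched in 2021.",
]

score = support_score(candidate, samples)
print(f"support score: {score:.2f}")
if score < 0.5:
    print("Low consistency across samples: possible hallucination.")
```

In practice, the crude word overlap used here would be replaced by stronger checks such as NLI or QA-based scoring, but the idea is the same: statements that the other samples do not support are treated as likely hallucinations.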
Tools like BERTScore and NLI-based checks are used to evaluate the consistency and accuracy of responses. Ongoing efforts in the AI community demonstrate a strong commitment to developing reliable and trustworthy AI systems.