Hallucinations in Large Language Models: Causes and Remedies in Applied AI Use Cases

Michael Schmidt
DataRobot

--

In the realm of large language models (LLMs), “hallucination” is a term that encapsulates instances where an LLM generates information that isn’t factually correct or relevant. This phenomenon can arise from a variety of causes, including misinterpretation of user requests, flawed reasoning, or outright fabrication of facts. Importantly, the context of a query plays a significant role in determining whether an output is a hallucination. For example, an LLM may provide a technically correct but contextually irrelevant answer to a question because it misinterpreted the user’s intent. In this blog post, we define hallucination in LLMs more precisely, examine its consequences and potential upsides, and outline strategies for reducing it.

Introduction to Hallucination in Large Language Models

The concept of hallucination in the context of LLMs presents a multifaceted and intriguing challenge in the realm of artificial intelligence. Broadly speaking, hallucination can be described as instances where an LLM generates a response that is untrue or not rooted in reality. This phenomenon manifests in various ways, ranging from misinterpretations of user requests to erroneous reasoning steps, such as making invalid substitutions in an algebra problem or fabricating facts.

Crucially, the nature of hallucination is highly context-dependent. For example, consider a scenario where the question posed is, “Who was the first person to walk across the English Channel?” and the model responds with “Matthew Webb.” In a certain context, where the intended query might have been about swimming across the Channel, this answer is accurate and relevant. Matthew Webb was indeed the first person to swim across the English Channel. However, if the question is taken at face value, concerning walking across the Channel, the response enters the realm of hallucination, as it does not align with the factual reality of the situation. What should the model do in this instance? Should it give the closest answer it can or ask follow-up questions? How pedantic should it be before finally giving an answer? This is what makes this type of hallucination so situational.

Many applications of generative AI today use reference content to help answer questions, for example by looking up the reference material most similar to a question and providing it to the LLM as preceding context. Typically, the reference content is meant to be treated as ground truth. From this viewpoint, hallucination occurs when a model produces a response that is not justified or entailed by that preceding text. This definition underscores the importance of coherence and relevance in the model’s processing and response generation mechanisms. But what if the grounding data itself is incorrect or out of date? The end result would be the same as with any other hallucination.
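
As a rough sketch of this pattern, the retrieved reference text is simply prepended to the question so the model is instructed to answer from it. The prompt wording and the `retrieve_reference` helper below are illustrative placeholders, not any particular framework’s API:

```python
def build_grounded_prompt(question: str, reference: str) -> str:
    """Prepend retrieved reference material so the model is asked to
    answer only from the supplied context."""
    return (
        "Answer the question using only the reference text below. "
        "If the reference does not contain the answer, say so.\n\n"
        f"Reference:\n{reference}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical usage:
# reference = retrieve_reference(question)  # nearest chunk from a document store
# answer = llm(build_grounded_prompt(question, reference))  # any completion call
```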

Detecting and addressing different types of hallucinations in LLMs involves understanding these nuances and developing techniques that can discern between valid extrapolations of information and unfounded or erroneous assertions. This area remains a vibrant and challenging aspect of AI research, with ongoing efforts to refine models for greater accuracy and reliability in their responses.

The Potential Consequences of Hallucinations in Large Language Models

The implications of hallucinations in LLMs are extensive and vary significantly depending on the application. These potential consequences are shaped by several factors: the level of human engagement in overseeing the LLM’s output, the visibility and verifiability of the context data upon which decisions are based, and the specific nature of the use case.

In critical sectors like healthcare, where LLMs might be utilized for tasks such as diagnostic assistance or interpreting test results for patients, the consequences of hallucination can be particularly severe. If a model provides incorrect information and there is insufficient human oversight, it could lead to harmful or even disastrous outcomes for patients. Similarly, in the financial sector, reliance on LLMs for tasks such as financial reporting or decision-making can result in significant inaccuracies if the model’s output is not meticulously verified.

The insurance industry also faces risks with the use of LLMs. Hallucinations can lead to incorrect denial or approval of claims or the determination of inappropriate premium rates. In software development, subtle bugs might be introduced due to inaccurate information provided by the LLM.

Another significant concern is the issue of bias and fairness. There is a risk that LLMs may exhibit a higher rate of hallucination on topics related to underrepresented and marginalized groups. This could perpetuate or amplify existing biases, leading to unfair outcomes or reinforcing stereotypes.

Often, the challenge with hallucinations in LLMs is not just the potential impact but also the difficulty in detection. For example, when a user queries whether a particular method exists in a programming library like Pandas, and the model affirms incorrectly, it can be frustrating and time-consuming. The user may have to run the code to discover the error, and even upon correction, the model might propose another incorrect solution or oscillate between two incorrect answers.
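
One cheap line of defense in this particular scenario is to verify the claimed API programmatically before running generated code. A minimal sketch using Python introspection (the method names below are just examples):

```python
import pandas as pd

def dataframe_method_exists(claimed_method: str) -> bool:
    """Check whether a method the model claims exists is actually
    defined on pandas.DataFrame before trusting generated code."""
    return callable(getattr(pd.DataFrame, claimed_method, None))

print(dataframe_method_exists("pivot_table"))  # True: a real DataFrame method
print(dataframe_method_exists("auto_impute"))  # False: plausible-sounding but invented
```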

The Potential Positive Effects of Hallucination in Large Language Models

While the term “hallucination” in the context of LLMs often carries a negative connotation, there are scenarios where these so-called hallucinations can yield positive outcomes. Certain hallucinations can be seen as useful extrapolations of the knowledge the model was trained on. These instances primarily arise in creative and exploratory contexts, where deviation from the exact concepts appearing in the training data can spur innovation, creativity, and problem-solving.

In creative writing and artistic endeavours, the novel responses generated by LLMs can be particularly valuable. These unexpected outputs can inspire writers and artists, leading to unique and original ideas that might not have been conceived through conventional thought processes. For example, a hallucinatory response in a storytelling context might introduce a plot twist or character trait that enriches the narrative in an unforeseen way.

In the brainstorming and ideation process, hallucinations in LLM outputs can act as a catalyst for creative thinking. When seeking new solutions to complex problems, the unconventional and unanticipated responses provided by an LLM can help break the confines of traditional thought patterns. This can lead to the exploration of new avenues and perspectives that might otherwise be overlooked.

Moreover, in certain problem-solving scenarios, particularly those that require lateral thinking or a departure from linear logic, the unpredictable nature of LLM hallucinations can help sample and explore the space of solutions much more broadly. By providing responses that might not strictly adhere to conventional reasoning, LLMs can help users think outside the box and explore solutions that are innovative and unconventional.

Additionally, in the domain of entertainment and games, the element of surprise and unpredictability offered by LLM hallucinations can enhance user engagement and enjoyment. In interactive storytelling, gaming, or chatbot conversations, the unexpected turns and responses can make the experience more dynamic and enjoyable for the user.

Strategies for Reducing Hallucinations in Large Language Models

Despite these potentially beneficial forms, most hallucinations today are harmful and need to be reduced. Doing so is a multifaceted challenge that requires a combination of advanced technological strategies and human oversight. Here are several approaches that can help minimize the occurrence of hallucinations in LLMs:

Retrieval-Augmented Generation (RAG) with a Clean Vector Database (VDB): A well-implemented RAG setup, combined with a clean and well-curated vector database, can significantly enhance the accuracy of an LLM, assuming the grounding data is accurate and up to date. This approach allows the model to pull in relevant information from external sources, thereby grounding its responses in verified data.
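
A minimal sketch of the retrieval step, assuming an `embed` callable that maps text to a vector (any embedding model could stand in for it); a production system would precompute and index the chunk embeddings in a vector database rather than embedding them per query:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_top_k(question: str, chunks: list[str], embed, k: int = 3) -> list[str]:
    """Return the k curated reference chunks most similar to the question,
    which are then prepended to the prompt as grounding context."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q_vec, embed(c)), reverse=True)
    return ranked[:k]
```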

Human Verifiability: Incorporating features such as citations, groundedness scores, and relevance scores can improve human oversight. These tools enable users to verify the information provided by the LLM, ensuring that its responses are based on credible and relevant sources.
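
A hypothetical sketch of what a verifiable response object might carry, so a reviewer can trace each answer back to its sources (the field names and rendering are illustrative):

```python
from dataclasses import dataclass

@dataclass
class VerifiableAnswer:
    text: str                      # the LLM's answer
    citations: list[str]           # reference chunks the answer was grounded on
    relevance_scores: list[float]  # e.g. similarity of each chunk to the question

def format_for_review(answer: VerifiableAnswer) -> str:
    """Render the answer alongside its sources so a human can spot-check it."""
    lines = [answer.text, "", "Sources:"]
    for chunk, score in zip(answer.citations, answer.relevance_scores):
        lines.append(f"  [{score:.2f}] {chunk[:80]}")
    return "\n".join(lines)
```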

Automated Verification Techniques: Utilizing automated techniques to test the groundedness of responses against the reference data can be effective. Implementing models that assess how ‘surprised’ the LLM is by a given answer, or that use entailment-based verification (like the TRUE model), can help identify and correct hallucinatory responses.
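
One way to approximate this kind of check is with an off-the-shelf natural language inference model. The sketch below assumes the Hugging Face transformers library and the publicly available roberta-large-mnli checkpoint; the threshold is illustrative, and this is a stand-in for, not a reproduction of, the TRUE approach:

```python
from transformers import pipeline

# An MNLI-style model classifies a (premise, hypothesis) pair as
# entailment, neutral, or contradiction.
nli = pipeline("text-classification", model="roberta-large-mnli")

def is_grounded(context: str, answer: str, threshold: float = 0.8) -> bool:
    """Flag answers that are not entailed by the reference context."""
    result = nli({"text": context, "text_pair": answer})
    if isinstance(result, list):
        result = result[0]
    return result["label"] == "ENTAILMENT" and result["score"] >= threshold
```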

Chain of Thought (CoT) and Related Prompt Techniques: Encouraging LLMs to ‘think out loud’ by breaking down problems into simpler, more manageable steps can reduce the likelihood of hallucination. Techniques like few-shot reasoning examples, reason-then-answer formats, and multi-agent debates can guide LLMs to process information more thoroughly. However, this approach can require more effort in prompt crafting and consume more computational resources.
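
As a sketch, a reason-then-answer template might look like the following; the wording is illustrative, and in practice these prompts are tuned per use case:

```python
def reason_then_answer_prompt(question: str, context: str = "") -> str:
    """A reason-then-answer template: the model lays out its reasoning
    before committing to a final answer, making flawed steps easier to spot."""
    return (
        (f"Context:\n{context}\n\n" if context else "")
        + f"Question: {question}\n\n"
        + "First, reason through the problem step by step, using only the "
        + "context above or facts you are certain of.\n"
        + "Then give the final answer on a new line starting with 'Answer:'. "
        + "If the necessary information is missing, answer 'I don't know.'"
    )
```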

Fine-Tuning for Better Domain Knowledge: Enhancing an LLM’s knowledge in specific domains can improve its accuracy. However, balancing the enhancement of domain-specific knowledge with the retention of conversational and instruction-following capabilities can be challenging.
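
As a minimal sketch of the data-preparation side, domain question-answer pairs are often written out as JSONL for supervised fine-tuning; the prompt/completion field names and toy examples here are a common convention, but the exact schema depends on the fine-tuning tooling:

```python
import json

# Toy domain Q&A pairs written as JSONL for supervised fine-tuning.
examples = [
    {"prompt": "What does a groundedness score measure?",
     "completion": "How well a response is supported by the supplied reference text."},
    {"prompt": "When should the assistant answer 'I don't know'?",
     "completion": "When the reference material does not contain the requested information."},
]

with open("domain_finetune.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```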

Reinforcement Learning from Human Feedback (RLHF): While there are mixed opinions on the effectiveness of RLHF in reducing hallucinations, it can guide the model towards better reasoning paths and align it more closely with human conversational expectations. However, without sufficient context for why a human might prefer one response over another, RLHF might inadvertently reinforce guessing behaviors in the model. For example, human reviewers may rate highly confident answers as better than others even when they are not more accurate.

Use Case-Specific Human Feedback Guard Models: A simpler alternative to RLHF is to build a predictive model on end-user or reviewer feedback about answers in a particular use case. The guard model can learn potentially subtle notions of correctness, such as whether the output used the right language and style. These models can also capture hallucination when human reviewers flag common or recurring hallucination patterns. While guard models are not always robust enough to drive fine-tuning as in RLHF, they can be used in production to block high-risk responses.
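
A minimal sketch of such a guard model, assuming a set of responses that reviewers have already labeled; the toy data, features, and threshold below are purely illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy reviewer-labeled responses: 1 = flagged (hallucinated or off-style), 0 = acceptable.
responses = [
    "The policy allows returns within 30 days of purchase.",
    "Our CEO personally approves every claim within one hour.",
    "Premiums are recalculated annually based on filed claims.",
    "This policy covers damage from meteor strikes, but only on Tuesdays.",
]
labels = [0, 1, 0, 1]

guard = make_pipeline(TfidfVectorizer(), LogisticRegression())
guard.fit(responses, labels)

def block_if_high_risk(response: str, threshold: float = 0.7) -> bool:
    """Return True when the guard model scores the response as likely problematic."""
    risk = guard.predict_proba([response])[0][1]
    return risk >= threshold
```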

In conclusion, reducing hallucinations in LLMs requires a comprehensive approach that combines technological advancements with strategic human involvement. By implementing a combination of these strategies, it is possible to enhance the reliability and accuracy of LLMs, making them more trustworthy and effective tools in various applications.

Conclusion: Navigating the Future of Hallucinations in Large Language Models

As large language models (LLMs) continue to improve, the occurrence of undesirable hallucinations may diminish over time. However, they are unlikely to disappear completely. The comparison with AlphaZero, a model with superhuman abilities in the game of Go, is illuminating. AlphaZero excels due to a clear reward function and the capability to rapidly iterate through millions of games, refining its strategy. However, the landscape of language is far more complex and nuanced than the binary win-lose scenarios of a board game.

The road to reducing hallucinations in LLMs is not a straightforward one. Language, by its very nature, lacks a definitive signal for improvement, and the environment in which LLMs operate is vastly more intricate than a structured game board. Despite these challenges, there is significant potential for improvement. As we develop more sophisticated ways of rewarding LLMs, their reasoning abilities will undoubtedly improve.

Effective communication with LLMs is a two-way street. It necessitates a continuous effort from us to articulate our needs and intentions clearly and unambiguously. This ongoing interaction will play a crucial role in guiding LLMs towards more accurate and reliable outputs.

Future advancements in model architecture, such as multi-token reasoning and diffusion models, hold promise for further reducing hallucinations. Likewise, changes in training data, especially the inclusion of synthetic data rich in logical, diverse, and accurate reasoning examples, will aid in this endeavour. Moreover, refining reward models through process supervision can explicitly teach LLMs to avoid hallucinatory responses. Drawing inspiration from AlphaGo, which demonstrates consistent superhuman reasoning without hallucinations, we can learn to apply similar high-quality reward models in more open-ended text domains.

Finally, advancements in LLM applications will provide better personalized context, thereby reducing the human effort needed to specify intent. This evolution can be likened to the way search engines like Google suggest additional search terms based on recent user activity. By incorporating these various strategies and innovations, we can expect a gradual yet significant reduction in the frequency and impact of hallucinations in LLMs.

In summary, while the complete eradication of hallucinations in LLMs remains a difficult goal, the path forward is marked by promising developments and innovative approaches. As we continue to refine and advance these models, our understanding and management of hallucinations will evolve, leading to more accurate, reliable, and effective language models.
