Reducing AI Hallucinations with RAG: Automatic Podcast to Blog

Mihai Criveti
12 min read · Oct 15, 2023

--

Generating a medium.com article automatically from a voice recording in which Mihai Criveti featured on IBM Fellow Jerry Cuomo’s Art of AI Podcast.

I’m hosting a hands-on workshop on Practical GenAI with HuggingFace 🤗 and Python models soon, and decided to challenge myself a bit by generating this entire post using IBM’s watsonx.ai, LLAMA2 and a Retrieval Augmented Generation platform I’m building. This content is based only on a 15-minute podcast in which I discuss Large Language Models and Hallucinations with IBM Fellow and VP of Technology, Jerry Cuomo.

I’ve already hand-written an article on Reducing LLM Hallucinations with RAG that the podcast was based on, which is not in the training dataset of the models used. This takes those techniques one step further with Reciprocal Rank Fusion, Hybrid Searching, and more.
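
For readers unfamiliar with Reciprocal Rank Fusion: it merges the ranked results of several retrievers (for example, a keyword search and a vector search) into one list. Here is a minimal Python sketch; the document IDs and the common k = 60 constant are illustrative assumptions, not the platform’s actual code.

```python
# Minimal sketch of Reciprocal Rank Fusion (RRF): each retriever returns an
# ordered list of document IDs, and a document's fused score is the sum of
# 1 / (k + rank) across retrievers (k = 60 is a common default).

def reciprocal_rank_fusion(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from a keyword (BM25) search and a vector search.
keyword_hits = ["doc-3", "doc-1", "doc-7"]
vector_hits = ["doc-1", "doc-5", "doc-3"]

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# doc-1 and doc-3 rank highest because both retrievers agree on them.
```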

The content below is generated by watsonx using LLAMA2, including all the Questions and Answers, which have also been auto-generated. The source content is “Art of AI Episode 4: Hallucinations, with guest Mihai Criveti” — a 15-minute podcast in which I discuss Large Language Models and Hallucinations with IBM Fellow and VP of Technology, Jerry Cuomo — converted to text using a speech recognition model.

The result was formatted automatically, with only a few links added to Jerry’s Podcast, the hand-written article, and our LinkedIn profile. With further metadata enhancement, or by connecting to an internet search platform for further RAG context infusion, links could also be generated automatically.

Subscribe to my YouTube Channel: Practical Cloud and AI with Mihai — for more videos and discussions on AI, RAG, reducing Hallucinations and more. I’ll do a deep dive there shortly!

Executive Summary

In “The Art of AI,” Episode 4, host Jerry Cuomo and guest Mihai Criveti discuss the challenges posed by hallucinations in Large Language Models (LLMs). As a solution, Criveti suggests methods such as few-shot prompting and retrieval augmented generation (RAG), which involve providing the model with relevant context and information to improve the accuracy of its responses.

Additionally, they emphasize the importance of attribution and transparency in AI decision-making processes.

Finally, they mention related resources, such as Criveti’s paper on Understanding Generative AI, and invite listeners to continue exploring these topics further. Key takeaways include the potential benefits of RAG and the significance of developing explainable AI systems.

Participant Names

The key participants in this episode of The Art of AI are:

  • Jerry Cuomo — host, IBM Fellow, and VP of Technology
  • Mihai Criveti — guest, STSM & Principal Architect, OIC Vice Chair — Technology, ScribeFlow GenAI Lead, and Podcast Host.

Action Items

Here are some potential actions or tasks for listeners based on the contents of the podcast:

  • Read Mihai Criveti’s article on Medium, “Understanding GenAI Large Language Model Limitations, and How Retrieval Augmented Generation Can Help,” which is mentioned in the podcast.
  • Research and learn more about Large Language Models (LLMs), their limitations, and how retrieval-augmented generation can help address some of these limitations.
  • Consider incorporating the approach of providing contextual information and relevant databases or knowledge bases when interacting with LLMs to improve the quality and accuracy of their responses.
  • Explore the idea of attributable AI and understand how it works.
  • Look into the possibility of Fine-Tuning LLMs and compare the benefits and drawbacks of doing so versus other methods of providing context.
  • Investigate the cost and environmental impact of training and utilizing LLMs and consider whether there are ways to mitigate these factors.

Relevant Questions and Answers

All the questions (and answers) below have been generated by AI; no manual prompting was used.

How can you get more information

Mihai Criveti’s article “Understanding GenAI Large Language Model Limitations, and How Retrieval Augmented Generation Can Help” provides insight into utilizing large language models (LLMs) effectively and responsibly.

He highlights three essential strategies:

  1. prompt engineering, where instructions are given to the model along with relevant context;
  2. retrieval augmented generation, involving the integration of a knowledge base or database search into the model’s prompt;
  3. vector databases, which enable flexible searches based on similarity rather than exact keywords.

These methods aim to minimize hallucinations — the model’s inclination to create responses without proper context or basis in reality — and enhance the accuracy of the produced outputs by incorporating context.

Additionally, they offer advantages like cost efficiency and increased speed compared to entirely retraining the model or fine-tuning it. By implementing these techniques, it is feasible to improve the performance of LLMs and move toward Explainable AI.
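
As a minimal sketch of the second strategy, the snippet below shows one way retrieved passages could be injected into a prompt before it is sent to a model; the passages and wording are illustrative, not the article’s actual implementation.

```python
# Sketch: assembling a retrieval-augmented prompt from retrieved passages.
# The passages below stand in for results returned by a knowledge-base or
# vector-database search; the final string would be sent to an LLM.

def build_rag_prompt(question, passages):
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the numbered context passages below. "
        "Cite the passage numbers you relied on. If the answer is not in the "
        "context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

passages = [
    "LLMs predict the next token from patterns seen during training.",
    "Hallucinations are more likely when the prompt lacks grounding context.",
]
print(build_rag_prompt("Why do LLMs hallucinate?", passages))
```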

What is a large language model

A large language model (LLM) is a type of artificial intelligence (AI) model that processes natural language text. It is trained on vast quantities of text data and uses machine learning algorithms to learn patterns and relationships within the language. These models can generate human-like text, complete sentences, summarize information, and answer questions.

However, despite their capabilities, LLMs have limitations. They can produce erroneous or misleading outputs, known as hallucinations, due to their lack of reasoning abilities and limited context awareness. Additionally, they have constraints regarding the amount of input data they can handle, leading to reduced quality responses beyond a certain threshold.

To address these limitations, techniques such as prompt engineering, retrieval-augmented generation, and vector databases can enhance the accuracy and relevance of LLM responses.

Prompt engineering involves providing specific instructions to guide the model’s output, while retrieval-augmented generation incorporates external knowledge sources to expand the model’s context.

Vector databases utilize vector space representations of words to identify related documents and improve the efficiency of searches. By employing these strategies, organizations can leverage LLMs more effectively and responsibly in various applications, including customer service, writing assistance, and language translation.
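
To make the vector-database idea concrete, here is a small sketch of semantic similarity search using an open-source embedding model; sentence-transformers and the model name are common choices assumed for illustration, not necessarily what the article’s platform uses.

```python
# Sketch: semantic similarity search, the core idea behind a vector database.
# Requires `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Retrieval augmented generation grounds answers in a knowledge base.",
    "Tomatoes are botanically fruits but are used as vegetables in cooking.",
    "Fine-tuning a model is costly and can yield mixed results.",
]
doc_vectors = model.encode(documents, convert_to_tensor=True)

query_vector = model.encode("How can I ground LLM answers in my own data?",
                            convert_to_tensor=True)
scores = util.cos_sim(query_vector, doc_vectors)[0]

# The best match is semantically related to the query even though the two
# share few exact keywords.
best = int(scores.argmax())
print(documents[best], float(scores[best]))
```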

What are some of its limitations

Some limitations of large language models (LLMs) include:

1. Models only act on language: They cannot handle PDF files, Word documents, binary files, images, or anything other than text.

2. Limited domain expertise: Although LLMs can generate text similar to human language, they lack domain-specific knowledge; their responses may therefore be unreliable in specialized domains such as medicine or law.

3. No reasoning or deduction abilities: LLMs lack reasoning skills and cannot perform mathematical calculations or operations. They rely solely on statistical patterns identified during training.

4. No direct access to external sources: LLMs do not directly interact with the Internet or utilize outside data beyond what was included in their initial training dataset. As a result, they are oblivious to current events or fresh facts not incorporated into their instructional material.

5. Hallucination generations: Due to the absence of context, the model may produce erroneous outputs known as hallucinations. These hallucinations can range from grammatical mistakes to illogical statements or blatantly wrong assertions.

6. Factual inaccuracies: Sometimes, LLMs offer replies that appear believable but contain inaccurate details. They draw upon the information given during training, which may include inaccuracies present in the source materials. Therefore, factual accuracy cannot be guaranteed.

7. Contextual misunderstandings: Without proper guidance, LLMs risk producing responses disconnected from the situation due to their restricted capacity for comprehending context. Clear directions must be provided through prompt engineering to address this limitation.

8. Memory restrictions: Typically, LLMs can accept around 4,096 tokens or 3,000 words as input before performance declines. Consequently, offering extensive context becomes impractical without careful selection of essential information.

In summary, despite their impressive capabilities, LLMs face various constraints. To enhance their effectiveness, strategies like prompt engineering, retrieval-augmented generation, and thoughtful context injection become vital.
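
The memory restriction in point 8, for instance, can be handled mechanically. Below is a minimal sketch of trimming retrieved context to a token budget; tokens are approximated by whitespace-separated words here, whereas a real system would use the model’s own tokenizer, and the 4,096 figure is simply the limit mentioned above.

```python
# Sketch: selecting as much relevant context as fits within the model's
# input window, leaving room for the question and the generated answer.
MAX_TOKENS = 4096
RESERVED = 1024  # rough budget kept back for the question and the answer

def select_context(passages, budget=MAX_TOKENS - RESERVED):
    """passages: list of strings, assumed pre-sorted by relevance."""
    selected, used = [], 0
    for passage in passages:
        cost = len(passage.split())  # crude token estimate
        if used + cost > budget:
            break
        selected.append(passage)
        used += cost
    return selected

passages = ["a short, highly relevant passage"] * 10 + ["very long passage " * 2000]
print(len(select_context(passages)))  # the overlong final passage is dropped
```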

What can cause them to produce hallucinations

What causes hallucinations in Large Language Models (LLMs)?

Hallucinations in LLMs are caused by several factors, including:

Lack of Context: Sometimes, the model lacks context, leading to misunderstandings and inaccurate responses.

Limited Training Data: LLMs are trained on vast amounts of data; however, the quality and accuracy of this data can vary greatly. As a result, biases and inconsistencies can arise in the training data, causing hallucinations.

Prediction-Based Model: LLMs rely on predictions rather than actual learning, resulting in inaccuracies and hallucinations.

Conversational Memory Limits: There are restrictions to the conversational memory of LLMs since they can only store a limited amount of information from earlier interactions. Therefore, their ability to remember crucial details may be restricted, producing hallucinations.

Poor Quality Prompts: Providing poorly constructed prompts can lead to low-quality replies from LLMs, potentially containing hallucinations.

Misunderstanding Questions: LLMs occasionally comprehend queries incorrectly, triggering erroneous and hallucinatory responses.

Overfitting: Due to overfitting, LLMs might memorize certain patterns in the training data rather than gain generalizable knowledge, contributing to hallucinations.

What can be done to mitigate this problem

‘What can be done to alleviate this problem?’ refers to the issue of hallucinations in Large Language Models (LLMs). According to the conversation, hallucinations occur because LLMs cannot reason, learn, or draw on external sources; instead, they rely solely on producing coherent and fluent text based on patterns recognized within their training data. As a result, the model may make contradictory or wrong statements, such as claiming that a tomato is both a fruit and a vegetable.

To address this challenge, several methods can be employed to improve the quality of answers provided by LLMs. One approach involves supplying clear and concise prompts, including sufficient background information to assist the model in comprehending the situation.

Another strategy entails incorporating retrieval-augmented generation, which enables searching a knowledge base for pertinent articles and integrating them into the dialogue to further improve the model’s response. Providing examples and restricting inputs to 4,096 tokens or fewer is recommended, since quality deteriorates beyond that point.

Furthermore, fine-tuning the model is feasible, although it can be costly and yield mixed outcomes, particularly when dealing with real-time data. Therefore, offering adequate context in each cue remains essential.

Additionally, it’s crucial to remember that attributing and explaining AI remain vital steps toward reducing hallucinations. By furnishing users with precise details regarding the source of the solution, such as a particular knowledge base document, PDF file, or database entry, a starting point for explainable AI and content grounding can be established, contributing to minimizing hallucinations significantly.
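
A minimal sketch of that attribution idea follows; the document fields and file names are hypothetical, not a specific product’s schema.

```python
# Sketch: carrying source metadata alongside retrieved text so every answer
# can be traced back to a knowledge-base document, PDF, or database entry.

def answer_with_sources(question, retrieved):
    """retrieved: list of dicts with 'text' and 'source' keys."""
    context = "\n".join(f"- {r['text']} (source: {r['source']})" for r in retrieved)
    prompt = (
        "Using only the sources below, answer the question and list the "
        f"sources you relied on.\n\n{context}\n\nQuestion: {question}"
    )
    sources = [r["source"] for r in retrieved]
    return prompt, sources  # the prompt goes to the LLM; sources go to the user

retrieved = [
    {"text": "RAG injects retrieved passages into the model's prompt.",
     "source": "kb/rag-overview.pdf"},
    {"text": "Grounded prompts reduce contradictory or fabricated statements.",
     "source": "kb/hallucinations.md"},
]
prompt, sources = answer_with_sources("How does RAG reduce hallucinations?", retrieved)
print(sources)
```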

How can we get more context from the model

In summary, large language models (LLMs) have several limitations, such as not being able to reason, use logic, or access external information. They are also unable to comprehend context, leading to hallucinations or incorrect outputs.

To mitigate these issues, providing context through strategies such as writing clear prompts, injecting context into the prompt, utilizing retrieval-augmented generation, and fine-tuning models can improve response quality. However, there are tradeoffs between quality and cost, and finding the optimal balance is crucial.

What are some examples of this

Some instances of hallucinations in extensive language models (LLMs) include producing erroneous or misleading outputs, such as claiming that the sky is purple or that a tomato is both a fruit and a vegetable. Another illustration is a reaction that negates the request, such as getting a bad review after asking for a favorable assessment of a book.

Additionally, LLMs may offer factually wrong replies, such as stating that someone has authored 173 books when they have only written 173 articles or patents. To address these concerns, methods including prompt engineering, retrieval-augmented generation, and vector databases can be employed.

By giving precise directions and enough background, prompt engineering may assist in ensuring that the model produces truthful and valuable responses.

Why is this important

The Importance of Understanding the Limitations of Large Language Models

In the field of artificial intelligence, large language models (LLMs) have gained significant attention due to their ability to generate human-like language outputs. However, these models are not without their limitations, and it is crucial to recognize these limitations to avoid misusing or misunderstanding their capabilities. In this essay, we will explore the importance of understanding the limitations of LLMs and how doing so can lead to more responsible and effective usage of these models.

Firstly, it is essential to comprehend what LLMs truly are. As stated by Mihai Criveti, a renowned expert in the field, LLMs are merely models that act on language and nothing else. They cannot process images, videos, or any other forms of media; they solely operate on text inputs.

Additionally, these models lack reasoning abilities and do not utilize external sources such as the internet during their processing. Instead, they rely heavily on statistical predictions based on patterns found within the training data. Therefore, it is inaccurate to assume that LLMs possess intelligent thought processes or can engage in independent learning.

Another critical limitation of LLMs is their tendency to produce hallucinations, which refer to outputs that are either incorrect or utterly nonsensical. These hallucinations occur due to the model’s reliance on statistical probabilities rather than genuine understanding.

As highlighted by Criveti, LLMs frequently fail to grasp context, leading to bizarre statements such as “the sky is purple.” To mitigate this challenge, proper contextualization and priming are vital. By supplying adequate background information and specifying the desired topic, users can increase the likelihood of receiving relevant and accurate responses from LLMs.

Furthermore, it is crucial to acknowledge that LLMs do not inherently learn from interactions. Although some platforms may claim that their models adapt to user input, this statement is largely misleading. In reality, the model’s performance may deteriorate over time since it relies on pre-existing statistics and lacks true learning capacities.

Conversely, humans possess the unique ability to develop new connections and comprehend novel ideas through experience and education. Thus, it is unfounded to compare LLMs to human cognition.

To address the challenges posed by LLMs’ limitations, innovative techniques such as few-shot prompting and retrieval augmented generation (RAG) have emerged. RAG, specifically, employs vector databases to enhance conventional searches, allowing for more flexible and inclusive queries. By integrating tools like RAG, users can bolster the accuracy and relevancy of LLMs’ outputs.
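
As a small illustration of few-shot prompting, the sketch below shows the model a couple of worked examples before the real question so it imitates the demonstrated format and restraint; the examples themselves are made up for illustration.

```python
# Sketch: a few-shot prompt. The worked examples teach the model the expected
# format and the habit of admitting when the answer is not known.

few_shot_examples = [
    ("Is a tomato a fruit or a vegetable?",
     "Botanically it is a fruit; in cooking it is treated as a vegetable."),
    ("Who won the 2087 World Cup?",
     "I do not know; that information is not available to me."),
]

def build_few_shot_prompt(question):
    parts = ["Answer concisely. If you do not know, say so."]
    for q, a in few_shot_examples:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

print(build_few_shot_prompt("What is retrieval augmented generation?"))
```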

What other approaches could be used

Other approaches that could be used to improve the quality of responses generated by large language models include:

Prompt Engineering: Providing clear and concise prompts that contain sufficient context can help elicit better responses from the model. This includes breaking down complicated tasks into smaller parts and providing relevant information along with the query.

Few-Shot Learning: Providing multiple instances or illustrations that demonstrate the desired outcome can assist the model in producing superior replies. This strategy enables the model to learn from fewer examples and deliver appropriate outcomes.

Vector Database: Using vector databases allows for quicker and more effective searches of vast volumes of data. Documents are converted into numerical representations, enabling fuzzy searches that identify comparable vectors rather than exact terms. As a result, vector databases can enhance the precision and pertinence of retrieved material.

Conversational Memory: By incorporating prior dialogue history into the chat, conversational memory enables the model to better comprehend the context of a discussion, improving the likelihood of suitable and cohesive replies (see the sketch after this list).

Fine-Tuning: Although costly and resource-intensive, fine-tuning models can improve their accuracy and effectiveness on particular jobs. With adequate funds and computing power, fine-tuning can boost performance.

Contextual Understanding: Ensuring that the model possesses contextual awareness and understands the intent behind a request is crucial. One method is to instruct the model not to invent answers on topics it is not familiar with, and to rely instead on reliable sources.

Retrieval Augmented Generation: RAG combines retrieval and generation capabilities to produce higher-quality responses. It involves inserting pertinent articles or excerpts from a database into the model’s prompt to offer context and guide the creation of improved responses.

Knowledge Base Integration: Combining a big language model with a structured repository of information can increase its capacity to produce insightful and exact answers. The model can access relevant details and offer users more thorough and informed responses by integrating a knowledge base.
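
Here is the conversational-memory sketch referenced above: a minimal, assumed approach that keeps recent turns in a rolling window and prepends them to each new prompt. A real implementation would also respect the model’s token limit.

```python
# Sketch: simple conversational memory. Recent turns are kept in a rolling
# window and prepended to each new prompt so the model sees the discussion
# so far; older turns are dropped to stay within the input limit.

class ConversationMemory:
    def __init__(self, max_turns=6):
        self.max_turns = max_turns
        self.turns = []  # list of (speaker, text) tuples

    def add(self, speaker, text):
        self.turns.append((speaker, text))
        self.turns = self.turns[-self.max_turns:]

    def build_prompt(self, new_user_message):
        history = "\n".join(f"{s}: {t}" for s, t in self.turns)
        return f"{history}\nUser: {new_user_message}\nAssistant:"

memory = ConversationMemory()
memory.add("User", "What is RAG?")
memory.add("Assistant", "Retrieval augmented generation grounds prompts in retrieved documents.")
print(memory.build_prompt("And how does it reduce hallucinations?"))
```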

What is the benefit of vector databases

Benefits of Vector Databases: Improved Search Functionality and Explainable AI

Vector databases offer various benefits over traditional SQL databases, particularly when utilized in conjunction with large language models (LLMs) for natural language processing tasks. One significant advantage is the ability to perform efficient similarity searches, which enhances the effectiveness of the model in generating relevant and accurate responses.

Unlike traditional databases that rely solely on exact keyword matching, vector databases employ vectorization techniques to transform texts into numerical representations, enabling the identification of semantically related concepts. As a result, queries yield more precise and diverse outcomes, better addressing users’ needs and minimizing hallucinations typically associated with LLMs.

Moreover, incorporating vector databases contributes to developing explainable AI systems since it allows for tracing back the sources used to inform generated responses, promoting transparency and trustworthiness in AI interactions. By integrating vector databases into their workflows, developers can create more reliable and accountable language models, leading to improved overall performance in natural language processing applications.
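
As a rough sketch of that workflow, the snippet below builds a small in-memory vector index and keeps a mapping from vector IDs back to source documents. FAISS, the random vectors, and the file names are assumptions for illustration, not the stack described in the podcast.

```python
# Sketch: a tiny vector index with source traceability. Each stored vector
# keeps a pointer back to the document it came from, so retrieved context
# (and therefore the model's answer) can be attributed to a source.
# Requires `pip install faiss-cpu numpy`.
import faiss
import numpy as np

dim = 384  # embedding size; random vectors stand in for real embeddings here
sources = ["kb/rag-overview.pdf", "kb/hallucinations.md", "kb/vector-db-intro.md"]
vectors = np.random.rand(len(sources), dim).astype("float32")
faiss.normalize_L2(vectors)  # normalize so inner product equals cosine similarity

index = faiss.IndexFlatIP(dim)
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)

for score, doc_id in zip(scores[0], ids[0]):
    print(f"{sources[doc_id]} (similarity {score:.2f})")
```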

What is the last word of the podcast

The final statement of the provided material is: “See you again on an upcoming episode.” Therefore, the ultimate word of the podcast is “episode”.

What is the name of the podcast Mihai was on

The name of the podcast is The Art of AI.

--

Mihai Criveti

Senior Technical Staff Member (STSM) and Principal Architect at IBM, responsible for Cloud Native, Red Hat and AI Solutions. Red Hat Certified Architect III.