Unleashing the Potential of LLMs in Practical Applications: Addressing the Challenge of Hallucinations

ASHPAK MULANI
5 min read · Sep 16, 2023


What does hallucination mean in the context of humans and computers?

For a human being, hallucination means experiencing sensory stimuli, such as sounds or visions, that do not correspond to external, objective reality.

Now, let's delve into the world of computer programs, particularly large language models (LLMs) like GPT. Unlike humans, computers lack sensory organs such as eyes and ears; their input is primarily text: words and sentences. When language models construct responses to our queries using the vast expanse of data they have been trained on, they occasionally stumble and produce answers that are factually inaccurate or entirely fictitious. This is what we refer to as 'hallucination' in the context of LLMs.

To clarify, the term 'hallucination' here is a metaphor for incorrect textual outputs generated by language models. These errors can be attributed to the limitations of the training data and the methodology used to train these models. It's essential to recognize that language models are intricate pre-trained programs designed to augment human capabilities; they should not be mistaken for sentient entities. 😊

Large language models simply predict the next word in a very sophisticated manner. They are so good at mimicking human language that what they produce looks very convincing. The main difference between an LLM's answer and a human's is that LLMs lack logical reasoning and fact checking.

Challenges

When we consider deploying large language models (LLMs) in real-world scenarios such as chatbots or agent assistance for organizations and financial institutions, a substantial obstacle looms in the form of hallucinations. They are a significant hurdle primarily because LLMs do not have access to the most current information about your latest products or developments. Consequently, when these models are applied to use cases where precision and context are paramount, they often generate responses that fall short of the desired accuracy because they lack the right context.

To harness their capabilities effectively, it is essential to devise strategies that mitigate hallucinations and ensure that LLMs deliver accurate, contextually relevant results in diverse real-world scenarios.

RAG (Retrieval-Augmented Generation) with Vector Databases

This is one of the most efficient approaches to reducing hallucinations in LLMs. It lets us harness the contextual information we already have and stands out as one of the most common mitigation strategies. At the core of this approach lies RAG (Retrieval-Augmented Generation). RAG's essential function is to use the user's input prompt to retrieve external contextual information stored in a dedicated data repository, typically a vector database. Every incoming user query is searched against this database to find suitable contextual information, which is then passed to the LLM along with the user query, ultimately elevating the quality and precision of the generated content. A sketch of the resulting prompt shape follows below.
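To make the idea concrete, here is a minimal sketch of how retrieved context and the user query can be combined into a single prompt. The template wording and the helper function are illustrative assumptions, not a fixed standard:

```python
# Illustrative RAG prompt shape: retrieved context is placed ahead of the user query.
RAG_PROMPT = """Answer the question using ONLY the context below.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context_chunks, question):
    """Combine retrieved chunks with the user's question into a single prompt."""
    context = "\n\n".join(context_chunks)
    return RAG_PROMPT.format(context=context, question=question)
```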

Implementation

Let's quickly walk through how our own context is embedded into a vector database and, subsequently, how relevant data from the vector DB is combined with the user query to build an efficient prompt, so the LLM responds with the most relevant product information and hallucination is reduced.

We are using the LangChain framework for the example below. This article is focused on the approach rather than on coding details.

The notebook code loads custom content and segments it into discrete chunks, preparing them for embedding into the vector database.
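A minimal sketch of this step, assuming a plain-text source file and the classic LangChain package layout; the file name, chunk size, and overlap are illustrative placeholders:

```python
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load our own content (file name is a placeholder).
loader = TextLoader("product_docs.txt")
documents = loader.load()

# Split into overlapping chunks so each piece fits comfortably in a prompt.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
print(f"Created {len(chunks)} chunks")
```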

We use LangChain's SentenceTransformerEmbeddings with a MiniLM model for encoding.
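Something along these lines, assuming the all-MiniLM-L6-v2 model (which produces 384-dimensional vectors):

```python
from langchain.embeddings import SentenceTransformerEmbeddings

# all-MiniLM-L6-v2 produces 384-dimensional embedding vectors.
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Quick sanity check on a single query string.
vector = embeddings.embed_query("What is included in the latest release?")
print(len(vector))  # expected: 384
```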

We use Pinecone as the vector DB for this example. The snippet below shows how we can embed our chunked content into the vector DB.
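A sketch of the upsert step, assuming the pinecone-client v2 `init` API and a pre-created index whose dimension matches the embedding size; the API key, environment, and index name are placeholders:

```python
import pinecone
from langchain.vectorstores import Pinecone

# Placeholder credentials; the index must already exist with a dimension
# matching the embedding size (384 for all-MiniLM-L6-v2).
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")
index_name = "product-docs"

# Embed each chunk and upsert the resulting vectors into the Pinecone index.
vectorstore = Pinecone.from_documents(chunks, embeddings, index_name=index_name)
```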

After a successful embedding run, we should be able to confirm that our chunks landed in the Pinecone index.
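One quick way to check, again assuming the pinecone-client v2 API and the placeholder index name from above:

```python
# Inspect the index stats; total_vector_count should match the number of chunks.
index = pinecone.Index("product-docs")
print(index.describe_index_stats())
```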

We have now reached the stage where we can query the vector database with the user's input and identify the most closely matching chunks, usually via cosine similarity. The exact matching mechanism depends on the vector DB and can differ in other setups.
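For example, a top-k similarity search through the LangChain vector store wrapper; the query text and the value of k are illustrative:

```python
# Retrieve the top-k chunks closest to the user's question.
query = "What is included in the latest product release?"
matched_docs = vectorstore.similarity_search(query, k=3)

for doc in matched_docs:
    print(doc.page_content[:200], "\n---")
```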

The vector database's response gives us the chunks most relevant to the user's query. In the next step, these chunks are presented to the LLM together with the query. This ensures the LLM is equipped with the best available context for responding, significantly reducing the likelihood of hallucination and delivering more accurate responses.
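One way to wire retrieval and generation together is LangChain's RetrievalQA chain, sketched below; the model name, retriever settings, and question are assumptions for illustration, not part of the original notebook:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Low temperature keeps the answer close to the retrieved context (see Summary).
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" inserts the retrieved chunks directly into the prompt
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)

print(qa_chain.run("What is included in the latest product release?"))
```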

Results

Summary

In conclusion, we have explored an effective approach to mitigating incorrect outcomes and hallucinations when working with large language models (LLMs). By incorporating our own contextual data and adjusting the input prompt, we can fine-tune the results to align with specific requirements. It's important to note that the results obtained are highly contingent on the prompt used; by modifying the prompt, we can tailor the LLM's responses to match the desired persona or context.

A crucial consideration in this process is the temperature setting of the LLM. Lower temperature settings, close to zero, make the output stick closely to the supplied context, reducing the potential for hallucination. Conversely, higher temperature settings introduce more variability into the responses, which may increase the chances of inaccuracies or hallucinations.
