Hallucinations: Are LLMs Making Stuff Up?

LLMs (Large Language Models) are the current stars of the AI show, dazzling with their text generation, QA prowess, and chatbot magic. But are we putting too much faith in AI? After all, even the smartest algorithms can have their off days — or, dare I say it, moments of sheer hallucination!

In case you are not familiar with what an LLM is or how it works, here's a layman's analogy that will give you a basic idea.

What is a Language Model?

Imagine you’ve got this parrot; let’s name it Buddy (or whatever strikes your fancy). Buddy’s your winged companion, always hanging out, listening to your every word. At first, it’s just mimicking your “Good morning,” but soon it’s got your favorite color down, your go-to grub, even your top football team! Fast forward a bit, and Buddy’s practically finishing your sentences for you.

You start, “It’s time for me to go to the…” and BAM! Buddy’s like, “Gym.” It’s like having your own personal mind-reader, except it’s just been paying close attention to everything you’ve been blurting out in its presence. In other words, you’ve been feeding it information all along.

This is exactly what a language model does. It has a vocabulary of words (tokens), and at each step it produces a probability distribution over that vocabulary and picks a likely next word. Generation is autoregressive: every word it produces is fed back in as part of the context for predicting the one after it, and the model keeps track of everything it has seen so far. Its job is to process and generate text, whether it’s answering questions, providing explanations, generating creative content, or engaging in conversation.
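
To make this concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the small GPT-2 model (my choice purely for illustration), of how a language model turns a prompt into a probability distribution over its vocabulary and picks likely next words.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a small, publicly available language model and its tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "It's time for me to go to the"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the entire vocabulary for the next word
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, 5)
for prob, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx))!r}: {prob:.3f}")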

A brief note on language model architecture: at the heart of a modern language model lies a neural network, most often built on the transformer architecture. This design stacks layers of attention mechanisms and feedforward neural networks, which together do the heavy lifting of processing text.
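
For the curious, here is a minimal sketch of a single transformer block, written in PyTorch purely for illustration (the layer sizes are arbitrary assumptions, not those of any particular LLM), showing the attention and feedforward layers mentioned above.

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention: every token looks at every other token in the sequence
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Feedforward layers refine each token's representation independently
        return self.norm2(x + self.ff(x))

x = torch.randn(2, 10, 512)         # 2 sequences, 10 tokens each, 512-dim embeddings
print(TransformerBlock()(x).shape)  # torch.Size([2, 10, 512])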

How Encoders and Decoders Work

Within this framework, we encounter the indispensable duo of encoders and decoders, the backbone of sequence-to-sequence models. These components collaborate to make sense of the input text and weave together output text that is coherent and context-aware. The encoder reads the input text and distills it into a representation of its salient features; the decoder then harnesses that representation to craft output text tailored to the task at hand, be it translation or text generation.
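
As a small illustration, here is a hedged sketch using the Hugging Face transformers library and T5 (an encoder-decoder model chosen only as an example) to run a sequence-to-sequence translation task.

from transformers import pipeline

# T5's encoder reads the English sentence; its decoder generates the German one
translator = pipeline("translation_en_to_de", model="t5-small")
result = translator("The food in the restaurant is good.")
print(result[0]["translation_text"])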

Exploring the Pitfalls: Hallucinations

Now that we’ve wrapped our minds around what a language model is, let’s dive into large language models. A large language model is a more advanced and powerful version of a language model, characterized by its extensive scale, complex architecture, and superior performance, and typically trained on massive datasets.

But hold onto your hats! I’m about to drop three mind-bending facts, and your task is to spot a sneaky similarity between them. Here we go:

Most Body Heat Is Lost Through the Head.

You can see The Great Wall of China from space.

The James Webb Telescope took the very first picture of an exoplanet outside our solar system.

Here’s the twist: all these “facts” are as fictional as unicorns in a rainbow meadow! They are all false. And when it comes to generated text, even the most convincing tales can sometimes be as wobbly as a Jenga tower. Large language models aren’t always spot-on. In fact, the last statement about the James Webb Telescope was hallucinated by Google’s LLM, Bard.

This is exactly what LLM hallucinations are: generated content that is not correct, factual, or grounded. Output is called ungrounded if it is not supported by the data the model was trained on or given as context.

Hallucinations range from minor inconsistencies to outright fabricated statements. LLMs just look for the next probable word, and let’s face it, being ‘artificially intelligent’ doesn’t give them common sense. Hallucinations may also be triggered by a misleading or ambiguous prompt from the user.

LLMs in AI might hallucinate certain facts

The different granularities of hallucination range from:

> Lowest level: sentence contradiction, i.e., badly constructed or mutually contradictory sentences.

The food in the restaurant is good. I think you should not visit it.

> Factual: erroneous facts.

Kolkata is the capital of India.

> Highest level: nonsensical output that has no concrete meaning.

Paris Hilton is the capital of France. (The model couldn’t tell the difference between a person’s name and a place.)

Now that we know what LLM hallucination is, let’s get to the question:

Why do LLMs Hallucinate?

Hallucinations come in various forms: semantic incoherence, conceptual absurdity, surreal scenarios, or illogical events. But how does a language model build up its fairy tale?

One of the reasons can be poor data quality. An LLM is trained on large corpora of data, and that data itself might contain false information or inconsistencies. For example, if we train a model on all of the data available on Reddit, is all the information on Reddit 100% accurate? Definitely not!

The generation method may also cause models to hallucinate. Generation methods such as maximum likelihood training and reinforcement learning induce trade-offs between fluency and rigidity, facts and creativity. If the sampling temperature (the randomness, or creative freedom, of the LLM) is high, the probability distribution over the next word flattens out; many words become almost equally likely, so the model can play around with creativity (or choose almost any next word). If the temperature is low, the distribution becomes sharply peaked, the most likely word almost always wins, and the output is less creative and less random.

# HIGH TEMPERATURE: 0.73
I have a pet ___________

# almost similar probabilities (flat distribution)
words: red(0.38) cat(0.31) panda(0.31)

first iteration: takes 'red'

I have a pet red ______

# revised probabilities after selecting 'red'
words: cat(0.45) panda(0.55)

I have a pet red panda -- creative output

# LOW TEMPERATURE: 0.25
I have a pet ___________

# sharply peaked probabilities
words: red(0.05) cat(0.80) panda(0.15)

first iteration: takes 'cat'

I have a pet cat -- expected output

This is how more creative settings can sometimes lead a model to hallucinate.
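
To see the temperature effect in plain numbers, here is a minimal sketch using numpy and made-up word scores (the logits are invented for illustration): the same scores produce a flat distribution at high temperature and a sharply peaked one at low temperature.

import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    return exp / exp.sum()

words = ["red", "cat", "panda"]
logits = [1.2, 2.0, 1.6]  # hypothetical raw scores from a model

for t in (0.25, 0.73, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: " + ", ".join(f"{w}={p:.2f}" for w, p in zip(words, probs)))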

Hallucinations may also be induced when the user writes a poor prompt or leaves the input context unclear. For example, if I ask my LLM,

Can mice speak in English?

The Model will probably answer: “English fluency isn’t exactly in their skill set.”

But hey, I forgot to give the context: I was talking about Mickey Mouse, the Disney character. This is how unclear prompts can cause miscommunication.

Hallucinations can blend into human-sounding ideas so well that we may even start believing them. And these scenarios are, for now, inevitable: there is no known method that eliminates hallucinations in LLMs entirely.

However, we do have ways to reduce hallucinations…

To kickstart our journey into taming the beast, we need to give clear-cut and to-the-point prompts to the LLM, as crisp and clear as a freshly pressed suit! We can opt for in-context prompts, guiding the model precisely to our desired outcome, or furnish it with examples to illuminate the path. For intricate tasks, we can even employ a multi-shot approach, supplying several examples or breaking the query down into bite-sized chunks for easier digestion by our LLM, as in the sketch below.
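
As an illustration, here is a hypothetical few-shot, in-context prompt (the wording and examples are my own, not a standard template): the examples show the model the exact behaviour we want, including admitting when it does not know.

few_shot_prompt = """Answer only from the given context. If the context does not contain the answer, say "I don't know."

Context: New Delhi is the capital of India.
Q: What is the capital of India?
A: New Delhi

Context: The Eiffel Tower is in Paris.
Q: Who designed the Eiffel Tower?
A: I don't know

Context: {context}
Q: {question}
A:"""

print(few_shot_prompt.format(context="Kolkata is a city in West Bengal.",
                             question="Which state is Kolkata in?"))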

We can also apply mitigation strategies that control the generation parameters of the LLM. Take, for instance, the art of temperature control: dialing down the heat to curb randomness, much like sipping iced tea on a scorching day!
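
In practice, that usually means passing a low temperature to whatever API you call. The sketch below assumes the OpenAI Python SDK and the gpt-4o-mini model purely as an example; most LLM APIs expose the same knob.

from openai import OpenAI

client = OpenAI()  # assumes an API key is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0.2,  # low temperature: less randomness, fewer flights of fancy
    messages=[{"role": "user",
               "content": "Can Mickey Mouse, the Disney character, speak English?"}],
)
print(response.choices[0].message.content)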

RAG brings in real-time data

Using RAG (Retrieval-Augmented Generation) can reduce the chances of hallucinations. It is a framework designed to ground an LLM in trusted data, fresh from a company’s own sources, so that it generates more accurate and relevant responses.

One of the most effective ways to combat GenAI and RAG hallucinations is to use an advanced RAG tool that retrieves and augments both structured and unstructured data from a company’s own private data sources.

This approach, called GenAI Data Fusion, accesses the structured data of a single business entity — customer, vendor, or order — from enterprise systems based on the concept of data products.
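
To make the retrieval idea concrete, here is a minimal sketch under simplified assumptions: a tiny in-memory document store and naive keyword matching stand in for a real vector database, and the final prompt would be handed to whichever LLM you use.

documents = [
    "New Delhi is the capital of India.",
    "Kolkata is the capital of the Indian state of West Bengal.",
    "The James Webb Space Telescope launched in December 2021.",
]

def retrieve(question, docs, top_k=2):
    # Naive keyword-overlap scoring; production RAG systems use vector embeddings
    q_words = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))[:top_k]

def build_prompt(question, docs):
    context = "\n".join(retrieve(question, docs))
    return ("Answer using only the context below. If the answer is not in the "
            f"context, say you don't know.\n\nContext:\n{context}\n\n"
            f"Question: {question}\nAnswer:")

print(build_prompt("What is the capital of India?", documents))
# The resulting prompt gives the LLM trusted facts to lean on instead of guessing.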

Hence, hallucinations in LLMs remain a fascinating topic and an unsolved mystery, attracting a large number of AI enthusiasts under their umbrella of research possibilities. Let the quest begin!

References

“Why Large Language Models Hallucinate” — Martin Keen, IBM Technology

“RAG Hallucinations” — Iris Zarecki
