LLMs Know More Than They Show

Vishal Rajput · Published in AIGuys · 10 min read · Oct 21, 2024


Recently I came across a very interesting paper that digs deep into the hallucination problem of LLMs. In this new work from Google and Apple, the researchers probed the internal representations of LLMs to understand the nature of hallucinations. They showed that these internal representations can be used to predict the types of errors a model is likely to make, which opens the door to tailored mitigation strategies. They also revealed a discrepancy between an LLM's internal encoding and its external behavior: the model may encode the correct answer internally, yet consistently generate an incorrect one. Taken together, these insights deepen our understanding of LLM errors from the model's internal perspective and can guide future research on error analysis and mitigation.
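To make the idea of reading error signals from internal states concrete, here is a minimal sketch (not the authors' exact setup): a simple linear probe trained on the hidden state at the last token of an answer to predict whether that answer is correct. The model name, layer choice, and labeled examples below are illustrative assumptions, not the paper's data.

```python
# Minimal sketch: probing an LLM's hidden states for a "correctness" signal.
# Assumptions: any causal LM with accessible hidden states (gpt2 used here),
# and a small labeled set of (prompt, model_answer, is_correct) triples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "gpt2"  # placeholder model, swap for the LLM you are studying
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def answer_token_representation(prompt: str, answer: str, layer: int = -1):
    """Hidden state at the final token of the answer, from the chosen layer."""
    inputs = tokenizer(prompt + " " + answer, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: tuple of (num_layers + 1) tensors [batch, seq_len, hidden_dim]
    return outputs.hidden_states[layer][0, -1].numpy()

# Hypothetical labeled examples: model answers plus whether each was correct.
examples = [
    ("The capital of France is", "Paris", 1),
    ("The capital of Australia is", "Sydney", 0),
    # ... more (prompt, model_answer, is_correct) triples
]

X = [answer_token_representation(p, a) for p, a, _ in examples]
y = [label for _, _, label in examples]

# If a simple linear classifier over internal activations predicts correctness
# well, the truthfulness signal is encoded internally, even when the
# generated text itself is wrong.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print("Probe accuracy on training data:", probe.score(X, y))
```

In practice you would train such a probe on held-out data and compare layers and token positions; the point of the sketch is only to show what "reading the model's internal encoding" means mechanically.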

So, without further ado, let’s begin.

Table of Contents

  • Breaking Down The Hallucination Problem
  • Can there be a theoretical understanding of the AI model’s internal workings?
  • Using Sparse Autoencoder To Break Internal Representations of LLMs
  • Grokking Is Another Technique To Know What Is Learned And What Is Memorized
  • How To Detect Different Hallucination Errors?
  • Conclusion

Breaking Down The Hallucination Problem
