LLMs Know More Than They Show
Recently I came across a very interesting paper that examines the hallucination problem of LLMs in detail. It is a new paper from researchers at Google and Apple, who looked into the internal representations of LLMs to understand the nature of hallucinations. They showed that these internal representations can also be used to predict the types of errors a model is likely to make, which facilitates the development of tailored mitigation strategies. They also revealed a discrepancy between LLMs’ internal encoding and external behavior: a model may encode the correct answer internally, yet consistently generate an incorrect one. Taken together, these insights deepen our understanding of LLM errors from the model’s internal perspective and can guide future research on error analysis and mitigation.
So, without further ado, let’s begin.
Table of Contents
- Breaking Down The Hallucination Problem
- Can there be a theoretical understanding of the AI model’s internal workings?
- Using Sparse Autoencoder To Break Internal Representations of LLMs
- Grokking Is Another Technique To Know What Is Learned And What Is Memorized
- How To Detect Different Hallucination Errors?
- Conclusion