Retrieval-Augmented Generation (RAG): Control Your Model’s Knowledge and… Hallucinations!

Super-sized language models are remarkably good at what they do. We all know this from constantly hearing about them in AI-related news.

This model probably knows more than you and even writes better than you.

Admittedly, it is really big (175 billion parameters) and it has seen a lot of text (~500 billion tokens). But when we say "parameters", these are just numbers we're talking about; just matrices upon matrices of floating-point values. It just knows, somehow. It's not as if it has a database or a lookup table of information.

But maybe it should. Maybe it should have a database of knowledge like a normal “machine” would.

The thing is, these language models know so much, yet they tend not to know what they know. Developers who have worked with models like GPT must have encountered this problem in one way or another:

these language models hallucinate.

How can you blame it? As I said, GPT doesn't have a lookup table of information for each person in the world. GPT's training only implicitly (and barely) asks it to generate true statements; its main objective is to generate realistic statements. And it certainly generates realistic statements very well.

Anyone who doesn't know who Kyunghyun Cho is wouldn't know whether the above statements are true or not.

How is this collection of matrices of numbers supposed to know that?

Parametric vs. Non-Parametric Knowledge

GPT is an example of a model that relies only on parametric knowledge. This means that the entirety of its knowledge resides in the weights of the parameters that constitute the model.

Non-parametric knowledge, in contrast, is a retrieval-based memory: the facts live in an external store of documents that the model looks up at inference time, rather than in its weights.
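To make the distinction concrete, here is a minimal sketch of a non-parametric memory: a tiny document store queried by bag-of-words cosine similarity, with the best match prepended to the prompt as context. The documents, the query, and the helper names are all hypothetical; a real retriever would use dense embeddings and a vector index rather than word counts.

```python
import math
import re
from collections import Counter

# A toy non-parametric memory: the knowledge lives in this document
# store, not in any model's weights. These example documents are
# illustrative, not from a real corpus.
documents = [
    "Kyunghyun Cho is a professor of computer science at New York University.",
    "RAG combines a retriever with a generator to ground model outputs.",
    "GPT-3 has 175 billion parameters.",
]

def bag_of_words(text):
    # Lowercased word counts serve as a crude sparse vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    # Return the k documents most similar to the query.
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

# Retrieved text is prepended to the prompt so the generator can
# condition on it instead of relying on parametric recall alone.
question = "Who is Kyunghyun Cho?"
context = retrieve(question)[0]
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
```

The key point is that updating this model's "knowledge" means editing the document list, not retraining anything.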