Retrieval-Augmented Generation: Why naive RAG is not enough (and some ways to improve it)
“Naive” RAG is a term in vogue used to describe the kind of RAG systems you will find in starter tutorials.
That is to say, an implementation of the Retrieval-Augmented Generation technique (a minimal code sketch follows this list):
- using a single, vector-based retriever,
- using the user query (or a condensed question) for the similarity search,
- computing the embeddings directly on the chunks' raw text,
- not filtering results based on metadata.
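To make that baseline concrete, here is a minimal sketch of such a pipeline. It assumes an OpenAI-compatible client and the sentence-transformers library; the model names, prompt and example chunks are my own illustrations, not part of any particular framework.

```python
# Minimal "naive RAG" sketch: single vector retriever, raw query embedding,
# no metadata filtering. Model names, prompts and chunks are illustrative.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model would do
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

chunks = [
    "The A380 is a wide-body airliner produced by Airbus.",
    "Its typical cruise altitude is around 43,000 feet.",
    # ... the rest of the knowledge base, already split into chunks
]
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

def naive_rag(question: str, top_k: int = 3) -> str:
    # 1. Embed the raw user query and rank chunks by cosine similarity.
    query_embedding = embedder.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, chunk_embeddings)[0]
    top_indices = scores.argsort(descending=True)[:top_k]
    context = "\n".join(chunks[int(i)] for i in top_indices)

    # 2. Ask the LLM to answer using only the retrieved context.
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(naive_rag("What is the cruise altitude of the A380?"))
```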
Let’s get into the details:
Semantic search is not perfect
In naive RAG, you expect the chunk of text containing the answer to a question to be the most semantically similar chunk in your vector database. The issue is that there is no guarantee this is the case, and it gets worse as the size of the knowledge base increases.
For example, say you have two chunks:
- one that asks the same question as the user's (chunk A),
- one that answers it (chunk B).
With a semantic search, chunk A will score a higher similarity than chunk B.
There is an “artificial” distance between the embedding of the query and the embedding of the chunk containing the answer, due to their difference in nature (question vs. answer).
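You can observe this effect by comparing the similarity scores yourself. The snippet below is purely illustrative (the exact scores depend on the embedding model); it only shows how such a comparison is done.

```python
# Illustration of the "artificial distance": a chunk phrased as a question can
# score higher than the chunk that actually contains the answer.
# Exact scores depend on the embedding model; this only shows how to measure them.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

query = "What is the cruise altitude of the A380?"
chunk_a = "What altitude does the A380 cruise at?"    # restates the question
chunk_b = "The A380 cruises at around 43,000 feet."   # contains the answer

query_emb, emb_a, emb_b = embedder.encode([query, chunk_a, chunk_b])
print("similarity to chunk A:", util.cos_sim(query_emb, emb_a).item())
print("similarity to chunk B:", util.cos_sim(query_emb, emb_b).item())
```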
Some solutions exist to reduce this artificial distance
As of now, I am aware of two kinds of solutions to this issue: one at ingestion time and the other at query time.
At ingestion time, one technique is, for a given chunk, to ask an LLM to determine which questions that chunk answers. The embeddings are then computed from those questions instead of from the text of the chunk.
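A rough sketch of that ingestion step could look like the following; the prompt, the model name and the `generate_questions` helper are assumptions of mine, not a prescribed recipe.

```python
# Ingestion-time sketch: ask an LLM which questions a chunk answers, then embed
# those questions instead of the chunk text itself. The prompt, model name and
# helper names below are illustrative assumptions.
from openai import OpenAI
from sentence_transformers import SentenceTransformer

llm = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def generate_questions(chunk: str, n: int = 3) -> list[str]:
    response = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"List {n} questions (one per line, no numbering) "
                       f"that the following text answers:\n\n{chunk}",
        }],
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]

def index_chunk(chunk: str) -> list[dict]:
    # One vector per generated question; the original chunk text is kept so the
    # generation step still sees the real content at answer time.
    questions = generate_questions(chunk)
    vectors = embedder.encode(questions)
    return [
        {"embedding": vector, "question": question, "text": chunk}
        for question, vector in zip(questions, vectors)
    ]

entries = index_chunk("The A380 cruises at around 43,000 feet.")
```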
At query time, the HyDE (Hypothetical Document Embeddings) technique is another solution. Instead of using the user query directly for the semantic search, a first LLM call is made to imagine a chunk that would answer the question. Even if the LLM hallucinates, the generated document should be semantically close to a real one. It is this hypothetical document that is embedded and used to search for similar chunks.
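Here is what a HyDE-style retrieval step might look like, again with an illustrative prompt and model name.

```python
# HyDE sketch: generate a hypothetical answer document first, embed that
# document instead of the raw query, then use it for the similarity search.
# Prompt and model name are illustrative assumptions.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

llm = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def hyde_search(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # 1. Imagine a passage that would answer the question. It may hallucinate,
    #    but it should be semantically close to a real answer chunk.
    hypothetical = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Write a short passage that answers: {question}",
        }],
    ).choices[0].message.content

    # 2. Embed the hypothetical document and rank the real chunks against it.
    doc_embedding = embedder.encode(hypothetical, convert_to_tensor=True)
    chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)
    scores = util.cos_sim(doc_embedding, chunk_embeddings)[0]
    return [chunks[int(i)] for i in scores.argsort(descending=True)[:top_k]]
```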
Issues linked to chunking documents
Even now that LLMs have large context windows, we need to split an original document into smaller pieces, often called chunks. The RAG technique won't use only one of those chunks to build an answer but retrieves the top-k results.
Now, what happens when the answer to your query is spread across two distinct chunks of the same document? Let's say one contains the name of the A380 plane and another states its cruise altitude.
That's the issue: the semantic search might not fetch those two chunks, because other chunks, close to the query but not answering it, might look more similar to what is being looked for.
One solution would be to add context to chunks…
… so that they keep track of elements about the document they come from. For example, for a knowledge base of technical data about planes, you would add the name of the plane to each chunk. This is already done by some frameworks like LlamaIndex, which include metadata in both the text used for embedding and the text used for generation.
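Without relying on any particular framework, the idea can be sketched as prepending document-level information to each chunk before embedding it; the metadata fields and formatting below are my own illustration.

```python
# Sketch of prepending document-level context to each chunk before embedding,
# so a chunk about cruise altitude still "knows" it belongs to the A380 document.
# The metadata fields and formatting are illustrative.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def contextualize(chunk: str, doc_metadata: dict[str, str]) -> str:
    # Prepend a small header built from the document metadata to the chunk text.
    header = " | ".join(f"{key}: {value}" for key, value in doc_metadata.items())
    return f"{header}\n{chunk}"

doc_metadata = {"plane": "A380", "document": "Technical data sheet"}
chunk = "Typical cruise altitude: around 43,000 feet."

enriched = contextualize(chunk, doc_metadata)
embedding = embedder.encode(enriched)  # embed the enriched text, not the bare chunk
```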
Issues linked to having too many documents
The larger your knowledge base, the greater the risk of having chunks that are semantically close to what’s being searched for but don’t answer the question!
The solution for that would be to pre-filter your semantic search…
… based on the content of the whole document (e.g. only include chunks from documents that talk about the A380).
(You can find links to two articles I wrote about this.)
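As an illustration of such pre-filtering, here is a sketch using Chroma's `where` clause; the collection content and metadata fields are made up for the example.

```python
# Sketch of pre-filtering the semantic search with metadata, here through
# Chroma's `where` clause. Collection content and metadata fields are made up.
import chromadb

client = chromadb.Client()
collection = client.create_collection("plane_docs")

collection.add(
    ids=["a380-alt", "a320-alt"],
    documents=[
        "The A380 cruises at around 43,000 feet.",
        "The A320 cruises at around 39,000 feet.",
    ],
    metadatas=[{"plane": "A380"}, {"plane": "A320"}],
)

# Only chunks coming from documents about the A380 are ranked by the
# similarity search, which shrinks the search space beforehand.
results = collection.query(
    query_texts=["What is the cruise altitude?"],
    n_results=1,
    where={"plane": "A380"},
)
print(results["documents"])
```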
On a personal note, I recently participated in a webinar about how to use LLMs to construct SQL queries (text-to-SQL). An interesting point was the use of a hybrid search on keywords, which allowed the creation of filters to reduce the knowledge domain.
For example: if the query is about what is eaten in Iceland, the hybrid search would retrieve a document stating that “Iceland” is a keyword that can be present in the “country” metadata field. This helps filter the knowledge base and thus decreases the risk of retrieving only non-relevant chunks.
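My loose reconstruction of that idea looks like this, with a simple keyword catalogue standing in for the hybrid search step; the catalogue and filter format are illustrative, not what was presented in the webinar.

```python
# Loose reconstruction of the idea: detect known keywords in the query, turn
# them into metadata filters, then run the semantic search on the reduced
# knowledge base. The keyword catalogue and filter format are illustrative.
keyword_catalogue = {
    # keyword -> metadata field it can appear in
    "Iceland": "country",
    "France": "country",
    "A380": "plane",
}

def build_filter(query: str) -> dict[str, str]:
    filters = {}
    for keyword, field in keyword_catalogue.items():
        if keyword.lower() in query.lower():
            filters[field] = keyword
    return filters

metadata_filter = build_filter("What is eaten in Iceland?")
print(metadata_filter)  # {'country': 'Iceland'}
# The filter can then be passed to the vector store (e.g. as a `where` clause)
# before the similarity search, so only relevant chunks get ranked.
```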
To conclude…
Semantic-search-based RAG is far from perfect, but solutions exist to improve it!
That’s all folks!
Feel free to contact me if you are curious, or my employer (a French company) if you would like to work together.