Retrieval Augmented Generation (RAG) and its limitations

Siddhanth Biswas
2 min read · Jan 4, 2024


Photo by Andrea De Santis on Unsplash

RAG (Retrieval Augmented Generation) is a form of content generation (typically in text form) that combines a user query, a large language model (LLM), and one or more information retrievers (for example, a database, a website extractor, or a document store).

The information retriever provides data that is relevant, specific, and accurate for a user query. The LLM synthesizes the retrieved data and the query to produce a more applicable and accurate response. Eureka!!
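To make the flow concrete, here is a minimal, self-contained sketch of that retrieve-then-generate loop. Everything here is illustrative: the "retriever" is a toy keyword-overlap search over an in-memory list of documents, and `build_prompt` stands in for the call to a real LLM.

```python
def retrieve(query, documents, top_k=2):
    """Toy retriever: rank documents by shared words with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query, context):
    """Combine the retrieved context and the user query into one prompt."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"

docs = [
    "RAG pairs a retriever with a language model.",
    "Vector databases store embeddings for similarity search.",
    "Bananas are rich in potassium.",
]
context = retrieve("What is a vector database?", docs)
prompt = build_prompt("What is a vector database?", context)
```

A production system would replace the keyword overlap with a vector database lookup and send `prompt` to an actual LLM, but the shape of the pipeline is the same.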

Photo by Andrew George on Unsplash

RAG seems like a genius idea!

LLM + data => Chad response

Right?

Limitations of RAG

RAG also has various limitations:

  1. Slower response times: Searching an external information system for every query takes time, depending on the type and size of the database.
  2. Sensitive data: There is a risk of exposing sensitive data to the user, either directly from the retrieved information or through the LLM's misinterpretation of it.
  3. Accuracy of responses: The accuracy of responses from retrieval and LLM synthesis needs to be evaluated continuously. Possible failure causes include LLM hallucinations, ambiguous user queries, and unreliable information extraction. Inaccurate responses can have large-scale implications in some cases.
  4. Deployability and application in real-life scenarios: There are few real-life scenarios where RAG can replace code-based AI systems or provide responses precise enough to meet the required standards. If a RAG system cannot meet those high standards, it is unlikely to be deployed, because of the large-scale implications of inaccurate responses. For these reasons, applicable cases of RAG remain few.
Photo by Elimende Inagella on Unsplash

How can we possibly solve these limitations?

  • Improve database search speed (how?)
  • Add a firewall-style filter that detects sensitive data before it reaches the user.
  • Use caching to detect similar queries. This speeds up responses and helps resolve ambiguous user queries.
  • Use hybrid search (vector search + lexical search) on databases to improve the accuracy of retrieval.
  • Use better LLMs.

Large Language Models will improve in the coming years. Hopefully, that will help produce more accurate responses for Retrieval Augmented Generation as well! :)
