Advanced RAG: Multi-Query Retriever Approach

Kamal Dhungana
6 min read · Feb 16, 2024
RAG Multi-Query Pipeline

A simple Retrieval-Augmented Generation (RAG) pipeline produces its answer in two steps. First, the query is transformed into an embedding vector, which is used to run a similarity search against a pre-computed database of document vectors and retrieve the most relevant documents. Second, the RAG system combines the retrieved content with the original query into a single prompt, which a large language model (LLM) processes to produce a contextually relevant response.
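To make the two steps concrete, here is a minimal sketch rather than a production pipeline. It assumes a toy bag-of-words `embed` function in place of a real embedding model, an in-memory list in place of a vector database, and it stops at building the prompt instead of calling an LLM. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are illustrative, not taken from any library.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, doc_vectors, k=2):
    """Step 1: embed the query and return the k most similar documents."""
    q = embed(query)
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(q, doc_vectors[i]),
                    reverse=True)
    return [docs[i] for i in ranked[:k]]

def build_prompt(query, context_docs):
    """Step 2: merge retrieved content with the original query."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG retrieves documents before generating an answer",
    "Embeddings map text to vectors",
    "Bananas are rich in potassium",
]
doc_vectors = [embed(d) for d in docs]  # pre-computed once, as in the text

query = "how does RAG use documents"
top_docs = retrieve(query, docs, doc_vectors, k=2)
prompt = build_prompt(query, top_docs)  # this string would be sent to the LLM
```

In a real system the count vectors would be replaced by dense embeddings and the sorted scan by an approximate nearest-neighbour index, but the flow of data is the same.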

The final output of simple RAG depends heavily on how the query is written. Even minor variations in phrasing can lead to different results. The Multi-Query Retriever method reduces this dependency and makes results more consistent. Instead of relying on the single set of documents retrieved for the initial query, it retrieves multiple sets of documents based on varied interpretations of the original query. This is particularly useful for queries that are vague or imprecisely formulated. By casting a wider net through multiple queries, this method increases the likelihood of pinpointing the most relevant and accurate answers from the vast ocean of available…
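The idea can be sketched in a few lines. This is an illustration under stated assumptions, not the article's implementation: the query variants are hand-written here (in practice an LLM would generate them), relevance is a toy word-overlap score standing in for vector similarity, and the names `score`, `retrieve`, and `multi_query_retrieve` are hypothetical.

```python
def score(query, doc):
    """Toy relevance score: count of shared lowercase words
    (a stand-in for embedding similarity)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, docs, k=1):
    """Return up to k documents with a non-zero score, best first."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked[:k] if score(query, d) > 0]

def multi_query_retrieve(queries, docs, k=1):
    """Retrieve for every query variant and return the unique union,
    keeping the order in which documents were first seen."""
    seen, merged = set(), []
    for q in queries:
        for d in retrieve(q, docs, k):
            if d not in seen:
                seen.add(d)
                merged.append(d)
    return merged

docs = [
    "chunk overlap controls how much text adjacent chunks share",
    "retrieval quality depends on the embedding model",
    "splitting documents into smaller pieces improves recall",
]

original = "how should I chunk text"
# Hand-written rephrasings; an LLM would normally produce these.
variants = [
    original,
    "what is the best way of splitting documents",
    "which embedding model gives good retrieval",
]

single = retrieve(original, docs, k=1)              # one document
merged = multi_query_retrieve(variants, docs, k=1)  # three distinct documents
```

With the original query alone, only the first document is retrieved. The rephrased variants surface the other two, so the context passed to the LLM no longer hinges on one particular wording.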

Kamal Dhungana

Data scientist with a passion for AI. Regularly blogging about LLMs and OpenAI's innovations, sharing insights for AI community growth.