Forget RAG: Embrace agent design for a more intelligent grounded ChatGPT!

James Nguyen
Data Science at Microsoft
6 min read · Nov 18, 2023

The Retrieval Augmented Generation (RAG) design pattern has been commonly used to develop a grounded ChatGPT in a specific data domain. However, the focus has primarily been on improving the efficiency of the retrieval step, such as embedding search, hybrid search, and fine-tuned embeddings, rather than on making the search itself more intelligent. This article introduces a new approach inspired by how humans research: using multiple search techniques, observing interim results, and refining and retrying in a multi-step process before providing a response. By utilizing intelligent agent design, this article proposes building a more intelligent and grounded ChatGPT that overcomes the limitations of traditional RAG models.

1. RAG pattern and limitations

Overview of the standard RAG Pattern implementation:

  • The process begins with the creation of a query from the user’s question or conversation, typically through a prompted large language model (LLM). This is commonly referred to as the query rephrasing step.
  • This query is then dispatched to a search engine, which returns relevant knowledge (Retrieval).
  • The retrieved information is then enhanced with a prompt that includes the user’s question and is forwarded to the LLM (Augmentation).
  • Finally, the LLM responds with an answer to the user’s query (Generation).
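
To make these steps concrete, here is a minimal sketch of the pipeline in Python. It assumes the openai SDK with Azure OpenAI; the search_knowledge_base() helper, the gpt-4 deployment name, and the prompts are hypothetical placeholders rather than any specific product API:

```python
from openai import AzureOpenAI

# Assumes the Azure OpenAI endpoint, key, and API version are set via environment variables.
client = AzureOpenAI()


def search_knowledge_base(query: str) -> str:
    """Hypothetical call to a search service (embedding, hybrid, or keyword search)."""
    raise NotImplementedError


def rag_answer(question: str, chat_history: list[dict]) -> str:
    # 1. Query rephrasing: a prompted LLM turns the conversation into a standalone search query.
    rephrased = client.chat.completions.create(
        model="gpt-4",  # deployment name is an assumption
        messages=[
            {"role": "system", "content": "Rewrite the user's question as a standalone search query."},
            *chat_history,
            {"role": "user", "content": question},
        ],
    )
    query = rephrased.choices[0].message.content

    # 2. Retrieval: the query is sent to the search engine once.
    context = search_knowledge_base(query)

    # 3. Augmentation and 4. Generation: the retrieved context is added to the prompt
    # and the LLM produces the answer.
    answer = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer the question using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return answer.choices[0].message.content
```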

Limitations of RAG

  • In the RAG pattern, Retrieval, Augmentation, and Generation are managed by separate processes. Each process might be facilitated by an LLM with a distinct prompt. However, the Generation LLM, which directly interacts with the user, often knows best what is required to answer the user’s query. The Retrieval LLM might not interpret the user’s intent in the same manner as the Generation LLM, providing it with unnecessary information that could impede its ability to respond.
  • Retrieval is performed once for each question, without any feedback loop from the Generation LLM. If the retrieval result is irrelevant, due to factors such as a suboptimal search query or search terms, the Generation LLM lacks a mechanism to correct this and may resort to fabricating an answer.
  • The retrieved context is fixed once provided and cannot be expanded. For instance, if a retrieved document refers to another document that should be retrieved in turn, there is no provision for following up on it.
  • The RAG pattern does not support multi-step research.

2. Intelligent Agent Model

The Intelligent Agent Model draws inspiration from the human approach to research when answering a question for which immediate knowledge is lacking. In this process, one or multiple searches may be performed to gather useful information before providing a final answer. The result of each search can determine whether further investigation is required and, if so, the direction of the subsequent search. This iterative process continues until we believe we have amassed sufficient knowledge to answer, or conclude that we cannot find enough information to respond. Occasionally, the results from the research can lead to further clarification of the user’s intent and scope of the query.

To replicate this approach, the proposal is to develop an intelligent agent powered by a large language model (LLM) that manages conversations with a user. The agent autonomously determines when it needs to conduct research using external tools, formulates one or multiple search queries, carries out the research, reviews the results, and decides whether to continue with further research or seek clarification from the user. This process persists until the agent deems itself ready to provide an answer to the user.

3. Implementation

With Azure OpenAI’s function-calling capability, it is much simpler to implement an agent that can autonomously use a search tool to locate information needed to assist with user requests. This feature alone streamlines the traditional implementation of the RAG pattern, where query rephrasing, augmentation, and generation are handled separately, as previously described.

The agent interacts with the user using the system-defined persona and objectives, while being aware of the search tool at its disposal. When the agent needs knowledge it doesn’t possess, it formulates a search query and calls the search tool to retrieve the required information.

This process is not only reminiscent of human behavior but also more efficient than the RAG pattern, where knowledge retrieval is a separate process that provides information to the chatbot, irrespective of whether it’s needed or not.

To implement this capability:

1. Define the persona, the expected behavior, and the tool(s) available, including when each should be used.
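
As an illustration, a hypothetical system message for this step might look like the following (the persona and wording are assumptions, not the original prompt):

```python
# Hypothetical system message: persona, expected behavior, and when to use the search tool.
PERSONA = """
You are a technical assistant that answers questions about our product documentation.
You have access to a search tool over the product knowledge base.
If the answer is not already in the conversation, use the search tool to find it
before responding. Answer only from retrieved information; if nothing relevant is
found, say that you don't know.
"""
```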

2. Define the function specification in JSON format, with descriptions of the function and its parameters.
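
A hypothetical specification for the search function, expressed as the Python dictionary passed to the SDK (the function name, parameters, and descriptions are illustrative):

```python
SEARCH_FUNCTION_SPEC = {
    "name": "search_knowledge_base",
    "description": "Search the product knowledge base for technical information.",
    "parameters": {
        "type": "object",
        "properties": {
            "search_query": {
                "type": "string",
                "description": "The search query to use to search the knowledge base.",
            },
            "product_filter": {
                "type": "string",
                "description": "Optional product name used to filter results, e.g. 'X100'.",
            },
        },
        "required": ["search_query"],
    },
}
```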

Interestingly, the parameter description for “the search query to use to search the knowledge base” plays a crucial role. It guides the LLM to formulate a suitable search query based on what’s needed to assist the user in the conversation. Furthermore, the search query parameter can be described and constrained to adhere to specific tool formats, such as the Lucene query format. Additional parameters can also be incorporated for tasks such as filtering.

3. Implement the function-calling flow.
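
A minimal sketch of a single function-calling exchange, reusing the hypothetical client, search_knowledge_base(), PERSONA, and SEARCH_FUNCTION_SPEC defined above:

```python
import json

messages = [
    {"role": "system", "content": PERSONA},
    {"role": "user", "content": "What is the power profile for Radio 0 on the X100?"},
]

response = client.chat.completions.create(
    model="gpt-4",  # deployment name is an assumption
    messages=messages,
    functions=[SEARCH_FUNCTION_SPEC],
    function_call="auto",
)
message = response.choices[0].message

if message.function_call is not None:
    # The model asked to search: run the tool, then send the observation back.
    args = json.loads(message.function_call.arguments)
    observation = search_knowledge_base(args["search_query"])
    messages.append({"role": "assistant", "content": None,
                     "function_call": {"name": message.function_call.name,
                                       "arguments": message.function_call.arguments}})
    messages.append({"role": "function",
                     "name": message.function_call.name,
                     "content": observation})
    # A second completion call with the appended result produces the grounded answer.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
        functions=[SEARCH_FUNCTION_SPEC],
    )

print(response.choices[0].message.content)
```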

At this juncture, we have developed an intelligent agent capable of conducting independent searches. However, to truly create a smart agent capable of undertaking more complex research tasks, such as multi-step and adaptive execution, we need to implement a few additional capabilities. Fortunately, this implementation can be kept straightforward.

Enhancements to create an intelligent research agent

First, we add the ability for the agent to plan, act, observe, and adjust by appending instructions to the system message:
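
A hypothetical version of these added instructions (the wording is illustrative):

```python
# Hypothetical research instructions appended to the system message.
RESEARCH_INSTRUCTIONS = """
If a search does not return enough information, retry with a different or more
specific search query.
Review the result of each search to decide what to search for next.
For complex questions, break the research into multiple search steps and use
earlier results to guide later queries before giving your final answer.
"""
```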

The added instructions tell the bot to retry with a modified query if needed, to review the result of each search to guide the next one, and to employ a multi-step approach when necessary. This assumes the search tool can be invoked multiple times.

Because the LLM cannot drive this repetition on its own, we need to manage it with application logic: we put the entire function-calling exchange inside a loop that exits once the model is ready to give the final answer:
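
A minimal sketch of that loop, under the same assumptions as the earlier snippets:

```python
import json

def run_agent(messages: list[dict], max_steps: int = 5) -> str:
    # Keep calling the model; whenever it requests the search function, execute the
    # search and feed the observation back. Exit when it returns a normal answer.
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4",  # deployment name is an assumption
            messages=messages,
            functions=[SEARCH_FUNCTION_SPEC],
            function_call="auto",
        )
        message = response.choices[0].message

        if message.function_call is None:
            # No further research requested: this is the final answer.
            return message.content

        args = json.loads(message.function_call.arguments)
        observation = search_knowledge_base(args["search_query"])
        messages.append({"role": "assistant", "content": None,
                         "function_call": {"name": message.function_call.name,
                                           "arguments": message.function_call.arguments}})
        messages.append({"role": "function",
                         "name": message.function_call.name,
                         "content": observation})

    return "I could not find enough information to answer this question."
```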

Here is the intelligent agent in action in a demo scenario.

The question asks for a comparison of a feature between two products, where the feature of each product is described in a separate document. To answer it, our agent performs two search queries:

  • X100 vs Z200 power profile for Radio 0
  • X100 power profile for Radio 0

The first query is a greedy attempt: the agent hoped a single document would contain the comparison. That was not the case, as the query did not return sufficient information on the X100, so the agent added a second query dedicated to the X100.

Given the same question, a classic RAG solution would likely have failed to find a good answer, as it would have stopped at the first query.

Conclusion

Implementing the agent model can lead to substantial enhancements in grounded ChatGPT solutions, thanks to the agent’s ability to test various strategies and refine its approach based on observed results.

