Controllable Agent for Complex RAG Tasks

Nirdiamant
7 min read · Aug 8, 2024



Introduction

Nowadays, with the rise of large language models, everyone wants to talk with their data and ask questions about it. As a result, Retrieval-Augmented Generation (RAG) has become very popular. The standard RAG pipeline consists of data ingestion and retrieval (with many techniques to optimize these steps for your specific problem and data), followed by feeding a user query with the retrieved information to the LLM to generate the response.

However, in some cases, both the data and the questions we want to ask are not trivial. These situations require a more sophisticated agent with reasoning capabilities to go through several steps to solve the question. In this article, I will show you how I tackled this problem, using the first book of Harry Potter as a use case.

Understanding RAG and Agents

We’ve talked a bit about what RAG is, but what are Agents in the field of LLMs?

LLM agents are AI systems designed for complex tasks that require sequential reasoning. They can plan ahead, remember past interactions, and use different tools to adjust their responses to the situation at hand.

Limitations of Semantic Similarity in Retrieval

Traditional RAG systems often rely on semantic similarity for retrieval. This approach measures how close the meanings of two pieces of text are to each other, typically using vector representations and similarity scores. While effective for simple queries, it falls short for complex tasks that require multi-step reasoning or understanding of broader context. Semantic similarity might retrieve relevant individual chunks but struggles with questions needing information synthesis or logical inference across multiple sources.

The Challenge with Regular Agents

The problem with regular agents is the trade-off between the autonomy we grant them and the control we retain. The alternative is to construct our own workflow.

When using a regular agent:
- No control over when it uses its tools or in what order.
- No control over the conclusions it derives from using the tools.
- Harder to trace hallucinations/usage of pre-trained knowledge.

When using workflow engineering:
- Defining your specific path of how to tackle the problem.
- Full control over each step.
- Requires a tailored solution that might be time-consuming and complicated to design as the problem becomes more difficult.

Our Mission: Creating a Controllable Agent for Complex RAG Tasks

Now that we understand RAG and Agents, let’s start with our mission of creating an agent that can solve complicated RAG tasks and that we can control.

In this case, we would want to have three kinds of vector stores:
1. Regular vector store based on the book chunks
2. Vector store that contains chapter summaries for higher granularity information
3. Vector store that contains quotes from the book for specific high-resolution information
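To make the three-store idea concrete, here is a minimal sketch of an in-memory vector store, one instance per granularity level. The `embed` function is a toy letter-frequency stand-in for a real embedding model, and the sample documents are illustrative, not the article's actual data.

```python
import math
from dataclasses import dataclass


def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: normalized letter-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class VectorStore:
    docs: list[str]

    def top_k(self, query: str, k: int = 2) -> list[str]:
        # Rank documents by cosine similarity to the query embedding.
        q = embed(query)
        scored = sorted(
            ((sum(a * b for a, b in zip(q, embed(d))), d) for d in self.docs),
            reverse=True,
        )
        return [doc for _, doc in scored[:k]]


# One store per granularity level described above.
chunk_store = VectorStore(["Harry meets Professor Quirrell in Diagon Alley."])
summary_store = VectorStore(["Chapter 17: Harry confronts Quirrell and Voldemort."])
quote_store = VectorStore(['"It was Quirrell," said Harry.'])
```

In practice each store would be backed by a proper embedding model and vector database; the point here is only that the same retrieval interface serves three different resolutions of the book.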

A naive flow-engineering design for an agent that validates a RAG pipeline:

1. The process starts by retrieving context relevant to the given question.
2. This context is then filtered to keep only the most relevant content.
3. Using this refined context, the agent attempts to answer the question.
4. The answer is evaluated for relevance and potential hallucinations:
   - If the answer is relevant and not a hallucination, the process ends successfully.
   - If the answer is deemed a hallucination but potentially useful, the agent goes back to retrieve more context.
   - If the answer is not relevant or useful, the question is rewritten.
5. The rewritten question is fed back into the context retrieval step, and the process repeats until a satisfactory answer is produced.
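The five steps above can be sketched as a retry loop. All the callables (`retrieve`, `filter_ctx`, `answer`, `grade`, `rewrite`) are hypothetical stand-ins for LLM-backed components, not the article's actual implementation.

```python
def naive_rag_loop(question, retrieve, filter_ctx, answer, grade, rewrite,
                   max_iters=5):
    """Validate-and-retry RAG loop: retrieve, filter, answer, grade, repeat."""
    for _ in range(max_iters):
        context = filter_ctx(retrieve(question))       # steps 1-2
        candidate = answer(question, context)          # step 3
        verdict = grade(question, context, candidate)  # step 4
        if verdict == "ok":                            # relevant and grounded
            return candidate
        if verdict == "irrelevant":                    # step 5: rewrite, retry
            question = rewrite(question)
        # 'hallucination' falls through: retry retrieval with the same question
    return None  # no satisfactory answer within the iteration budget
```

A bounded `max_iters` keeps the loop from cycling forever when the corpus simply cannot answer the question.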

This could have been a nice solution, but it is not enough for complex questions.

Example: Complex Question Solving

Let’s look at an example of a complex question that needs reasoning:
“How did the protagonist defeat the villain’s assistant?”

To solve this question, the following steps are necessary:
1. Identify the protagonist of the plot.
2. Identify the villain.
3. Identify the villain’s assistant.
4. Search for confrontations or interactions between the protagonist and the villain.
5. Deduce, from those confrontations, how the protagonist defeated the assistant.

Required Capabilities

Hence, the capabilities that we may need in our solution are:
1. Tools
2. Reasoning
3. Flow
4. Control
5. Verification
6. Stop Condition
7. Evaluation

Implementation Components

The tools we need in this case consist of retrieval and answering. By breaking the previous validation graph into several subgraphs, each subgraph can serve as a tool in the new agent graph.

For reasoning & flow, we may need the following components:

1. Planner: given a question, constructs a plan of the steps needed to reach the final solution.
2. Plan breaker: decomposes each plan step into either a retrieval task or an answering task.
3. Task handler: chooses which tool to use at each step.
4. Replanner: updates the plan online, based on the steps already taken and the information gathered so far.
5. Retrieval and answer tools, each built as a small agent in its own right so we can monitor it (cleaning the retrieved information, verifying it is grounded in the context, checking for hallucinations).
6. (Optional) Question anonymizer: produces a general plan free of biases from the LLM's prior knowledge of the story.
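The task-handler component described above can be sketched as follows. The keyword routing stands in for what would really be an LLM classification call, and the state fields and tool names are my own assumptions, not the article's actual identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    question: str
    plan: list[str] = field(default_factory=list)
    # (step, result) pairs accumulated as the plan executes
    past_steps: list[tuple[str, str]] = field(default_factory=list)


def route(step: str) -> str:
    # Naive keyword routing; a real task handler would be an LLM call.
    s = step.lower()
    if "quote" in s:
        return "retrieve_quotes"
    if "summary" in s or "chapter" in s:
        return "retrieve_summaries"
    if s.startswith("answer"):
        return "answer"
    return "retrieve_chunks"


def run_plan(state: AgentState, tools: dict) -> AgentState:
    # Execute each plan step with the tool the handler selects for it.
    while state.plan:
        step = state.plan.pop(0)
        state.past_steps.append((step, tools[route(step)](step)))
    return state
```

Keeping the gathered results in `past_steps` is what later lets the replanner decide whether enough information has accumulated to answer.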

Stop Condition

How to determine when the process is done? There are several alternatives:
- Check at every re-plan visit if the question can already be answered based on the aggregated information so far.
- Keep collecting relevant data until the process reaches saturation (the amount of new interesting info is less than a certain threshold).
- Limit the graph recursion by a predefined number of iterations.
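The three alternatives above can be combined into a single predicate, checked at every replan visit. Here `can_answer` and `novelty` are hypothetical hooks (an LLM answerability check and a saturation heuristic, respectively), and the thresholds are purely illustrative.

```python
def should_stop(iteration, gathered, can_answer, novelty,
                max_iters=10, saturation_threshold=0.05):
    """Return True when the agent should stop collecting information."""
    if can_answer(gathered):                      # already answerable?
        return True
    if novelty(gathered) < saturation_threshold:  # new info has saturated
        return True
    return iteration >= max_iters                 # hard recursion limit
```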

The full agent logic

The controllable agent for complex RAG tasks follows a sophisticated, multi-step process:

1. The process begins by anonymizing the input question to reduce potential biases.

2. A planner then creates a general plan to answer the anonymized question.

3. The plan is de-anonymized to reintroduce specific context.

4. The plan is broken down into retrieve or answer tasks.

5. A task handler decides which tool to use based on the nature of each task:
   - Retrieve book chunks
   - Retrieve book quotes
   - Retrieve summaries
   - Answer the question directly

6. For retrieval tasks, the system fetches the relevant information and then filters to keep only the most relevant content, ensuring it’s grounded in the original context.

7. If the chosen tool is to answer, the system attempts to provide an answer.

8. The answer is checked for hallucinations and whether it’s grounded in the provided context.

9. If the question cannot be answered yet, or if the answer isn’t satisfactory, the system goes into a replanning phase.

10. The replan step evaluates if the question can be answered with the current information or if more retrieval is needed.

11. If the question can be answered, the system moves to get the final answer.

12. The final answer is checked once more for hallucinations and to ensure it’s grounded in the context.

13. If the final answer passes these checks, the process ends successfully.

This iterative and multi-faceted approach allows the agent to handle complex queries by breaking them down, retrieving relevant information, and continuously refining its approach until a satisfactory, well-grounded answer is produced.
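The thirteen steps above can be summarized as a small state graph. The node names below are my own shorthand rather than the article's actual identifiers; string edges are unconditional, and dict edges map a node's outcome to the next node.

```python
TRANSITIONS = {
    "anonymize_question": "planner",                          # step 1
    "planner": "deanonymize_plan",                            # step 2
    "deanonymize_plan": "break_down_plan",                    # step 3
    "break_down_plan": "task_handler",                        # step 4
    "task_handler": {"retrieve": "filter_context",            # step 5
                     "answer": "answer_question"},
    "filter_context": "replan",                               # step 6
    "answer_question": "check_answer",                        # steps 7-8
    "check_answer": {"grounded": "replan",                    # step 9
                     "hallucination": "replan"},
    "replan": {"can_answer": "final_answer",                  # steps 10-11
               "need_more_info": "task_handler"},
    "final_answer": "final_check",                            # step 12
    "final_check": {"grounded": "END",                        # step 13
                    "hallucination": "replan"},
}


def next_node(node: str, outcome: str = None) -> str:
    # Follow an unconditional edge, or a conditional edge keyed by outcome.
    edge = TRANSITIONS[node]
    return edge if isinstance(edge, str) else edge[outcome]
```

Frameworks such as LangGraph express exactly this kind of graph with conditional edges; the dict above is just a framework-free sketch of the control flow.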

Evaluation

Since this is a RAG task, we can evaluate it similarly to other RAG tasks. I chose to evaluate based on a custom benchmark of a QA bank, using the following metrics:

- Answer correctness: Measures whether the generated answer is factually correct.
- Faithfulness: Measures how well the generated answer is supported by the retrieved information.
- Answer relevancy: Measures how relevant the generated answer is to the question.
- Answer similarity: Measures the semantic similarity between the generated answer and the ground truth answer.
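To make two of these definitions concrete, here are toy token-overlap stand-ins. A real evaluation (for example with an LLM judge or an embedding model, as in the RAGAS library) would be far more robust than bag-of-words overlap; this is only a sketch of what each metric measures.

```python
def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


def faithfulness(answer: str, context: str) -> float:
    # Fraction of answer tokens that are supported by the retrieved context.
    a = _tokens(answer)
    return len(a & _tokens(context)) / len(a) if a else 0.0


def answer_similarity(answer: str, ground_truth: str) -> float:
    # Jaccard overlap as a crude proxy for embedding-based similarity.
    a, g = _tokens(answer), _tokens(ground_truth)
    return len(a & g) / len(a | g) if a or g else 0.0
```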

Conclusion

By implementing this controllable agent for complex RAG tasks, we can maintain a balance between autonomy and control. This approach allows for more accurate and traceable responses to sophisticated queries, opening up new possibilities for interacting with and extracting insights from large bodies of text, such as novels or technical documentation.

Demo

If you found this article informative and valuable, I’d greatly appreciate your support:

  • Give it a few claps 👏 on Medium to help others discover this content (did you know you can clap up to 50 times?). Your claps will help spread the knowledge to more readers.
  • Share it with your network of AI enthusiasts and professionals.
  • Connect with me on LinkedIn

Your engagement helps foster a community of knowledge-sharing in the rapidly evolving field of AI and language models.


Nirdiamant

AI Expert (Computer Vision & Gen AI) | Hands-on Consultant