E21 : Adaptive RAG

Praveen Thenraj
Research Papers Summarized
6 min read · Apr 14, 2024


Understanding the complexity of a user question in order to identify a suitable answering approach improves both the effectiveness and the efficiency of RAG.

Paper Name : Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity

Paper URL : https://arxiv.org/pdf/2403.14403.pdf

Authors : Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park

Please find the annotated paper here.

Problem Statement :

  • In a real-world scenario, the complexity of a user question is not static. It can be simple, medium, or complex.
  • Most existing RAG solutions do not assess the complexity of the question before retrieval, re-ranking, and response generation. This can cause an over- or under-application of RAG depending on the complexity of the question.
  • For example, depending on the static RAG design choice made, a simple question that a pre-trained LLM can answer directly may be forced through single-step or multi-step RAG, while a complex question that requires multi-step RAG may undergo only single-step RAG, leading to degraded performance.
  • Existing RAG solutions also do not dynamically choose the type of RAG approach (no-retrieval, single-step, multi-step) based on the complexity of the question.

Solution :

  • Given a user query, use a classifier as the first step of the RAG pipeline to classify the complexity of the query.
  • Based on the identified complexity, the relevant RAG approach is chosen to generate the final response, as sketched below.
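
Below is a minimal Python sketch of this classify-then-route idea. The function names (classify_complexity, retrieve, llm_generate) and their toy bodies are illustrative stand-ins, not the authors' implementation; only the three-way dispatch mirrors the paper.

```python
def classify_complexity(question: str) -> str:
    # Toy stand-in for the fine-tuned T5 classifier described in the paper.
    # A real implementation would return "simple", "medium", or "complex".
    return "medium"

def retrieve(query: str, k: int = 5) -> list[str]:
    # Toy stand-in for a document retriever (e.g. BM25 or dense retrieval).
    return [f"[document {i} retrieved for: {query}]" for i in range(k)]

def llm_generate(prompt: str) -> str:
    # Toy stand-in for the underlying LLM (FLAN-T5 / GPT-3.5 in the paper).
    return f"[answer generated from {len(prompt)} prompt characters]"

def adaptive_rag(question: str, max_hops: int = 3) -> str:
    complexity = classify_complexity(question)
    if complexity == "simple":
        # No retrieval: the LLM answers from its parametric knowledge alone.
        return llm_generate(question)
    if complexity == "medium":
        # Single-step retrieval: retrieve once, then generate.
        docs = retrieve(question)
        return llm_generate("\n".join(docs) + "\n\nQ: " + question)
    # Complex: multi-step retrieval, interleaving retrieval and generation.
    context: list[str] = []
    answer = ""
    for _ in range(max_hops):
        context.extend(retrieve(question if not answer else answer))
        # Each intermediate generation can seed the next retrieval step.
        answer = llm_generate("\n".join(context) + "\n\nQ: " + question)
    return answer

print(adaptive_rag("Who directed the film adapted from the novel?"))
```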

Approach :

  • For the data annotation and training process, three single-hop QA datasets and three multi-hop QA datasets are considered.
  • The data annotation itself follows a two-step approach.
  • Step 1 - Since there is no existing dataset that pairs questions with complexity labels, the authors take samples (400 from each dataset) and run the three approaches (no-retrieval, single-step, and multi-step) to generate the labels.
  • The complexity of a question is labeled as simple, medium, or complex.
  • If a question is answered correctly by the no-retrieval approach, it is labeled simple.
  • If a question is answered correctly by the single-step (and multi-step) approaches but not by the no-retrieval approach, it is labeled medium.
  • In general, the most minimal of the three approaches that produces the correct answer determines the complexity label of the question.
  • Step 2 - If all three approaches answer wrongly, the dataset the question comes from decides its complexity (an inductive bias): a question from a single-hop QA dataset is labeled simple, whereas a question from a multi-hop QA dataset is labeled complex. The labeling rule is sketched after this list.
  • A classifier model is then trained in a supervised manner on the question-complexity pairs generated in Step 1 and Step 2.
  • At test time, the trained classifier model is used to identify the complexity of the question.
  • Based on the complexity of the question, the relevant retrieval-augmented LLM approach is chosen (no-retrieval, single-step retrieval, or multi-step retrieval).
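
The two-step labeling rule above can be summarized in a short sketch: the most minimal approach that answers correctly determines the label, and the dataset type serves as a fallback. The function names and the exact-match check below are illustrative assumptions, not the paper's code.

```python
APPROACHES = [
    ("simple", "no_retrieval"),
    ("medium", "single_step"),
    ("complex", "multi_step"),
]

def is_correct(predicted: str, gold: str) -> bool:
    # Toy exact-match check; the paper uses standard QA correctness metrics.
    return predicted.strip().lower() == gold.strip().lower()

def label_question(question, gold_answer, answer_fns, from_multi_hop_dataset):
    """Label a training question as 'simple', 'medium', or 'complex'.

    answer_fns maps an approach name to a callable that answers the
    question using that approach.
    """
    # Step 1: the most minimal approach that answers correctly wins.
    for label, approach in APPROACHES:
        if is_correct(answer_fns[approach](question), gold_answer):
            return label
    # Step 2: if all three approaches fail, fall back to the dataset's
    # inductive bias: single-hop -> simple, multi-hop -> complex.
    return "complex" if from_multi_hop_dataset else "simple"
```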

Experimental Setup :

  • Benchmark datasets considered :
    Single-hop QA dataset - NaturalQuestions, SQuAD 1.1, TriviaQA
    Multi-hop QA dataset - MuSiQue, HotpotQA, 2WikiMultiHopQA
  • Classifier model - T5 large (770M)
  • LLMs used in RAG - FLAN-T5-XL (3B), FLAN-T5-XXL (11B), GPT-3.5 (Turbo)
  • The retrieval augmented techniques evaluated fall under 4 categories:
    Simple - no-retrieval and single-step retrieval approaches (naive RAG)
    Adaptive - Adaptive Retrieval (decides the complexity of a question based on the frequency of entities in it), Self-RAG (the LLM assesses whether retrieval is required by predicting a retrieval token), Adaptive-RAG (this paper)
    Complex - multi-step approach (executing the retrieval and response-generation steps multiple times, combined with Chain-of-Thought reasoning)
    Oracle - same as Adaptive-RAG, except using a perfect classifier that classifies all questions correctly.
  • The approaches were evaluated for effectiveness and efficiency. Effectiveness was measured using EM, F1-score, and accuracy (EM and F1 are sketched below), whereas efficiency was measured using time per query and the number of retrieval-generation steps.
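
For reference, a minimal sketch of the exact-match (EM) and token-level F1 metrics in their standard SQuAD-style form (the paper's exact normalization may differ):

```python
from collections import Counter

def exact_match(prediction: str, gold: str) -> int:
    # EM: 1 if the normalized strings match exactly, else 0.
    return int(prediction.strip().lower() == gold.strip().lower())

def token_f1(prediction: str, gold: str) -> float:
    # Token-level F1 over whitespace tokens.
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # per-token min counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"), token_f1("the city of Paris", "Paris"))
```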

Observations :

  • The average scores of the Adaptive-RAG technique outperformed all techniques under the simple and adaptive categories across all 6 datasets. This behaviour was observed with all 3 LLMs.
  • This clearly shows that understanding the complexity of a question is important in deciding the approach to be followed to answer it.
Figure: Average score of LLMs using the Adaptive-RAG approach outperforms all other techniques under the simple and adaptive categories
  • The average time per query was much lower with the Adaptive-RAG technique than with the multi-step technique, despite its EM, F1, and accuracy scores being slightly lower than the multi-step approach's. This shows that the Adaptive-RAG technique strikes a balance between effectiveness and efficiency.
  • Results also show that, when using Adaptive-RAG with the Oracle (a perfect question complexity classifier), the approach outperforms all other approaches in both effectiveness and efficiency.
  • Hence, a mature question complexity classifier will help improve the performance of the overall system by selecting the retrieval-augmented technique appropriate for each question.
Figure: F1 and accuracy scores of FLAN-T5-XL and FLAN-T5-XXL models when used as classifiers for question complexity
  • Results show that the classifier used in Adaptive-RAG was more accurate than those of the other adaptive retrieval techniques, and in turn Adaptive-RAG performed better than these methods. This signifies the importance of classifying the complexity of the question accurately to boost the overall performance of the RAG system.
Figure: Confusion matrix of the Adaptive-RAG classifier
  • A confusion matrix was constructed to understand the performance of the classifier used in Adaptive-RAG. The classifier misclassified questions that require no retrieval as single-step 47% of the time. Similarly, it misclassified single-step questions as multi-step 23% of the time and multi-step questions as single-step 31% of the time. A sketch of how such a matrix is computed follows this list.
  • Observations show that the classifier's results did not vary much with the size of the model. This suggests that the classifier need not be very large, which helps reduce the overall processing time per query during inference.
Figure: Accuracy and F1-score of classifiers of different sizes
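
A row-normalized confusion matrix like the one discussed above can be computed from (gold, predicted) label pairs; here is a generic sketch (not the authors' evaluation code):

```python
from collections import Counter

LABELS = ["simple", "medium", "complex"]

def confusion_matrix(gold_labels, predicted_labels):
    # counts[(gold, pred)] = number of questions with that label pair.
    counts = Counter(zip(gold_labels, predicted_labels))
    # Rows are gold labels, columns are predictions; each row is
    # normalized to sum to 1 so entries read as misclassification rates.
    matrix = {}
    for gold in LABELS:
        row_total = sum(counts[(gold, pred)] for pred in LABELS) or 1
        matrix[gold] = {p: counts[(gold, p)] / row_total for p in LABELS}
    return matrix

print(confusion_matrix(["simple", "simple", "medium"],
                       ["medium", "simple", "medium"]))
```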

Conclusion :

  • The Adaptive-RAG technique identifies the complexity of a question and dynamically selects the appropriate retrieval-augmented technique (no-retrieval, single-step retrieval, or multi-step retrieval) based on that complexity.
  • In a real-world scenario where not all questions are complex or simple, classifying the complexity of each question and then using the appropriate retrieval-augmented technique helps boost the performance of the solution and also improves efficiency in terms of inference time, latency, and compute cost.
  • Using a more appropriately trained classifier with a better training strategy would improve the performance of Adaptive-RAG even further.
