Improving RAG (Retrieval Augmented Generation) Answer Quality with Re-ranker

Implementing the Re-ranker algorithm in the RAG pipeline

Shivam Solanki
Towards Generative AI
5 min readAug 4, 2023


Ranking model. Credit — https://en.wikipedia.org/wiki/Learning_to_rank

Artificial intelligence has improved significantly over the years, and AI models have become remarkably good at answering questions based on a given context.

Human-like Q&A accuracy is advantageous when researching and extracting relevant information from a large corpus of documents. Retrievers like IBM’s Watson Discovery (or Elasticsearch, Solr, etc.) allow us to cast a broad net to capture snippets of text pertinent to a specific query. In the previous blog, we learned about adding the information retrieval component to the RAG pipeline. However, to achieve a much higher level of accuracy, we need a way to rank these responses and refine them further. Enter the Re-ranker.

Architecture diagram — RAG

What is a Re-ranker?

The formal name for re-ranking is the “learning-to-rank” approach, and it is precisely what it sounds like. The Re-ranker component sifts through the responses provided by a retriever (such as Watson Discovery or Elasticsearch) and ranks them by relevance. The DrDecr model, based on ColBERT (Contextualized Late Interaction over BERT), is trained with deep learning on a large dataset to rank passages in response to a given query.

ColBERT re-ranker (Source: www.semanticscholar.org)

The detailed implementation of the DrDecr model can be found in:

https://arxiv.org/pdf/2112.08185.pdf
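At the heart of ColBERT is its “late interaction” scoring rule (often called MaxSim): the query and the document are each encoded into per-token embeddings, and the score is the sum, over query tokens, of each token’s maximum similarity to any document token. A minimal pure-Python sketch of that rule, using tiny made-up 2-dimensional embeddings in place of the real BERT token embeddings:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def colbert_score(query_emb, doc_emb):
    """Late interaction (MaxSim): for each query-token embedding, take the
    maximum cosine similarity over all document-token embeddings, then sum
    those maxima over the query tokens."""
    return sum(max(cosine(q, d) for d in doc_emb) for q in query_emb)

# Toy example: a 2-token query against two 3-token documents
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # has a close match for each query token
doc_b = [[0.5, 0.5], [0.4, 0.6], [0.5, 0.5]]  # only diffuse, weaker matches

print(colbert_score(query, doc_a) > colbert_score(query, doc_b))  # True
```

Because the interaction happens only at this final, cheap step, document embeddings can be precomputed, which is what makes ColBERT practical as a re-ranking stage.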

Why is a Re-ranker useful?

We need re-ranking because the first-stage retriever may be flawed. It may rank some irrelevant documents high, while some relevant documents might get lower scores. Thus, not all top-k documents are relevant, and not all relevant documents are in the top-k. Re-ranker refines these results and brings up the most relevant answers.
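As a toy illustration of that refinement, the sketch below reorders first-stage candidates using a second scoring function. The candidate passages and the relevance scores are made up; the scores stand in for what a trained model like DrDecr would assign:

```python
def rerank(query, passages, score_fn, top_k=3):
    """Reorder first-stage candidates by a second-stage relevance score."""
    return sorted(passages, key=lambda p: score_fn(query, p), reverse=True)[:top_k]

# Hypothetical first-stage order: a lexically similar but off-topic passage ranks first
candidates = [
    "Palantir for Cloud Pak for Data is a partner offering.",
    "IBM Cloud Pak for Data is a cloud-native data and AI platform.",
    "Pricing plans vary by deployment option.",
]

# Hypothetical scores a re-ranking model might assign to each passage
model_scores = {
    candidates[0]: 0.41,  # mentions the right words, but is about the Palantir add-on
    candidates[1]: 0.93,  # directly answers the query
    candidates[2]: 0.12,
}

reranked = rerank("what is cloud pak for data", candidates,
                  lambda q, p: model_scores[p])
print(reranked[0])  # the passage that actually answers the query now ranks first
```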

Implementing the Re-ranker

To see re-ranking in practice, let me guide you through an example using Watson Discovery as the retriever and the ColBERT (Contextualized Late Interaction over BERT) based DrDecr model as the re-ranker.

The first step is the setup, and for this, we will be importing the required libraries, including the ColBERTReranker, from the PrimeQA components.

from primeqa.components.reranker.colbert_reranker import ColBERTReranker

Next, we load the pre-trained re-ranker model.

reranker = ColBERTReranker(model="DrDecr.dnn")
reranker.load()

With the model in place, we can use it to re-rank documents for a given query. The re-ranker runs immediately after data is retrieved from Watson Discovery:

  1. The user question/query is processed through Watson Discovery to retrieve results.

def process_discovery_retriever(question):
    # Resolve the project ID from its name
    projects = discovery.list_projects().get_result()
    for project in projects['projects']:
        if project['name'] == project_name:
            project_id = project['project_id']

    # Resolve the collection ID from its name
    collections = discovery.list_collections(project_id=project_id).get_result()
    for collection in collections['collections']:
        if collection['name'] == collection_name:
            collection_id = collection['collection_id']

    # Query Discovery with the user question
    query_result = discovery.query(project_id=project_id, query=question).get_result()
    return query_result
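Between the two steps, the Discovery response has to be reshaped into the list of document dictionaries the reranker expects. The helper below is a sketch; the field names (`results`, `document_id`, `title`, `text`) and the reranker’s input schema are assumptions based on this notebook, so check them against your versions of both APIs:

```python
def discovery_to_reranker_docs(query_result):
    """Flatten a Discovery query response into reranker-ready document dicts.
    Assumes each hit carries 'document_id', 'title', and 'text' fields;
    Discovery may return 'text' as a list of passages, so we join them."""
    docs = []
    for rank, hit in enumerate(query_result.get('results', []), start=1):
        text = hit.get('text', '')
        if isinstance(text, list):
            text = ' '.join(text)
        docs.append({
            'rank': rank,
            'document_id': hit.get('document_id', ''),
            'title': hit.get('title', ''),
            'text': text,
        })
    return docs

# results = discovery_to_reranker_docs(process_discovery_retriever(question))
```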

2. It is then re-ranked by the pre-trained re-ranking model.

# Run ColBERT Reranker
import pandas as pd
from IPython.display import display, HTML
from primeqa.components.reranker.colbert_reranker import ColBERTReranker

model_name_or_path = "DrDecr.dnn"
max_reranked_documents = 2
reranker = ColBERTReranker(model=model_name_or_path)
reranker.load()

# `results` holds the list of documents returned by Watson Discovery
reranked_results = reranker.predict(queries=[question], documents=[results], max_num_documents=max_reranked_documents)

print(reranked_results)

reranked_results_to_display = [result['document'] for result in reranked_results[0]]
df = pd.DataFrame.from_records(reranked_results_to_display, columns=['rank', 'document_id', 'title', 'text'])
print('======================================================================')
print(f'QUERY: {question}')
display(HTML(df.to_html()))

The example notebook attached to this blog demonstrates these steps for a question like “What is Cloud Pak for Data?”

Analyzing the results

The notebook shows the results in two separate stages:

1. After retrieval but before re-ranking.

2. After re-ranking.

Before and After Results

You will notice that the relevance of the responses improves significantly after re-ranking. In the Before column, the top document talks about Palantir for Cloud Pak for Data (CP4D) rather than CP4D itself. After re-ranking, the top result begins with “IBM Cloud Pak for Data is a cloud-native solution that enables data scientists, data engineers and…”. This example demonstrates the improvement in the performance of the retrieval pipeline.

Impact of implementing Re-ranker

The accuracy of the RAG (Retrieval Augmented Generation) pipeline is highly dependent on the output of the retriever/re-ranker. For example, if you pass the results from the Before column in the previous section to an LLM (Large Language Model) and ask “What is Cloud Pak for Data?”, it will give a wrong answer and describe Palantir for Cloud Pak for Data instead. This type of retriever/re-ranker pipeline can easily be applied to many use cases, including but not limited to RAG.
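To make the hand-off to the LLM concrete, here is a minimal sketch of assembling a grounded prompt from the top re-ranked passages. The prompt template is my own, and the input shape (`{'document': {'title': ..., 'text': ...}}`) assumes the reranker output used earlier; the `llm.generate` call is a hypothetical client:

```python
def build_rag_prompt(question, reranked_results, max_docs=2):
    """Assemble a grounded prompt from the top re-ranked passages."""
    context = "\n\n".join(
        f"[{i}] {item['document']['title']}: {item['document']['text']}"
        for i, item in enumerate(reranked_results[:max_docs], start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# prompt = build_rag_prompt(question, reranked_results[0])
# answer = llm.generate(prompt)  # hypothetical LLM client
```

Because the context is limited to the top re-ranked documents, a bad first-stage ranking (like the Palantir passage above) never reaches the model, which is exactly the failure mode re-ranking guards against.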

Conclusion

As we push for AI to provide more accurate and human-like answers, retrievers like Watson Discovery and Elasticsearch, together with re-ranking strategies, are pivotal. Adding a re-ranker to our pipeline lets us make better use of the sheer amount of information available in large document collections.

While the retriever (Watson Discovery, Elasticsearch, etc.) helps us zone in on the appropriate context, the re-ranker sharpens our results and ensures that the most pertinent information comes first.

Whether you’re building an intelligent Q&A system or striving to improve your existing AI’s ability to answer questions, it’s worth considering adding a re-ranking model to your stack. In this era of information overload, it’s about more than just having the information. It’s about having the correct information at the right time.

The full implementation details can be found in this notebook on the GitHub Repo.

The next stage in the RAG pipeline is to implement in-context learning to augment the knowledge and create a prompt for the LLM. Learn more about it in our next blog.

Follow Towards Generative AI for more content related to the latest in AI advancement.

Shivam Solanki

Sr. Data Scientist | Living at the interstice of business, data and technology