The aRt of RAG Part 3: Reranking with Cross Encoders

Ross Ashman (PhD)
5 min read · Feb 8, 2024

Introduction

In Retrieval-Augmented Generation (RAG), a reranker plays a crucial role in refining the results obtained from the initial retrieval process. After the retriever generates a set of potential candidates from a large document collection, the reranker evaluates these candidates more thoroughly. Its main function is to re-rank the candidates based on relevance, coherence, or other criteria to ensure that the most suitable options are presented for the generation stage. By doing so, the reranker helps improve the quality of the final output produced by the RAG model, enhancing its effectiveness in generating coherent and relevant responses to queries.

Reranking

Several reranking techniques are commonly used in information retrieval and natural language processing tasks. Some of the most common ones include:

  1. Learning to Rank (LTR): This approach involves training machine learning models to predict the relevance of documents or candidates based on various features extracted from the data. Models such as gradient boosting machines (GBM), support vector machines (SVM), or neural networks are often employed in LTR.
  2. Relevance Feedback: This technique involves using feedback from users or other sources to iteratively refine the ranking of documents. It can be done through explicit feedback (e.g., user ratings) or implicit feedback (e.g., user behaviour).
  3. Semantic Similarity: Assessing the semantic similarity between queries and documents can be used to rerank candidates. Techniques such as word embeddings or pre-trained language models like BERT are often utilised for this purpose.
  4. Diversification: Diversification techniques aim to present a diverse set of relevant documents to cater to different aspects of the query. This can be achieved through algorithms like Maximal Marginal Relevance (MMR) or clustering-based approaches (a short MMR sketch follows this list).
  5. Query Expansion: This technique involves expanding the original query with additional terms to retrieve more relevant documents. Reranking can then be performed based on the expanded query.
  6. Contextual Reranking: Taking into account contextual information, such as user context or dialogue history, can help improve the relevance of reranked documents in conversational search or recommendation systems.
  7. Hybrid Approaches: Combining multiple reranking techniques or integrating reranking with other stages of the retrieval process can often lead to improved performance.
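
To make the diversification idea in item 4 concrete, here is a minimal MMR sketch over pre-computed embedding vectors. The function name, the lambda weight, and the use of cosine similarity are illustrative assumptions, not a reference implementation:

import numpy as np

def mmr_rerank(query_vec, doc_vecs, lambda_param=0.5, top_k=5):
    # Maximal Marginal Relevance: greedily pick documents that are relevant
    # to the query but not redundant with documents already selected.
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    selected = []
    candidates = list(range(len(doc_vecs)))

    def mmr_score(i):
        relevance = cos(query_vec, doc_vecs[i])
        # Redundancy is the highest similarity to any already-selected doc
        redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                         default=0.0)
        return lambda_param * relevance - (1 - lambda_param) * redundancy

    while candidates and len(selected) < top_k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected  # indices into doc_vecs, in MMR order

A higher lambda_param favours pure relevance; a lower one favours diversity.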

These techniques can be used individually or in combination depending on the specific requirements and characteristics of the task at hand. A re-ranker can substantially improve the final results for the user.

Continuing with the theme from Part 1 and Part 2 of utilising embeddings, here we look at techniques that fall under "Semantic Similarity". A common semantic re-ranker is the Cross-Encoder. The query and a candidate document are passed simultaneously to a transformer network, which then outputs a single score between 0 and 1 indicating how relevant the document is for the given query.

The advantage of Cross-Encoders is their higher performance, as they perform attention across the query and the document.
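
As a minimal sketch using the sentence-transformers library (the model choice and the example strings are illustrative; STSb-trained cross-encoders output a similarity score of roughly 0 to 1):

from sentence_transformers import CrossEncoder

# The cross-encoder scores a (query, document) pair jointly
model = CrossEncoder("cross-encoder/stsb-roberta-base")

score = model.predict([["How do I bake bread?",
                        "A simple recipe for baking bread at home."]])
print(score)  # a single relevance score for the pair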

Cross encoders

In a cross-encoder architecture the input of the model always consists of a data pair (e.g., two sentences or documents), which is processed jointly by the encoder. This allows the model to capture interactions and relationships between the input sequences more effectively. The encoder typically consists of multiple layers of neural network units, such as transformers or recurrent neural networks (RNNs), which encode the information from the input sequences into fixed-size representations.
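
To see what "processed jointly" means in practice, here is a sketch using the Hugging Face transformers API directly. Both sequences are packed into one input, so every attention layer can attend across the query/document boundary (the model name and strings are assumptions for illustration):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "cross-encoder/stsb-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

# The tokenizer concatenates the pair into a single sequence with separators
inputs = tokenizer("How do rerankers work?",
                   "A reranker rescores retrieved candidates.",
                   return_tensors="pt")
with torch.no_grad():
    score = model(**inputs).logits  # one regression output for the pair
print(score)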

Cross-encoders are commonly used in tasks such as:

  1. Sentence Pair Classification: Determining the relationship between two sentences, such as entailment, contradiction, or neutral (a short sketch follows this list).
  2. Semantic Similarity Scoring: Computing the similarity score between two sentences or documents.
  3. Question Answering: Finding the answer to a question given a passage or context, where both the question and the passage are encoded together.
  4. Information Retrieval: Ranking documents based on their relevance to a given query, where both the query and the documents are encoded jointly.
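
Item 1 can be done with a publicly available NLI cross-encoder; a minimal sketch (the label order follows that model's card, and the sentences are made up):

from sentence_transformers import CrossEncoder

# An NLI cross-encoder classifies the relationship between a sentence pair
model = CrossEncoder("cross-encoder/nli-deberta-base")
scores = model.predict([["A man is eating food.", "A man is eating."]])

labels = ["contradiction", "entailment", "neutral"]
print(labels[scores[0].argmax()])  # expected: "entailment"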

By considering the interaction between input sequences, cross-encoders can capture more nuanced semantic relationships and dependencies, leading to improved performance in various NLP tasks. These models are often pre-trained on large corpora of text data using techniques like supervised learning or self-supervised learning before being fine-tuned on specific downstream tasks.

In terms of search, you pass the search query together with each data item through the Cross-Encoder to calculate the similarity between the query and that data object.

Cross encoders are not a distinct reranking technique on their own; rather, they can be employed as a component within a reranking system to enhance the semantic understanding of the relationship between queries and documents, ultimately contributing to more accurate reranking.

Reranking Mongo searches

In Part 2 we looked at hybrid search, combining keyword search with vector search. With hybrid search we combine the two result lists and rerank them using Reciprocal Rank Fusion (RRF) to give us a single ranked list.
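
For reference, a minimal sketch of RRF (the constant k = 60 is a commonly used default; the function name is illustrative):

def rrf_fuse(rankings, k=60):
    # Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
    # for every document it contains; documents ranked highly in several
    # lists accumulate the largest fused scores.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

Now we are going to rerank what's retrieved from hybrid search. We add the following function: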

    
def reranker(query, hits):
    from sentence_transformers import CrossEncoder

    # To refine the results, we use a CrossEncoder. A CrossEncoder gets both
    # inputs (query, retrieved text) and outputs a score 0...1 indicating
    # the similarity.
    cross_encoder_model = CrossEncoder("cross-encoder/stsb-roberta-base")

    # Now, do the re-ranking with the cross-encoder
    sentence_pairs = [[query, hit["text"]] for hit in hits]
    similarity_scores = cross_encoder_model.predict(sentence_pairs)

    for idx in range(len(hits)):
        hits[idx]["cross-encoder_score"] = similarity_scores[idx]

    # Sort the list by CrossEncoder score, highest first
    hits = sorted(hits, key=lambda x: x["cross-encoder_score"], reverse=True)
    print("Top 5 hits with CrossEncoder:")
    for hit in hits[:5]:
        print("\t{:.3f}\t{}".format(hit["cross-encoder_score"], hit["_id"]))

    print("\n\n========\n")

    return hits

Now, to use the function, we perform any of the vector, keyword, or hybrid searches and feed the results to the reranker:

query = "Our Monte Carlo model predicts that protons are easily accelerated beyond the knee \
in the cosmic ray gy density as the plasma expands downstream from the spectrum; the high magnetic fields"
top_k = 10

result = atlas_hybrid_search(query, top_k, db_name, collection_name, vector_index_name, keyword_index_name)
reranked = reranker(query, result)

You can experiment with different reranker models to find the one that best suits your needs. The question is: what counts as "best"? Next we will look at evaluation. Until then, happy reranking!

References

  1. Retrieve & Re-Rank
  2. Using Cross-Encoders as reranker in multistage vector search
