For better hybrid retrieval — introducing SRRF

AutoRAG
Jul 30, 2024

In hybrid retrieval, there are two main methods: Reciprocal Rank Fusion (RRF) and Convex Combination (CC).
For more details on the CC method, refer to the previous blog post.

In this post, we will look at the problems expected with the existing RRF, and at SRRF, which attempts to solve them.
The content is based on the paper An Analysis of Fusion Functions for Hybrid Retrieval from Pinecone; this post is a brief summary and review of that paper.

What is RRF?

First, we need to understand what RRF is. The main idea is to use the rank of each document when combining semantic and lexical scores, as the score distributions are different. Based on the scores obtained using different retrieval methods, we can determine the rank of each document. Let’s consider the following table as an example:
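For illustration, suppose the semantic and lexical retrievers produce the following ranks (the exact numbers are hypothetical; they are chosen only to be consistent with the sums computed below):

| Document | Semantic rank | Lexical rank |
|----------|---------------|--------------|
| A        | 1             | 5            |
| B        | 3             | 4            |
| C        | 2             | 3            |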

Which document is the best? If we only use the ranks, the document with the smallest sum of semantic and lexical ranks is the most relevant document.
Let’s calculate. Document A has a sum of 6, Document B has 7, and Document C has 5, making Document C the most relevant document.

This principle is formalized in the RRF formula:

RRF(q, d) = 1 / (η + π_SEM(q, d)) + 1 / (η + π_LEX(q, d))

Here, π_SEM(q, d) is the rank of document d from semantic retrieval, π_LEX(q, d) is its rank from lexical retrieval, and η is a smoothing parameter.

The η parameter was set to 60 in the original RRF paper. However, this paper examines the effect of different η values on hybrid retrieval performance. For more details, refer to the original paper.
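As a concrete sketch, here is a minimal RRF implementation in Python. The function and variable names are our own; it simply applies the formula above to two rank maps, using the ranks from the illustrative table:

```python
def rrf(semantic_ranks: dict, lexical_ranks: dict, eta: int = 60) -> dict:
    """Fuse two rank lists with Reciprocal Rank Fusion.

    semantic_ranks / lexical_ranks map document id -> 1-based rank.
    A document missing from one list simply contributes nothing
    from that retriever.
    """
    fused = {}
    for doc_id in set(semantic_ranks) | set(lexical_ranks):
        score = 0.0
        if doc_id in semantic_ranks:
            score += 1.0 / (eta + semantic_ranks[doc_id])
        if doc_id in lexical_ranks:
            score += 1.0 / (eta + lexical_ranks[doc_id])
        fused[doc_id] = score
    return fused

# Ranks from the example table above
semantic = {"A": 1, "B": 3, "C": 2}
lexical = {"A": 5, "B": 4, "C": 3}
print(sorted(rrf(semantic, lexical).items(), key=lambda x: -x[1]))
# With eta = 60, document C again comes out on top
```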

Problems with RRF

RRF does not use the relevance score itself, leading to a loss of information about the score distribution.

For example, consider the following table with actual scores:
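These scores are hypothetical, chosen to stay consistent with the ranks in the first table:

| Document | Semantic score | Lexical score |
|----------|----------------|---------------|
| A        | 0.95           | 0.50          |
| B        | 0.55           | 0.75          |
| C        | 0.60           | 0.80          |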

Although Document A has a much higher semantic score than the other documents, this information is lost when only ranks are used. If we had used the scores to calculate the hybrid retrieval score, Document A would have been selected as the most relevant document. However, by converting scores to ranks, we lose information about the score distribution, and RRF would select Document C as the most relevant document.
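A short sketch makes the failure mode concrete. It uses the illustrative numbers above, with a plain sum of raw scores standing in for score-based fusion, and ranks computed among just these three documents:

```python
semantic_scores = {"A": 0.95, "B": 0.55, "C": 0.60}
lexical_scores = {"A": 0.50, "B": 0.75, "C": 0.80}

def to_ranks(scores: dict) -> dict:
    """Convert scores to 1-based ranks (higher score -> better rank)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc: i + 1 for i, doc in enumerate(ordered)}

# Score-based fusion: keep the distributions, just add them up.
by_score = {d: semantic_scores[d] + lexical_scores[d] for d in semantic_scores}
print(max(by_score, key=by_score.get))  # -> 'A'

# Rank-based fusion (RRF-style): only the ordering survives.
sem_r, lex_r = to_ranks(semantic_scores), to_ranks(lexical_scores)
by_rank = {d: 1 / (60 + sem_r[d]) + 1 / (60 + lex_r[d]) for d in sem_r}
print(max(by_rank, key=by_rank.get))  # -> 'C'
```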

How Much Does It Affect Performance?

We want to know how much this information loss actually affects performance. To measure that, we need a comparison point: a fusion method that loses less information than RRF. Let's introduce one.

First, we need to understand the sigmoid function.

Sigmoid Function

Those who have studied ML are likely familiar with the sigmoid function. The basic sigmoid function is given by:

σ(x) = 1 / (1 + e^(−x))

This function produces a graph that looks like this:

[Figure: the basic sigmoid function]

As the value of x decreases, the result of the sigmoid function rapidly converges to 0. Conversely, as the value of x increases, the result rapidly converges to 1. This characteristic makes the sigmoid function widely used as an activation function in binary classification in ML.

Now, consider a modified sigmoid function with a scaling parameter β:

σ_β(x) = 1 / (1 + e^(−β·x))

Think about how this function changes with different values of β. As β increases, the sigmoid becomes more sensitive to changes in x. This can be confirmed by plotting the function:

[Figure: the modified sigmoid function for different values of β]

In the chart above, the red line represents β = 1, the blue line β = 5, and the green line β = 50. As β increases, the slope around x = 0 becomes steeper.
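You can verify this numerically with a short sketch (plain Python with NumPy; the variable names are ours):

```python
import numpy as np

def sigmoid(x, beta=1.0):
    return 1.0 / (1.0 + np.exp(-beta * x))

x = np.array([-0.1, -0.01, 0.0, 0.01, 0.1])
for beta in (1, 5, 50):
    print(beta, np.round(sigmoid(x, beta), 3))
# beta = 1  -> values stay near 0.5 across the whole window
# beta = 50 -> values jump from ~0.007 to ~0.993 around x = 0
```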

Sigmoid and Indicator Functions

Don’t be intimidated by the term indicator function. It’s simpler than it sounds.

An indicator function is a function that returns 1 if a condition is met and 0 otherwise, similar to an if-else statement.

How are indicator functions related to the sigmoid function? Imagine β approaching infinity. The slope becomes so steep that the function behaves like an indicator function: it returns 0 if x is less than 0 and 1 if x is greater than 0. Thus, we can view the sigmoid function as an approximation of the indicator function!
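A quick check makes the limit concrete (a throwaway sketch; with a large β, the scaled sigmoid is already hard to tell apart from an if-else):

```python
import math

def indicator(x: float) -> float:
    """Returns 1 if x > 0 and 0 otherwise (a plain if-else)."""
    return 1.0 if x > 0 else 0.0

def sigmoid(x: float, beta: float = 1.0) -> float:
    return 1.0 / (1.0 + math.exp(-beta * x))

for x in (-0.1, -0.01, 0.01, 0.1):
    print(x, indicator(x), sigmoid(x, beta=1000))
# With beta = 1000, the sigmoid is within ~5e-5 of the
# indicator at every test point above.
```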

Using this property, we can generalize the RRF function.

Expressing RRF with Sigmoid

How can we mathematically express the rank π_o(q, d) of a document d, where o is the retrieval method (SEM or LEX)? This can be done using an indicator function:

π_o(q, d_i) = 1 + Σ_{d_j ∈ R_o^k(q), d_j ≠ d_i} 1[s_o(q, d_j) > s_o(q, d_i)]

Here, 1[·] is the indicator function, s_o(q, d) is the retrieval score, and R_o^k(q) is the set of k documents retrieved for query q. For each document d_j in this set, the indicator returns 1 if d_j's score is higher than d_i's score, and 0 otherwise. If no document scores higher than d_i, the sum is 0, making the rank of d_i equal to 1, i.e. the top-ranked document.
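In code, this definition of rank is a one-liner (a sketch with made-up names, operating on score dictionaries like the ones above):

```python
def rank(scores: dict, doc_id: str) -> int:
    """Rank of doc_id = 1 + number of documents that score higher."""
    return 1 + sum(1 for other, s in scores.items()
                   if other != doc_id and s > scores[doc_id])

semantic_scores = {"A": 0.95, "B": 0.55, "C": 0.60}
print([(d, rank(semantic_scores, d)) for d in semantic_scores])
# -> [('A', 1), ('B', 3), ('C', 2)]
```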

Now, remember that we can replace the indicator function with the sigmoid function. Doing so, we get a soft approximation π̃_o of the rank:

π̃_o(q, d_i) = 1 + Σ_{d_j ∈ R_o^k(q), d_j ≠ d_i} σ_β(s_o(q, d_j) − s_o(q, d_i))

Why make this change?

We started this explanation to understand how much the information loss in RRF affects performance. To do this, we needed a method that could reflect the score distribution. By transforming RRF as shown above, we can now reflect the score distribution.

As β approaches infinity, it behaves similarly to the original RRF, not reflecting the score distribution. However, as β decreases, the actual retrieval scores are reflected in the final score.

This modified RRF is called SRRF and can be summarized as follows:

SRRF(q, d) = 1 / (η + π̃_SEM(q, d)) + 1 / (η + π̃_LEX(q, d))
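Putting the pieces together, here is a minimal SRRF sketch in Python. It follows the formulas above; the function names and default parameter values are our own choices, not any library's API:

```python
import math

def sigmoid(x: float, beta: float) -> float:
    return 1.0 / (1.0 + math.exp(-beta * x))

def soft_rank(scores: dict, doc_id: str, beta: float) -> float:
    """Approximate rank: the indicator is replaced by a sigmoid."""
    return 1.0 + sum(
        sigmoid(s - scores[doc_id], beta)
        for other, s in scores.items() if other != doc_id
    )

def srrf(semantic_scores: dict, lexical_scores: dict,
         eta: float = 60.0, beta: float = 1.0) -> dict:
    """Sigmoid-RRF over documents present in both score maps."""
    return {
        d: 1.0 / (eta + soft_rank(semantic_scores, d, beta))
           + 1.0 / (eta + soft_rank(lexical_scores, d, beta))
        for d in set(semantic_scores) & set(lexical_scores)
    }

semantic = {"A": 0.95, "B": 0.55, "C": 0.60}
lexical = {"A": 0.50, "B": 0.75, "C": 0.80}
for beta in (1.0, 100.0):
    fused = srrf(semantic, lexical, beta=beta)
    print(beta, max(fused, key=fused.get))
# beta = 1   -> 'A': the large semantic-score gap is reflected
# beta = 100 -> 'C': the soft ranks harden and SRRF matches RRF
```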

The original paper explains this more rigorously using Lipschitz continuity and Lipschitz constants; for a precise treatment, refer to the paper. The overall flow is not much different from this post.
Briefly, if a function is Lipschitz continuous (the sigmoid function is), a smaller Lipschitz constant means its output varies less with its input, which preserves more information about the score distribution. For the scaled sigmoid σ_β, the Lipschitz constant grows with β.

How Much Does It Affect Performance?

The researchers compared the performance of SRRF and the original RRF while varying the β value. The results are as follows:

[Figure: SRRF vs. RRF performance across β values, η = 60]

[Figure: SRRF vs. RRF performance across β values, η = 5]

The two sets of results use η values of 60 and 5, respectively.

Overall, when the β value is too small, SRRF performs worse than RRF. This is because SRRF approximates the rank, and if the β value is too small, the approximation error increases, leading to lower performance.

Conversely, as the β value increases, SRRF tends to converge to a performance similar to RRF. This is because as β approaches infinity, SRRF produces the same values as RRF, resulting in similar performance.

Notably, when an appropriate β value is set, SRRF outperforms RRF. This indicates that completely eliminating the score distribution information in RRF negatively impacts performance to some extent. Additionally, using SRRF with an appropriate β value can mitigate this negative impact.

CC vs RRF vs SRRF

Finally, to compare the performance of the improved SRRF with the other hybrid retrieval method, CC, we include the experimental results from the paper:

[Table: performance of lexical, semantic, TM2C2 (CC), RRF, and SRRF across the benchmark datasets]

TM2C2 is CC with the TMM (theoretical min-max) normalization technique. For an explanation of TMM, refer to this blog post.

From the results above, we can observe that SRRF generally performs slightly better than RRF. Additionally, it is noteworthy that SRRF achieved the highest performance among the five retrieval settings on the HotpotQA dataset.

Thus, while Hybrid CC (TM2C2) often outperforms SRRF, there is potential for SRRF to be a better alternative in some cases.

Using SRRF in AutoRAG

The RAG automatic optimization tool, AutoRAG, is preparing to incorporate SRRF. You can check the progress in the GitHub issue!
