In this blog post, we will explore why tuning hyperparameters is crucial in hybrid retrieval. This content is based on Pinecone’s paper “An Analysis of Fusion Functions for Hybrid Retrieval”. Since the paper contains a much more detailed analysis of hybrid retrieval, reading the original is well worth your time.
What is Hybrid Retrieval?
What is hybrid retrieval? Simply put, it combines the results of multiple retrieval modules. Some methods use vector embeddings and cosine similarity (semantic retrieval), while others, such as BM25, rank documents by keyword frequency using TF-IDF-style statistics (lexical retrieval).
Hybrid retrieval combines (fuses) the results of semantic and lexical retrieval to calculate the final retrieval score. But how is this calculated?
Hybrid cc
There are several hybrid retrieval methods, such as RRF (Reciprocal Rank Fusion). In this post, I will focus on hybrid cc retrieval.
In the hybrid cc retrieval, cc stands for convex combination. You can think of it as a “weighted sum.” This means that the results of semantic and lexical retrieval are added together, each with a specific weight. If you want to give more importance to the semantic retrieval results, you can assign a higher weight to it.
The formula can be expressed as follows:

score(q, d) = α · score_sem(q, d) + (1 − α) · score_lex(q, d)
Here, α is the weight parameter, taking a value between 0 and 1. It lets you decide whether semantic or lexical retrieval matters more. Since the formula is a simple weighted sum, it is easy to understand.
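The weighted sum above can be sketched in a few lines of Python. This is an illustrative sketch, not AutoRAG’s actual implementation; the function name `hybrid_cc` and the score dictionaries are hypothetical.

```python
def hybrid_cc(semantic_scores, lexical_scores, alpha=0.7):
    """Fuse two {doc_id: score} dicts with a convex combination.

    alpha weights the semantic score; (1 - alpha) weights the lexical score.
    A document missing from one retriever contributes 0 for that component.
    """
    doc_ids = set(semantic_scores) | set(lexical_scores)
    return {
        doc_id: alpha * semantic_scores.get(doc_id, 0.0)
        + (1 - alpha) * lexical_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }

semantic = {"doc1": 0.9, "doc2": 0.4}
lexical = {"doc1": 0.2, "doc3": 0.8}
fused = hybrid_cc(semantic, lexical, alpha=0.7)
# doc1: 0.7 * 0.9 + 0.3 * 0.2 = 0.69
```

With α = 0.7, the semantic score dominates; setting α = 0.5 would weight both retrievers equally.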
Hybrid cc Normalization
One of the issues with hybrid cc is that the score distributions of lexical and semantic retrieval are different.
For example, BM25 scores are non-negative and unbounded above, while cosine similarity scores range from -1 to 1. To address this, a normalization process is needed to make the scores comparable. This prevents poor results caused by mere differences in score distributions.
Normalization should be performed for both lexical and semantic scores.
There are several normalization techniques for hybrid cc, including:
- mm: Min-max Normalization. The highest and lowest observed scores in the result list are mapped to 1 and 0, and all other scores are rescaled between them.
- tmm: Similar to mm, but the minimum value is set to the theoretically lowest value. For BM25, it would be 0, and for cosine similarity, it would be -1.
- z: Z-score Normalization. Normalization is performed using the mean and standard deviation.
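The three normalization methods above can be sketched as plain Python functions. These are illustrative sketches, not AutoRAG’s internal implementation.

```python
import statistics

def mm_norm(scores):
    """Min-max: scale by the observed min and max of this result list."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def tmm_norm(scores, theoretical_min):
    """Theoretical min-max: like mm, but use the retriever's theoretical
    minimum (0 for BM25, -1 for cosine similarity) instead of the observed one."""
    hi = max(scores)
    if hi == theoretical_min:
        return [0.0 for _ in scores]
    return [(s - theoretical_min) / (hi - theoretical_min) for s in scores]

def z_norm(scores):
    """Z-score: center on the mean and scale by the standard deviation."""
    mu = statistics.mean(scores)
    sigma = statistics.stdev(scores)
    return [(s - mu) / sigma for s in scores]

cosine_scores = [0.92, 0.35, -0.10]
mm_out = mm_norm(cosine_scores)
tmm_out = tmm_norm(cosine_scores, theoretical_min=-1.0)
```

Note how mm and tmm differ only in the minimum they anchor to: mm uses the lowest score that happened to be retrieved, while tmm uses the lowest score the retriever could ever produce.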
How Important are the Weights in Hybrid cc?
The Pinecone research team measured NDCG@1000 across various datasets, weights, and normalization methods to see how much the weight affects performance.
(If you don’t know what NDCG is, check out this related blog post.)
The research team proved the following through experimental and theoretical methods. The theoretical proof is complex and lengthy, so please refer to the original paper.
1. Changing the normalization method in hybrid cc does not affect the ranking of passages in the hybrid cc results.
2. The maximum performance remains similar even when changing the normalization method. However, different weight values are required.
3. Among the normalization methods, tmm is the most robust to shifts in the data distribution. Its performance peak is also sharper and easier to identify than those of the other normalization methods.
Additionally, the experimental results show the following:
1. Adjusting the weights in hybrid cc can result in a difference of more than 0.1 in NDCG values.
2. The weight value that shows the maximum performance varies depending on the normalization method.
3. The weight value that shows the maximum performance differs for each dataset (e.g., see the HotpotQA results).
Ultimately, performance varies significantly with the weight in hybrid cc, regardless of the normalization technique. And since the best weight differs per dataset, it is crucial to search for an appropriate value when using hybrid cc.
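The exploration described above amounts to a simple grid search over the weight. The sketch below illustrates the idea; `evaluate_ndcg` is a hypothetical placeholder for an evaluation function, and this is not AutoRAG’s actual optimizer.

```python
def best_weight(semantic_scores, lexical_scores, evaluate_ndcg, n_weights=101):
    """Try n_weights evenly spaced alphas in [0, 1]; return the best one.

    evaluate_ndcg is a placeholder: it takes the fused {doc_id: score}
    dict and returns a quality metric such as NDCG against ground truth.
    """
    best_alpha, best_score = 0.0, float("-inf")
    for i in range(n_weights):
        alpha = i / (n_weights - 1)
        # Convex combination at this alpha (same formula as hybrid cc).
        fused = {
            doc: alpha * semantic_scores.get(doc, 0.0)
            + (1 - alpha) * lexical_scores.get(doc, 0.0)
            for doc in set(semantic_scores) | set(lexical_scores)
        }
        score = evaluate_ndcg(fused)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score
```

With n_weights=101, this sweeps α from 0.0 to 1.0 in steps of 0.01, which mirrors the search AutoRAG performs when you configure a weight range (shown in the next section).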
Conclusion
Selecting the appropriate weights is very important in hybrid retrieval. While high performance can be achieved with any normalization method, the appropriate weight varies from dataset to dataset and significantly impacts performance. Although we couldn’t cover the hybrid rrf method in this post, we hope to address it in the future.
So, How Do We Optimize?
You might be wondering, “So how do we find the optimal values?” From version 0.2.10, AutoRAG introduced a new optimization strategy for hybrid retrieval.
Previously, you could only experiment with a handful of manually chosen weights. Now you can set a weight search range and specify how many weights to test within it to find the optimal point.
By simply configuring the YAML file as shown below, you can experiment with 101 weights by changing the weight from 0.0 to 1.0 in increments of 0.01 to find the optimal weight. All you need to do is modify the YAML file! (Updating the AutoRAG version is, of course, essential)
```yaml
modules:
  - module_type: hybrid_cc
    normalize_method: [ mm, tmm, z, dbsf ]
    weight_range: (0.0, 1.0)
    test_weight_size: 101
```
You can even specify four normalization methods at once, as shown above (dbsf is distribution-based score fusion, which normalizes scores using the mean plus or minus three standard deviations). Although you can experiment with all four, we recommend ‘tmm’, since it is the most robust, as concluded above, and the performance difference between normalization methods is not significant.
For more detailed instructions on using AutoRAG, refer to the documentation or the AutoRAG blog.