Distribution-Based Score Fusion (DBSF), a new approach to Vector Search Ranking

Introduction of a new ranking algorithm optimized for vector similarity search

Michelangiolo Mazzeschi
Plain Simple Software
4 min read · Nov 2, 2023


With the introduction of multimodal models, capable of processing a combination of multiple data types, combining search results from different embeddings has become a necessity. The algorithms specialized in performing Hybrid Search are called fusion algorithms, and they are commonly employed by search engines.

Example of a Hybrid Search, combining both text and image search score

An example of hybrid search is the combination of both image search and text search results. Because each model is highly specialized in its own domain, we would ideally combine the scores obtained from two separate models (CLIP for image search, MiniLM for sentence similarity search).

Current hybrid search solutions

The most common hybrid search solutions, as implemented by Weaviate, are Relative Score Fusion and Reciprocal Ranked Fusion.

Reciprocal Ranked Fusion (RRF)

This fusion method combines results obtained from different searches by leveraging their ranking.

Figure 1: Visualization of the Reciprocal Ranked Fusion algorithm

By using this method, we ignore the search relevance score (e.g. semantic similarity or BM25), considering only the ranking position. The higher a document appears across all searches, the higher the ranking computed by the fusion algorithm.
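As a rough sketch of the idea in Python (not taken from Weaviate's implementation), the smoothing constant k = 60 and the list-of-rankings input format below are assumptions made purely for illustration:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids using only their rank positions.

    rankings: one ranked list of document ids per search engine (best first).
    k: smoothing constant that dampens the weight of top positions (60 is a common default).
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + position)  # relevance scores are ignored entirely
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Example: a document ranked high by both searches ends up on top
text_search = ["doc_a", "doc_b", "doc_c"]
image_search = ["doc_b", "doc_a", "doc_d"]
print(reciprocal_rank_fusion([text_search, image_search]))
```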

Relative Score Fusion

With this second fusion algorithm, we normalize the search scores obtained from different engines and rank the results accordingly.

Figure 2: Visualization of RelativeScoreFusion algorithm

Though more advanced than the previous method, this algorithm only works well if the score distributions of the engines are similar (we explain the reason for this in the next section).
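A minimal sketch of the idea, assuming each engine returns a mapping from document ids to raw scores; the per-engine min-max normalization and the summation of normalized scores follow the description above, but the exact details of Weaviate's implementation may differ:

```python
def relative_score_fusion(results_per_engine: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Min-max normalize each engine's scores to [0, 1], then sum them per document."""
    fused: dict[str, float] = {}
    for results in results_per_engine:
        lo, hi = min(results.values()), max(results.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for doc_id, score in results.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + (score - lo) / span
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Example: BM25 scores (unbounded) fused with cosine similarities (bounded)
bm25 = {"doc_a": 12.4, "doc_b": 7.1, "doc_c": 0.3}
dense = {"doc_a": 0.21, "doc_b": 0.27, "doc_c": 0.25}
print(relative_score_fusion([bm25, dense]))
```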

Ranking challenges with current encoders

The most challenging practical issue we encounter when performing a similarity search using encoders is that all their similarity scores are concentrated around a mean value (which is not related to the relevance of the search), without sufficient variance for the tails to reach the score extremes [0, 1].

Figure 3: CLIP score distribution. Source: https://www.researchgate.net/figure/Image-text-similarity-score-distributions-using-CLIP-ViT-B-32-left-and-ViT-L-14-right_fig4_370338853

For example, as we can see from the graph above, all the CLIP scores are concentrated between [0.10, 0.45], with a mean of 0.25: they never reach either 0 or 1.

Each encoder behaves differently; for example, the score distribution of the OpenAI embedding API is the following:

Figure 4: OpenAI embedding score distribution. Image source: https://www.kolena.io/blog/how-to-validate-openai-gpt-model-performance-with-text-summarization

If we choose to standardize the search results obtained from multiple vector searches, the ones with a distribution mean closer to 1 will be given priority.
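To make this bias concrete, here is a tiny worked example with hypothetical score values chosen purely for illustration: when raw scores are compared on the shared [0, 1] scale, the engine whose distribution sits closer to 1 dominates the ranking regardless of actual relevance.

```python
# Hypothetical raw scores on the shared [0, 1] scale (illustrative values only)
clip_scores = {"img_best": 0.42, "img_ok": 0.28, "img_poor": 0.15}    # clustered around 0.25
openai_scores = {"txt_best": 0.85, "txt_ok": 0.74, "txt_poor": 0.62}  # clustered around 0.75

merged = {**clip_scores, **openai_scores}
ranking = sorted(merged, key=merged.get, reverse=True)
print(ranking)  # every text result outranks every image result, even the best one
```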

Figure 5: RelativeScoreFusion applied to CLIP + OpenAI

The same issue arises when comparing the search results from models with different score constraints. BM25, for example, does not limit its score to 1; it can grow toward infinity. We can see the effect of this property in the following Relative Score Fusion application:

Figure 6: RelativeScore fusion applied to CLIP + BM25

Distribution-Based Score Fusion (DBSF)

Having made clear the issue of combining scores obtained from different encoders, we propose the following solution: an algorithm that normalizes search scores by taking into account the distribution tails of each embedding.

Let us work through the following example: applying a fusion algorithm to the scores of CLIP (image search) and OpenAI (text search). Above (Figure 5), we applied the Relative Score Fusion algorithm and noticed that OpenAI has an inherent ranking advantage given the mean of its score distribution.

Instead, for each batch of search scores we will not perform a regular normalization, but a MinMax scaling using the tail extremes of their respective score distribution:

  • CLIP scores will be scaled by using [0.18, 0.30] as extremes
  • OpenAI scores will be scaled by using [0.40, 0.80] as extremes
Figure 7: Distribution-Based Score Fusion on CLIP and OpenAI search scores

Let us describe the algorithm using mathematical notation. Let E be a set of n embeddings, while x represents a search result obtained with one embedding. We can compute the new ranked scores using the following formula…

DBSF formula

…where U is the merge of all the search scores, each batch scaled with a MinMax scaler using our custom feature range (with extremes placed 3 standard deviations from the mean), while S(x) is a simple sorting function that sorts the combination of all the final scores.
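Putting the pieces together, here is a minimal sketch of the procedure described above, assuming each engine returns a mapping from document ids to raw scores. The mean ± 3 standard deviations extremes come from the formula description; summing the scaled scores of documents returned by multiple engines (as the earlier fusion methods do) and the helper names are assumptions of this sketch.

```python
import statistics

def dbsf(results_per_engine: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Distribution-Based Score Fusion: MinMax-scale each engine's scores using
    [mean - 3*std, mean + 3*std] as the feature range, merge, and sort."""
    fused: dict[str, float] = {}
    for results in results_per_engine:
        scores = list(results.values())
        mean = statistics.mean(scores)
        std = statistics.pstdev(scores) or 1.0      # avoid division by zero
        lo, hi = mean - 3 * std, mean + 3 * std     # distribution tails as scaling extremes
        for doc_id, score in results.items():
            scaled = (score - lo) / (hi - lo)
            fused[doc_id] = fused.get(doc_id, 0.0) + scaled
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Example: CLIP-like and OpenAI-like score batches (illustrative values only)
clip = {"doc_a": 0.26, "doc_b": 0.22, "doc_c": 0.19}
openai = {"doc_a": 0.71, "doc_b": 0.78, "doc_c": 0.64}
print(dbsf([clip, openai]))
```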

Conclusion

By using this technique, we are able to standardize the scores of multiple different embeddings, regardless of the means of their score distributions, ultimately obtaining a ranking that is free from normalization bias.
