Similarity Learning vs Search Reranking: Practical Approaches to Boosting Real-World Search Performance

May 10, 2023

As an engineer exploring advanced Natural Language Processing (NLP) techniques, 👩‍🔬👨‍🔬 you understand the importance of efficient search systems 🔍. The ability to retrieve precise and relevant information from vast amounts of data is essential for enhancing the performance of various applications and systems 📈. Nevertheless, accomplishing this in a complex and ever-changing environment 🌀 can be challenging.

In this article, we look at two distinct strategies for improving the performance of search systems. The first is similarity learning, a custom modeling approach that fine-tunes embeddings to improve the quality of search results. It is particularly useful for engineers who want to adapt language models to a specific domain and so improve the relevance and accuracy of search results.

The second strategy, reranking, utilizes pre-trained models to refine the order of search results, a technique exemplified with Cohere’s Rerank. This is a powerful tool for engineers aiming to enhance their existing search systems with minimal changes to the existing infrastructure.

Improve search with similarity learning

This section shares insights on fine-tuning sentence embeddings for better similarity search and how it can streamline the labelling workflow. All the code and data referenced in this post can be found on GitHub.

Large language models (LLMs), while excellent at handling a broad range of tasks, may not be experts in your specific domain. This is where fine-tuning becomes useful. Fine-tuning is the process of adjusting your language model to better align with the domain of your data.

The tools

To fine-tune embeddings, we use similarity learning, which in our scenario takes class information as its training signal. We’ll use Quaterion, an open-source framework for similarity learning, available on GitHub.

Quaterion is a library that can use different types of similarity information to fine-tune embeddings. In our context, we use SimilarityGroupSamples, as class information is our only similarity signal: records that share a class are treated as similar. The model comprises a pre-trained LLM serving as the encoder, and a SkipConnectionHead on top.

We’re using the AG News classification dataset for this experiment. This dataset features four classes: World, Sports, Business, and Sci/Tech. We commence with 20,000 records, manually label 261, and apply weak supervision to obtain 10,854 usable records for our fine-tuning pipeline. The remaining 9,156 records act as a test set for evaluation.

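from quaterion.dataset import SimilarityGroupSample
from quaterion_models.heads import SkipConnectionHead

# `data` is assumed to be an iterable of (text, class_id) pairs; records sharing a class_id count as similar
similarity_group_samples = [SimilarityGroupSample(obj=text, group=class_id) for text, class_id in data]
# 384 is the size of the embeddings produced by the encoder
skip_connection_head = SkipConnectionHead(input_embedding_size=384)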
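To illustrate how class labels alone can act as a similarity signal, here is a minimal sketch that draws (anchor, positive, negative) triplets from labelled records. It is not how Quaterion builds its batches internally; the helper `sample_triplets` and the toy texts are hypothetical, and only the four class names come from AG News.

```python
import random
from collections import defaultdict

def sample_triplets(records, n_triplets, seed=0):
    """Draw (anchor, positive, negative) texts from (text, class_label) records."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in records:
        by_class[label].append(text)
    # An anchor class needs at least two records so it can supply a positive
    anchor_classes = [c for c, texts in by_class.items() if len(texts) >= 2]
    if not anchor_classes or len(by_class) < 2:
        raise ValueError("need at least two classes, one with two or more records")
    triplets = []
    for _ in range(n_triplets):
        pos_class = rng.choice(anchor_classes)
        neg_class = rng.choice([c for c in by_class if c != pos_class])
        anchor, positive = rng.sample(by_class[pos_class], 2)  # same class
        negative = rng.choice(by_class[neg_class])             # different class
        triplets.append((anchor, positive, negative))
    return triplets

# Toy records; the texts are made up for illustration
records = [
    ("Election results announced", "World"),
    ("Peace talks resume", "World"),
    ("Team wins the final", "Sports"),
    ("Striker signs new contract", "Sports"),
    ("Shares fall on weak earnings", "Business"),
    ("New chip doubles battery life", "Sci/Tech"),
]
triplets = sample_triplets(records, n_triplets=5)
```

Each triplet pairs two records from the same class against one from a different class, which is the kind of relationship a group-based similarity sample encodes.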

The loss function we use is a triplet loss with cosine distance as the distance metric.

from quaterion.distances import Distance
from quaterion.loss import TripletLoss

triplet_loss = TripletLoss(distance_metric_name=Distance.COSINE)
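To make the loss concrete, here is a small NumPy sketch of what a triplet loss with cosine distance computes for a single (anchor, positive, negative) triple. It is an illustration of the formula, not the library's implementation, and the margin of 0.5 is an assumed value.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two 1-D vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

anchor = np.array([1.0, 0.0])
positive = np.array([1.0, 0.0])   # same direction as the anchor
negative = np.array([0.0, 1.0])   # orthogonal to the anchor

# Negative is already further away than positive by more than the margin
easy = triplet_loss(anchor, positive, negative)
# Roles swapped: the "positive" is far and the "negative" is close
hard = triplet_loss(anchor, negative, positive)
```

The loss is zero once the negative is further from the anchor than the positive by at least the margin, so training effort concentrates on triplets that still violate that ordering.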

The training is handled by Quaterion, using PyTorch Lightning under the hood. We specify the data loaders for training and validation and call the fit method.

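import pytorch_lightning as pl
from quaterion import Quaterion

# `model` is assumed to be a quaterion.TrainableModel subclass whose configure_*
# methods return the encoder, skip_connection_head, and triplet_loss
trainer = pl.Trainer(max_epochs=10)  # the epoch count is a placeholder
Quaterion.fit(trainable_model=model, trainer=trainer, train_dataloader=train_dataloader, val_dataloader=val_dataloader)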

To evaluate our fine-tuning, we use the “top_1k” metric, which measures the percentage of records with the same class within the 1000 most similar records. We also test the top_k metric for different values of k.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def top_k_metric(embeddings, labels, k=1000):
    # Pairwise cosine similarity between all test records
    similarity = cosine_similarity(embeddings)
    np.fill_diagonal(similarity, -np.inf)  # a record is not its own neighbour
    labels = np.asarray(labels)
    k = min(k, len(labels) - 1)

    # Indices of the k most similar records for each test record
    top_k_indices = np.argsort(similarity, axis=1)[:, -k:]

    # Fraction of those neighbours that share the record's class
    return float(np.mean(labels[top_k_indices] == labels[:, None]))

def evaluate_top_1k(raw_embeddings, fine_tuned_embeddings, test_labels):
    return top_k_metric(raw_embeddings, test_labels), top_k_metric(fine_tuned_embeddings, test_labels)

# test_labels is assumed to hold the class of each test record, in the same order as the embeddings
raw_top_1k, fine_tuned_top_1k = evaluate_top_1k(raw_embeddings, fine_tuned_embeddings, test_labels)
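As a rough illustration of testing the top_k metric for different values of k, the sketch below computes it on synthetic two-class data. The data, the helper name `top_k_same_class`, and the chosen k values are assumptions made for the example, not the experiment's actual setup.

```python
import numpy as np

def top_k_same_class(embeddings, labels, k):
    """Mean share of each record's k nearest neighbours (cosine) with the same label."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = normed @ normed.T
    np.fill_diagonal(similarity, -np.inf)      # exclude each record itself
    neighbours = np.argsort(-similarity, axis=1)[:, :k]
    return float(np.mean(labels[neighbours] == labels[:, None]))

# Synthetic stand-in data: two well-separated classes of 50 points each
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
centres = np.array([[1.0, 0.0], [0.0, 1.0]])
embeddings = centres[labels] + 0.05 * rng.normal(size=(100, 2))

# Sweep over several neighbourhood sizes
scores = {k: top_k_same_class(embeddings, labels, k) for k in (1, 10, 49)}
```

One thing to keep in mind when choosing k: the metric has a ceiling once k exceeds the number of other records in a class. In this toy set each record has only 49 same-class neighbours, so k=99 can score at most 49/99 regardless of embedding quality.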

Improve search with pre-trained models and reranking

Traditional keyword-based search systems often return results that are not relevant to the user’s query, which is frustrating for users. Cohere Rerank aims to bridge this gap by ordering results according to their semantic relevance to the query.

Cohere Rerank is designed to act as the final stage of a search flow, ranking the retrieved documents by their relevance to a user’s query. This allows companies to retain an existing keyword-based or semantic search system for the initial retrieval and integrate the Rerank endpoint for the final re-ranking.

The Rerank endpoint uses a large language model to compute a relevance score for the query with each of the initial search results. It delivers higher quality results, especially for complex and domain-specific queries, with just a single line of code change in your application.

Cohere Rerank is a low-complexity way to add semantic relevance to a keyword-based search system without changing the existing infrastructure. It can boost search quality for over 100 languages.

Cohere Rerank supports reranking of up to 1,000 documents, making it suitable for smaller knowledge bases. It augments an existing search system rather than replacing it.

The result is a two-stage approach: initial retrieval, followed by re-ranking with Cohere’s Rerank endpoint.
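The two-stage idea can be sketched without any external service. In the example below, `keyword_retrieve` stands in for the search engine and `overlap_score` is a hypothetical stand-in for a relevance model; neither reflects how Elasticsearch or Cohere Rerank work internally, and the corpus is made up.

```python
def keyword_retrieve(query, corpus, size=100):
    """Stage 1: keep documents sharing at least one lowercase token with the query."""
    query_tokens = set(query.lower().split())
    hits = [doc for doc in corpus if query_tokens & set(doc.lower().split())]
    return hits[:size]

def rerank(query, docs, score_fn, top_n=3):
    """Stage 2: score each (query, doc) pair and return (doc, score), best first."""
    scored = [(doc, score_fn(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def overlap_score(query, doc):
    """Hypothetical relevance scorer: Jaccard overlap of the token sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

corpus = [
    "cats sleep most of the day",
    "the average lifespan of indoor cats",
    "dogs need daily walks",
    "cats lifespan depends on diet",
]
candidates = keyword_retrieve("cats lifespan", corpus)
results = rerank("cats lifespan", candidates, overlap_score, top_n=2)
```

Because the second stage only needs a query, a list of candidate documents, and a scoring function, the scorer can be swapped for a stronger model without touching the retrieval stage.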

Keyword Retrieval

The initial retrieval is done using a traditional search engine like Elasticsearch, OpenSearch, or Solr. Here’s an example of a traditional Elasticsearch search:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")
query = "Cats lifespan"
resp = es.search(index="index", size=100, query={'query_string': {'query': query}})
docs = [hit['_source']['text'] for hit in resp['hits']['hits']]
print(docs)

Reranking with Cohere

Once the initial search results are obtained, they are reranked using Cohere’s Rerank endpoint.

import cohere
co = cohere.Client("{apiKey}")
rerank_hits = co.rerank(query=query, documents=docs, top_n=3, model='rerank-multilingual-v2.0')
print(rerank_hits)

The Rerank endpoint computes a relevance score for the query and each document and returns a list sorted from the most to the least relevant document. Here, top_n=3 limits the output to the three highest-scoring documents.

Conclusion

Both similarity learning and reranking approaches present unique and innovative methods to improve search results, each with its own advantages and potential drawbacks. 🤔

The similarity learning approach, as exemplified by Quaterion, focuses on fine-tuning embeddings to adapt language models to specific domains. This can significantly enhance search performance, particularly when class information is available. However, it requires a more hands-on process and proficiency in machine learning. Fine-tuning and validation may also be resource-intensive, particularly for large datasets. 💻🔍

On the other hand, the reranking approach, as demonstrated by Cohere Rerank, uses a pre-trained model to refine the order of search results. It is a straightforward, low-complexity way to add semantic relevance to an existing search system without significant infrastructure modifications, which makes it a low-risk starting point. 🔍🚀