Cross-encoders vs Bi-encoders: A deep dive into text encoding methods

Rahul Bhatia
Jun 28, 2024



In the world of Natural Language Understanding (NLU) and Information Retrieval (IR), handling the complexity of textual data is a significant challenge. Sentences carry subtle nuances that make them hard for machines to understand or categorize. This is where encoding methods like cross-encoders and bi-encoders come into play. Let's break these terms down to their foundational principles to understand them thoroughly. These encoding techniques have been used in industry for a long time and are fundamental building blocks of the most advanced Retrieval-Augmented Generation (RAG) architectures we see today. However, their applications go well beyond RAG, and it is important to understand the differences that drive the choice of model in different search and retrieval applications.

Cross-Encoders

Definition

A cross-encoder processes a pair of inputs together, modeling the interaction between them during the encoding process. For a task such as sentence similarity, both sentences are fed into the model at once, and it learns to encode the relationship between them directly.

Mathematical Formulation

Given two sentences A and B:

  1. Concatenation:

The two sentences are joined into a single input, typically with a separator token:

C = [A ; SEP ; B]

Here, C represents the concatenated input of A and B.

  2. Encoding:

The concatenated input C is passed through a transformer model T:

H = T(C)

  3. Scoring:

A scoring function f (e.g., a fully connected layer) is applied to H:

s = f(H)

Here, s represents the similarity or relevance score between A and B.

Typical architecture of a Cross-Encoder
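To make this concrete, here is a minimal sketch of cross-encoder scoring, assuming the sentence-transformers library and a public MS MARCO re-ranking checkpoint (any cross-encoder checkpoint can be substituted):

```python
from sentence_transformers import CrossEncoder

# A publicly available cross-encoder checkpoint trained for passage re-ranking.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Each (A, B) pair is concatenated internally and scored in a single
# forward pass, i.e. s = f(T([A ; SEP ; B])).
pairs = [
    ("How do I reset my password?", "Go to settings and click 'Forgot password'."),
    ("How do I reset my password?", "Our office is open from 9am to 5pm."),
]
scores = model.predict(pairs)  # one relevance score per pair
print(scores)  # the first pair should score noticeably higher
```

Note that every new pair costs a full transformer pass; nothing can be precomputed.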

Usage

Cross-encoders are particularly effective in tasks where the interaction between the input pairs is crucial. They are commonly used in:

  • Question Answering: Matching questions with their corresponding answers.
  • Sentence Similarity: Determining the similarity between two sentences.
  • Ranking Tasks: Where precise interaction modeling is required.

Advantages and Disadvantages

Advantages:

  • High accuracy due to modeling the interaction between inputs directly.
  • Effective for tasks requiring fine-grained understanding of input pairs.

Disadvantages:

  • Computationally Intensive: Every pair of texts needs to be processed together, which means the computation scales with the number of pairs. This makes cross-encoders less suitable for scenarios requiring real-time or large-scale processing.
  • Infeasibility for Large Corpora: Comparing a single query against a large corpus is computationally prohibitive, because each comparison requires a full forward pass through the transformer (see the back-of-envelope estimate after this list).
  • Not suitable for real-time applications due to higher inference time.
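To put the scaling problem in perspective, here is a rough back-of-envelope estimate (the per-pass latency is an assumption, not a benchmark): if one transformer forward pass takes roughly 10 ms on a GPU, scoring a single query against a corpus of 1,000,000 documents with a cross-encoder costs 10 ms × 10^6 ≈ 10,000 s, i.e. close to three hours per query. A bi-encoder, by contrast, needs one 10 ms pass to encode the query, after which the million comparisons are plain vector dot products that a vector index can serve in milliseconds.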

Bi-Encoders

Definition

A bi-encoder encodes each input independently and then compares the resulting embeddings to compute a similarity score. This approach allows for much more efficient computation, especially when one side of the comparison (e.g., a document corpus) remains constant and can be pre-encoded.

Mathematical Formulation

Given two sentences A and B:

  1. Encoding and Pooling (to generate sentence embeddings):

Each sentence is encoded separately by the same transformer model T, followed by a pooling step that collapses the token representations into a single sentence embedding:

H_A = pool(T(A)), H_B = pool(T(B))

  2. Scoring:

A similarity function sim (e.g., cosine similarity) is applied to H_A and H_B:

s = sim(H_A, H_B)

Here, s represents the similarity score between A and B.

Typical architecture of a Bi-Encoder
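For contrast, here is an equivalent bi-encoder sketch, again assuming sentence-transformers and a commonly used public embedding checkpoint:

```python
from sentence_transformers import SentenceTransformer, util

# A publicly available bi-encoder (sentence-embedding) checkpoint.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Each sentence gets its own forward pass plus pooling:
# H_A = pool(T(A)), H_B = pool(T(B)).
emb_a = model.encode("The cat sat on the mat.", convert_to_tensor=True)
emb_b = model.encode("A cat is resting on a rug.", convert_to_tensor=True)

# Scoring is a cheap vector operation: s = sim(H_A, H_B).
score = util.cos_sim(emb_a, emb_b)
print(float(score))  # cosine similarity in [-1, 1]
```

Because the embeddings are independent, emb_b could have been computed days earlier and stored in a vector database.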

Usage

Bi-encoders are particularly effective in scenarios requiring efficiency and scalability. They are commonly used in:

  • Information Retrieval: Matching queries with a large corpus of documents.
  • Recommendation Systems: Matching users with items.
  • Embedding-based Retrieval: Precomputing embeddings for efficient search.

Advantages and Disadvantages

Advantages:

  • Efficient for Large-Scale Retrieval: Texts can be pre-encoded and stored in a database. During retrieval, only the query needs to be encoded, and similarity can be computed efficiently using the precomputed embeddings.
  • Scalability: Suitable for large-scale retrieval tasks where the model needs to compare a single query against a vast corpus of documents.
  • Real-Time Applications: Faster in scenarios where real-time responses are necessary because the heavy computation is done upfront.

Disadvantages:

  • Lower Accuracy: Since the texts are encoded independently, bi-encoders may miss some of the fine-grained interactions between the text pairs, potentially leading to lower accuracy compared to cross-encoders.
  • Context Loss: The independent encoding might lose some contextual nuances that are critical for understanding the relationship between texts.

Usage in Retrieval-Augmented Generation (RAG) Pipelines

It would be remiss (at least in 2024, at the time of writing) to discuss cross-encoders and bi-encoders without mentioning their role in RAG pipelines.

In Retrieval-Augmented Generation (RAG) pipelines, bi-encoders and cross-encoders are combined to balance efficiency and accuracy. RAG pipelines enhance generative models by retrieving relevant information from a large corpus, helping them generate more accurate and contextually grounded responses over a custom or enterprise knowledge base.

Step 1: Initial Retrieval with Bi-Encoders

  1. Query Encoding:

The input query Q is encoded using a bi-encoder:

H_Q = pool(T(Q))

  2. Document Encoding:

Each document D_i in the corpus is encoded independently (typically offline, ahead of query time):

H_{D_i} = pool(T(D_i))

  3. Similarity Scoring:

The similarity between the query embedding H_Q and each document embedding H_{D_i} is computed:

s_i = sim(H_Q, H_{D_i})

Based on these similarity scores, the top k most relevant documents are retrieved (a minimal sketch follows below).
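A minimal sketch of this first stage, assuming sentence-transformers; the corpus here is a hypothetical three-document knowledge base standing in for a real vector store:

```python
from sentence_transformers import SentenceTransformer, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical knowledge base; in a real pipeline these embeddings are
# precomputed offline and stored in a vector database.
corpus = [
    "RAG retrieves documents to ground a generator's answers.",
    "Bi-encoders embed queries and documents independently.",
    "Our cafeteria serves lunch between noon and 2pm.",
]
corpus_embeddings = bi_encoder.encode(corpus, convert_to_tensor=True)  # H_{D_i}

query = "How does retrieval-augmented generation work?"
query_embedding = bi_encoder.encode(query, convert_to_tensor=True)  # H_Q

# Top-k nearest documents by cosine similarity: s_i = sim(H_Q, H_{D_i}).
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(hit["score"], corpus[hit["corpus_id"]])
```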

Step 2: Re-Ranking with Cross-Encoders

  1. Concatenation:

Each of the top k retrieved documents D_i is concatenated with the query Q:

C_i = [Q ; SEP ; D_i]

  2. Interaction Encoding:

Each concatenated pair C_i is passed through a cross-encoder T:

H_i = T(C_i)

  3. Relevance Scoring:

A scoring function f is applied to the hidden representation H_i to score each document's relevance to the query:

r_i = f(H_i)

The documents are then re-ranked based on the scores r_i (sketched below).
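Continuing the sketch from Step 1 (query, corpus, and hits are the names defined there), the candidates are re-scored with a cross-encoder:

```python
from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Build (query, document) pairs from the bi-encoder's top-k hits;
# `query`, `corpus`, and `hits` come from the Step 1 sketch above.
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]

# One full forward pass per pair: r_i = f(T([Q ; SEP ; D_i])).
rerank_scores = cross_encoder.predict(pairs)

# Re-rank the candidates by cross-encoder relevance, highest first.
reranked = sorted(zip(rerank_scores, pairs), key=lambda x: x[0], reverse=True)
for score, (_, doc) in reranked:
    print(round(float(score), 3), doc)
```

Because k is small (tens of documents, not millions), the extra cost of the cross-encoder passes is affordable here.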

Comparative Analysis

When to Use Cross Encoders

  • High Precision Tasks: Tasks requiring a detailed understanding of the interaction between inputs, such as question answering and precise ranking.
  • Small to Medium-scale Data: Scenarios where computational resources are less of a constraint and high accuracy is paramount.

When to Use Bi-Encoders

  • Scalability: Tasks involving large-scale retrieval, such as search engines or recommendation systems, where efficiency is crucial.
  • Real-time Applications: Scenarios where fast inference is necessary, and the slight trade-off in accuracy is acceptable.

RAG Pipelines

  • Initial Retrieval: Bi-encoders are used for their efficiency in retrieving a large set of potentially relevant documents quickly.
  • Re-Ranking: Cross-encoders are employed to refine the selection, leveraging their ability to model the interaction between the query and the retrieved documents to improve relevance and accuracy.

Conclusion

Both cross-encoders and bi-encoders have unique strengths and suit different types of tasks. Cross-encoders excel in accuracy by modeling interactions between input pairs, but at the cost of higher computational resources. Bi-encoders, on the other hand, provide a more efficient and scalable solution, making them ideal for large-scale and real-time applications. In RAG pipelines, combining both methods leverages the strengths of each: bi-encoders for efficient initial retrieval and cross-encoders for accurate re-ranking, achieving a balance between efficiency and precision.

