Advanced RAG: Improving Retrieval-Augmented Generation with Hypothetical Document Embeddings (HyDE)

Sascha Gstir
Pondhouse Data
May 13, 2024

HyDE is a technique used to improve the performance of RAG models by generating hypothetical document embeddings based on the query and using them to retrieve relevant documents from the knowledge base.

Retrieval-Augmented Generation (RAG) is a powerful technique for enhancing language models’ ability to generate informative and contextually relevant responses. RAG combines the strengths of retrieval-based systems and generative language models, enabling them to access and utilize external knowledge effectively. However, the success of RAG heavily relies on the effectiveness of the document retrieval process.

Traditionally, document retrieval in RAG has been based on the similarity between the query and the document embeddings. While this approach has shown promise, it often falls short in capturing the nuanced information needs expressed in the queries. This is where Hypothetical Document Embeddings (HyDE) come into play.

HyDE is an innovative technique that aims to improve document retrieval in RAG by generating hypothetical document embeddings that represent the ideal documents to answer a given query. By leveraging these hypothetical embeddings, HyDE guides the retrieval process towards documents that are more likely to contain the relevant information, ultimately improving the performance of RAG models.

In this blog post, we will explore the concept of HyDE and examine how it enhances the document retrieval process in RAG. We will discuss and showcase the benefits of HyDE in bridging the gap between queries and relevant documents. Additionally, we will walk through the implementation and integration of HyDE in RAG models, with a hands-on tutorial for using HyDE in your own RAG system.

What are Hypothetical Document Embeddings (HyDE)?

With more traditional Retrieval-Augmented Generation (RAG), the retrieval process is based on the similarity between the user's query and the source document chunks. To calculate this similarity, embeddings are typically used: vector representations of the documents. These embeddings are, to a good approximation, a representation of the semantic meaning of both the documents and the search query. Long story short, traditional RAG uses the semantic meaning of documents and search query to find document chunks that are similar to the search query.

One of the problems with this approach is that the semantic meaning of the search query is not always well represented in the document embeddings. This gets quite obvious when you think about the following example: if you search for "What is the capital of France?", you would expect the retrieval system to return documents that are semantically similar to "capital of France". However, document chunks are usually longer and contain more information than just the capital of France, e.g. also information about the country France, the city Paris, its population, etc. This means that the document embeddings are not perfectly aligned with the search query embedding. Other, less relevant, document chunks might be returned instead (e.g. documents about capitals of South America, if those documents mention "capital" a lot).

While this problem is not much of an issue for domains the embedding models were trained on, it becomes pronounced in out-of-domain scenarios, for example when dealing with highly technical topics. (Mainly because the embedding model can't encode the specifics of out-of-domain semantic meaning.)

This is where HyDE comes into play.

HyDE generates hypothetical documents for the search query, embeds them, and uses these embeddings to retrieve relevant documents from the knowledge base. Because a hypothetical answer looks much more like a real document chunk than a short query does, these embeddings guide the retrieval process towards documents that are more likely to contain the relevant information, improving the overall performance of the RAG system.

How does HyDE work?

The process of HyDE is quite simple:

  1. Use a Large Language Model like GPT-3.5 to generate fake documents based on the search query. GPT-3.5 is prompted to “write a passage containing information about the search query”. Or — in more technical domains — GPT-3.5 could be prompted to “write a maintenance manual for the part mentioned in the search query”.
  2. Use an embedding model to encode these fake documents into embeddings.
  3. Use vector similarity search to find the document chunks in your knowledge base that are most similar to the hypothetical document embeddings. Note that we don't use the search query itself to find relevant documents, but the fake HyDE documents.
  4. Use the retrieved document chunks to generate the final response.

Figure: The HyDE process (source: "Precise Zero-Shot Dense Retrieval without Relevance Labels")

NOTE: The fake documents are only used to find document chunks in your knowledge base. They are not sent to the answer LLM and not used to generate the final response.
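
To make these steps concrete, here is a minimal, framework-free sketch of the flow. It assumes the OpenAI Python client (v1+) and a pre-computed list of (chunk_text, chunk_embedding) pairs as a stand-in knowledge base; the model names, the prompt, and the brute-force cosine search are illustrative choices, not requirements.

from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def hyde_retrieve(query: str, knowledge_base: list, top_k: int = 3) -> list:
    """knowledge_base: list of (chunk_text, chunk_embedding) tuples."""
    # Step 1: let the LLM write a fake document that answers the query
    fake_doc = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Please write a passage to answer the question:\n{query}",
        }],
    ).choices[0].message.content

    # Step 2: embed the fake document (not the original query)
    fake_embedding = np.array(
        client.embeddings.create(
            model="text-embedding-3-small", input=fake_doc
        ).data[0].embedding
    )

    # Step 3: rank the knowledge-base chunks by cosine similarity
    # to the hypothetical document embedding
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(
        knowledge_base, key=lambda item: cosine(fake_embedding, item[1]), reverse=True
    )
    return [chunk for chunk, _ in ranked[:top_k]]

# Step 4 (not shown): pass the retrieved chunks, together with the
# *original* query, to the answer LLM to generate the final response.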

Example: Using HyDE with LlamaIndex

In this example, we will use the LlamaIndex library to implement HyDE.

LlamaIndex is an open-source Python library that provides a simple way to build RAG systems over your unstructured or semi-structured data for easy retrieval using GPT-style models. It allows you to construct a structured index over your data, enabling you to efficiently retrieve relevant information based on natural language queries.

Key features of LlamaIndex include:

  • Data Loaders: It provides built-in data loaders for various data sources like text files, CSV files, JSON files, and web pages.
  • Index Construction: LlamaIndex supports building different types of indexes such as vector stores, keyword tables, and knowledge graphs to organize and structure your data.
  • Query Interfaces: It offers query interfaces that allow you to retrieve relevant information from the indexed data using natural language queries.
  • Query Transformations: Query transformations allow you to modify or preprocess the natural language queries before they are used to retrieve information from the index.
  • Integration with Language Models: LlamaIndex seamlessly integrates with GPT-style language models like OpenAI’s GPT-3, allowing you to leverage their capabilities for generating responses based on the retrieved information.
  • Customization: The library provides flexibility to customize the index construction process, query interfaces, and integration with different language models based on your specific requirements.

In our case we are mostly interested in the “Query Transformations” and “Integration with Language Models” features of LlamaIndex.

Before getting started, download the example document from our website.

Let’s first install the LlamaIndex library:

pip install llama-index
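
By default, LlamaIndex uses OpenAI models both for generating the hypothetical documents and for the embeddings, so an OpenAI API key needs to be available in your environment:

import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key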

Next, import the required modules and set up logging.

import logging
import sys

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.indices.query.query_transform import HyDEQueryTransform
from llama_index.core.query_engine import TransformQueryEngine

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

For this example, let’s use a simple vector store index to store the document embeddings.

# Load the example PDF and build an in-memory vector index over its chunks
documents = SimpleDirectoryReader(
    input_files=["./HypotheticalSoftwareSDK.pdf"]
).load_data()
index = VectorStoreIndex.from_documents(documents)

Now we create a default query engine from the index and wrap it with the HyDE query transformation:

query_engine = index.as_query_engine()
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)

Note the include_original=True parameter in the HyDEQueryTransform. This parameter specifies whether to include the original query in the embedding strings for similarity search. Setting this to False will only use hypothetical embeddings for retrieval - otherwise, the hypothetical ones are combined with the original query.
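
To see the difference, you can run the transform directly in both modes and inspect how many strings will be embedded for the similarity search (each call below triggers one LLM generation):

q = "How to install the software development kit?"

bundle_hyde_only = HyDEQueryTransform(include_original=False)(q)
bundle_combined = HyDEQueryTransform(include_original=True)(q)

print(len(bundle_hyde_only.embedding_strs))  # 1: just the fake document
print(len(bundle_combined.embedding_strs))   # 2: fake document + original query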

With everything set up, we can use the query engine to either retrieve documents or generate a response.

query = "How to install the software development kit?"
response = hyde_query_engine.generate_response(query)
print(response)

Alternatively, we can retrieve the relevant document chunks without generating a response. Here we apply the HyDE transform to the query ourselves and pass the resulting query bundle to the plain query_engine (since the bundle has already been transformed by hyde):

query = "How to install the software development kit?"
query_bundle = hyde(query)

retrieved_nodes = query_engine.retrieve(query_bundle)
print(retrieved_nodes)

Last but not least, you can inspect the generated fake documents to introspect the HyDE process. (With include_original=True, the list also contains the original query as its last entry.)

query_bundle = hyde(query)
hyde_docs = query_bundle.embedding_strs
print(hyde_docs)

Customizing the prompt to create the HyDE documents

The default prompt used by LlamaIndex to generate the fake documents is:

Please write a passage to answer the question
Try to include as many key details as possible.


{context_str}


Passage:"""

While this is a good starting point, you often want more customization. For example, when your domain is highly technical, you might want to provide more context to the language model so that it generates better, more relevant fakes.

You might also want to be more specific about the type of document you want to generate. "Passage" might be too generic; e.g. "manual page" or "installation guide" might be more relevant.

To specify a custom prompt, pass the hyde_prompt parameter to the HyDEQueryTransform constructor. Note that hyde_prompt expects a PromptTemplate rather than a plain string:

prompt = """Please write a manual page for the software development kit\nTry to include as many key details as possible.\n\n\n{context_str}\n\n\nManual Page:"""\n
hyde = HyDEQueryTransform(include_original=True, hyde_prompt=prompt)
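
The customized transform is then wired into a query engine exactly as before:

hyde_query_engine = TransformQueryEngine(query_engine, hyde)
response = hyde_query_engine.query("How to install the software development kit?")
print(response)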

Conclusion

Hypothetical Document Embeddings (HyDE) is a promising technique that has the potential to improve the performance of Retrieval-Augmented Generation (RAG) models by generating hypothetical document embeddings based on the search query. These hypothetical embeddings can guide the retrieval process towards more relevant documents, which may lead to higher quality generated responses.

This post walked through an example of how HyDE can be implemented using the LlamaIndex library, showcasing its query transformation and language model integration features. By setting up the vector store index, configuring the HyDE query transformation, and utilizing the query engine, one can incorporate HyDE into a RAG system.

It is worth noting that the prompt used to generate the hypothetical document embeddings should be carefully designed and tailored to the specific domain or document type. This customization can contribute to the generation of more relevant and informative fake documents, potentially resulting in improved retrieval results.

HyDE presents an interesting approach to enhancing RAG models, particularly in out-of-domain scenarios where traditional retrieval methods may face challenges. By attempting to bridge the gap between queries and relevant documents, HyDE aims to enable RAG models to access and utilize external knowledge more effectively.

As natural language processing continues to evolve, techniques like HyDE may play a role in improving the performance of RAG models and expanding their applicability to a broader range of queries and domains. However, further research and evaluation are necessary to fully understand the impact and limitations of hypothetical document embeddings in real-world scenarios.

While the potential of HyDE is intriguing, it is important to approach it with a balanced perspective, considering both its strengths and the need for additional exploration and validation. As with any new technique, it is essential to conduct rigorous experiments and analyze the results critically to assess the true effectiveness and practicality of HyDE in various applications, such as question answering, dialogue systems, and information retrieval.

Further Reading

  • Gao et al., "Precise Zero-Shot Dense Retrieval without Relevance Labels" (https://arxiv.org/abs/2212.10496): the original HyDE paper.