Two approaches to generating optimized embeddings in the Retrieval-Augmented Generation (RAG) Pattern

Apr 6, 2024

I assume you’re familiar with RAG; if you want to explore the concept further, this article is a good starting point:
What is RAG (Retrieval Augmented Generation)

This post presents two methods for creating optimized embeddings in the Retrieval-Augmented Generation (RAG) pattern.

1. Creating Embeddings Optimized for Accuracy:
If you’re optimizing for accuracy, a good practice is to first summarize the entire document and store the summary text together with its embedding. Then split the full document into overlapping chunks and store each chunk’s text together with its embedding.

2. Creating Embeddings Optimized for Storage:
If you’re optimizing for storage space, chunk the data, summarize each chunk, concatenate the chunk summaries, and then create a single embedding for the combined summary.
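
Both layouts can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the embedding-optimizer library’s implementation: `summarize` and `embed` are hypothetical toy stand-ins (leading sentences and a hashed bag-of-words vector) for the LLM and embedding-model calls a real pipeline would make, and `optimize_for_accuracy`, `optimize_for_storage`, and the chunk sizes are illustrative names and values. Approach 1 uses overlapping character windows; approach 2 groups whole sentences so that each toy summary is a complete sentence.

```python
import math
import re
import zlib
from collections import Counter


def split_sentences(text: str) -> list[str]:
    return re.split(r"(?<=[.!?])\s+", text.strip())


# Hypothetical stand-ins: a real pipeline would call an LLM in summarize()
# and an embedding model (for example OpenAIEmbeddings) in embed().
def summarize(text: str, max_sentences: int = 1) -> str:
    """Toy extractive 'summary': keep the first few sentences."""
    return " ".join(split_sentences(text)[:max_sentences])


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hashed bag-of-words counts, L2-normalized."""
    vec = [0.0] * dim
    for word, count in Counter(re.findall(r"\w+", text.lower())).items():
        vec[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def overlapping_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Fixed-size character windows; neighbouring chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def optimize_for_accuracy(document: str, chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Approach 1: one record for a whole-document summary plus one per overlapping chunk."""
    summary = summarize(document, max_sentences=2)
    records = [{"kind": "summary", "text": summary, "embedding": embed(summary)}]
    for chunk in overlapping_chunks(document, chunk_size, overlap):
        records.append({"kind": "chunk", "text": chunk, "embedding": embed(chunk)})
    return records


def optimize_for_storage(document: str, sentences_per_chunk: int = 5) -> dict:
    """Approach 2: summarize each chunk, concatenate the summaries, embed once."""
    sentences = split_sentences(document)
    chunks = [" ".join(sentences[i:i + sentences_per_chunk])
              for i in range(0, len(sentences), sentences_per_chunk)]
    combined = " ".join(summarize(chunk) for chunk in chunks)
    return {"kind": "summary", "text": combined, "embedding": embed(combined)}


if __name__ == "__main__":
    doc = (
        "Retrieval-augmented generation retrieves passages before answering. "
        "Embeddings map text to vectors. Similar texts get similar vectors. "
        "A vector store indexes those vectors for fast search."
    )
    print(len(optimize_for_accuracy(doc, chunk_size=60, overlap=15)), "records (accuracy)")
    print(repr(optimize_for_storage(doc, sentences_per_chunk=2)["text"]))
```

The difference shows up in the record count: the accuracy layout stores one vector for the summary plus one per chunk, while the storage layout stores exactly one vector per document, trading retrieval granularity for space.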

Finally, you can use the embedding-optimizer library, which implements both approaches:
Source code: embedding-optimizer

$ pip install embedding-optimizer

import os

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

from embedding_optimizer.optimizer import EmbeddingOptimizer

# Set your OpenAI API Key
os.environ['OPENAI_API_KEY'] = ''

# Load your document
raw_document = TextLoader('test_data.txt').load()

# If your document is long, split it into chunks; overlapping chunks help the accuracy-optimized approach
text_splitter = CharacterTextSplitter(separator=".", chunk_size=1000, chunk_overlap=100)
documents = text_splitter.split_documents(raw_document)

embedding_optimizer = EmbeddingOptimizer(openai_api_key=os.environ["OPENAI_API_KEY"])

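# Accuracy-optimized: a summary of the whole document plus the individual chunks
documents_optimizer = embedding_optimizer.optimized_documents_for_accuracy(raw_document[0].page_content, documents)
# Storage-optimized alternative: summarize each chunk and keep only the combined summary
# documents_optimizer = embedding_optimizer.optimized_documents_for_storage(raw_document[0].page_content, documents)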

# Embed the optimized documents and index them in a FAISS vector store
embedding_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])

db = FAISS.from_documents(documents_optimizer, embedding_model)

# Query the index for the passages most similar to the question
query = "What motivated Alex to create the Function of Everything (FoE)?"
docs = db.similarity_search(query)

print(docs[0].page_content)

The library also provides two functions for summarizing long texts with the OpenAI API:

First method: summarize each part independently

The first method splits the text into chunks, asks the API to summarize each chunk on its own, and then joins all the sub-summaries together.

import os
from pathlib import Path

from embedding_optimizer.optimizer import EmbeddingOptimizer

long_text = Path('test_data.txt').read_text(encoding='utf-8')  # the long text to summarize
summary_optimizer = EmbeddingOptimizer(openai_api_key=os.environ["OPENAI_API_KEY"])
summary = summary_optimizer.summarize_each_part_independently(long_text, chunk_size=100)
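
The library call hides the mechanics, so here is a standalone sketch of the same map-then-join idea. The names `split_into_chunks`, `summarize_independently`, and the `summarize_chunk` callback are hypothetical; chunk size is counted in words as an assumption (the library’s `chunk_size` unit is not specified here); and in practice `summarize_chunk` would wrap a call to the OpenAI API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split on whitespace into chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_independently(
    text: str, summarize_chunk: Callable[[str], str], max_words: int = 100
) -> str:
    """Summarize every chunk on its own, then join the sub-summaries.

    No chunk sees any other chunk or summary, so the calls can run concurrently.
    """
    chunks = split_into_chunks(text, max_words)
    with ThreadPoolExecutor(max_workers=4) as pool:
        sub_summaries = list(pool.map(summarize_chunk, chunks))  # map keeps input order
    return " ".join(sub_summaries)


if __name__ == "__main__":
    # Hypothetical stand-in for an LLM call: keep the first two words of each chunk.
    def first_two_words(chunk: str) -> str:
        return " ".join(chunk.split()[:2])

    text = "one two three four five six seven eight nine ten"
    print(summarize_independently(text, first_two_words, max_words=4))
```

Because each sub-summary is produced without context from its neighbours, nothing ties the pieces together, which is exactly the weakness the second method targets.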

Second method: summarize the text incrementally

The second method addresses the main weakness of the first: because each chunk is summarized without any knowledge of the others, the joined sub-summaries can read as disconnected and repetitive. The goal here is a more coherent, better-structured summary.

This method builds the summary progressively. Instead of creating separate sub-summaries and merging them at the end, each prompt contains the next chunk of text together with the last 500 tokens of the summary written so far, and OpenAI is asked to summarize the chunk and integrate it smoothly into the current summary.

import os
from pathlib import Path

from embedding_optimizer.optimizer import EmbeddingOptimizer

long_text = Path('test_data.txt').read_text(encoding='utf-8')  # the long text to summarize
summary_optimizer = EmbeddingOptimizer(openai_api_key=os.environ["OPENAI_API_KEY"])
summary = summary_optimizer.summarize_text_incrementally(long_text, chunk_size=100)
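
A standalone sketch of the incremental loop follows. It rests on several assumptions: the model returns only the new portion of the summary, which is appended to the running summary; tokens are approximated by whitespace-separated words; the prompt wording is illustrative; and `tail`, `build_prompt`, `summarize_incrementally`, and the `continue_summary` callback are hypothetical names, not the library’s API. The default of 500 context words mirrors the 500-token window described above.

```python
from typing import Callable


def split_into_chunks(text: str, max_words: int) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tail(summary: str, n_words: int) -> str:
    """Approximate 'the last N tokens' of the summary by its last N words."""
    words = summary.split()
    return " ".join(words[-n_words:]) if n_words > 0 else ""


def build_prompt(summary_tail: str, chunk: str) -> str:
    """Hypothetical prompt a real continue_summary callback could send to the model."""
    return (
        f"Summary so far (last part):\n{summary_tail}\n\n"
        f"Next part of the text:\n{chunk}\n\n"
        "Summarize the next part so that it continues the summary coherently."
    )


def summarize_incrementally(
    text: str,
    continue_summary: Callable[[str, str], str],
    max_words: int = 100,
    context_words: int = 500,
) -> str:
    """Build the summary chunk by chunk; each step sees the tail of the summary so far."""
    summary = ""
    for chunk in split_into_chunks(text, max_words):
        addition = continue_summary(tail(summary, context_words), chunk)
        summary = f"{summary} {addition}".strip()
    return summary


if __name__ == "__main__":
    # Hypothetical stand-in for the LLM: show the prompt, keep the chunk's first word.
    def fake_llm(summary_tail: str, chunk: str) -> str:
        print(build_prompt(summary_tail, chunk), end="\n---\n")
        return chunk.split()[0]

    print(summarize_incrementally("a b c d e f", fake_llm, max_words=2, context_words=1))
```

Each step depends on the summary produced by the previous one, so, unlike the first method, the chunks have to be processed strictly in order.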

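For long texts, the incremental approach (the second method) generally produces a more coherent summary that preserves the key ideas. The trade-off is speed: each step depends on the summary written so far, so the chunks must be processed one after another, whereas the first method can summarize its chunks in parallel.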

References:

Secrets to Optimizing RAG LLM Apps for Better Accuracy, Performance and Lower Cost! (medium.com)

https://medium.com/@tanguyvans/how-to-summarize-long-texts-using-openai-improving-coherence-and-structure-d896c5510c45