Extract useful information from your data or content with AI (RAG framework)

By AlamedaDev · Nov 24, 2023

What is the RAG framework?

The RAG (Retrieval-Augmented Generation) framework is an advanced method used in AI to enhance the way it understands and responds to questions.

Imagine AI as a librarian who not only knows a lot, but also knows exactly where to find specific information in a vast library. In the RAG framework, when you ask a question, the AI first acts as a “retriever,” scanning through vast amounts of data to find relevant information, much like a librarian searching for books that could contain the answer.

Once it gathers this information, it shifts to a “generator” mode, where it combines and processes the gathered data to provide a comprehensive and accurate response, akin to the librarian summarizing the key points from various books.

This two-step process allows the AI to provide more detailed, accurate, and contextually relevant answers, making it particularly useful in scenarios where precise and thorough responses are required.
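
To make the two-step process concrete before we bring in any libraries, here is a minimal toy sketch in plain Python. The corpus, the word-overlap scoring, and the “generation” step are deliberately simplistic stand-ins (assumptions for illustration only) for the vector search and LLM we will use later in this post.

# Toy illustration of the two RAG steps: retrieve, then generate (no real LLM involved)
corpus = [
    "Astro is a web framework for building content-driven websites.",
    "Astro ships zero JavaScript by default.",
    "ChromaDB is an open-source embedding database.",
]

def retrieve(question, documents, k=2):
    # Step 1 ("retriever"): score every document against the question; here a naive
    # word-overlap count stands in for the vector similarity used later in this post
    scored = [(len(set(question.lower().split()) & set(doc.lower().split())), doc)
              for doc in documents]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

def generate(question, context):
    # Step 2 ("generator"): a real system passes the retrieved context to an LLM;
    # here we simply stitch the pieces together
    return f"Q: {question}\nAnswer based on: {' '.join(context)}"

question = "What is the Astro framework"
print(generate(question, retrieve(question, corpus)))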

Project description

This project aims to showcase the application of the Retrieval-Augmented Generation (RAG) framework to enhance the retrieval and presentation of information from the AstroJS framework documentation.

Our data source for this example will be the Astro framework documentation (https://docs.astro.build/en/). With LangChain we will load the AstroJS documentation, preprocess it, and transform it into embeddings (OpenAI).
We will then save the embeddings in a local vector store (ChromaDB).
Finally, we will query the embeddings through LangChain and retrieve the texts most relevant to the question the user has prompted.
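
Before going through the setup and the individual steps, here is a rough skeleton of how they fit together. The class name, the attribute names, and the PATH_VECTOR_STORE constant are illustrative assumptions; only the four methods shown in the following sections come from the actual implementation. The imports listed here are the ones the later snippets rely on.

import os
import glob

from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

PATH_VECTOR_STORE = "./vector_store"  # assumed local directory for ChromaDB persistence

class AstroDocsRAG:  # hypothetical name for the pipeline class
    def __init__(self):
        self.sources = []         # raw documents loaded from ./sources
        self.splits = []          # chunked documents
        self.vector_store = None  # ChromaDB vector store
        self.retrieval = None     # RetrievalQA chain

    # load_sources, split_documents, store_vectors and init_qa_retrieval
    # are filled in step by step in the sections below.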

Project setup

Requirements

chromadb
langchain
python-dotenv
bs4
tiktoken
streamlit
streamlit-chat

sources folder

# In this folder we keep all the documentation content we want to use.
# I have stored the .html files here, since we don't want to make
# a lot of requests to the Astro server each time we run the program.
Getting Started 🚀 Astro Documentation.html
Install Astro with the Automatic CLI 🚀 Astro Documentation.html

Document load

Our first method loads the source documents.
It finds all the HTML files in the sources folder and parses each one, so the content can be passed to the next phase.

LangChain’s document loaders specialize in importing and transforming data from diverse sources and formats into a uniform structure. They are capable of processing data from a range of origins, including both structured and unstructured environments, such as online platforms, databases, and various proprietary and commercial services. These loaders are compatible with several file types, including documents, spreadsheets, and presentations. They convert these varied data inputs into a consistent document format, complete with content and relevant metadata.

LangChain offers an extensive collection of over 80 different document loaders we can choose based on our content, as seen in their Document Loaders page https://python.langchain.com/docs/integrations/document_loaders.
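
As an aside, if we wanted to fetch pages straight from the web instead of from the local copies in the sources folder, a different loader such as WebBaseLoader could be swapped in; the URL below is only an example:

from langchain.document_loaders import WebBaseLoader

# Example only: fetches one page over HTTP instead of reading a local .html file
web_docs = WebBaseLoader("https://docs.astro.build/en/getting-started/").load()
print(web_docs[0].metadata)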

def load_sources(self):
    # Uses os, glob and langchain.document_loaders.BSHTMLLoader
    docs = []
    if not os.path.exists('./sources'):
        raise ValueError("The specified directory does not exist")

    # Construct the search pattern
    search_pattern = os.path.join('./sources', '*.html')

    # Use glob to find all files that match the pattern
    html_files = glob.glob(search_pattern)

    # Parse each HTML file with BeautifulSoup and collect the resulting documents
    for source in html_files:
        docs.extend(BSHTMLLoader(source).load())
    self.sources = docs

Document splitting

Next, we need to split the documents into chunks.

In database management, particularly with vector operations, processing and storing large documents can be resource-intensive.

Splitting these documents into smaller chunks enhances manageability and efficiency. This approach significantly reduces computational complexity and memory usage, making data handling more streamlined and effective.

Smaller vectors, derived from these segments, are easier and faster to process, optimizing both speed and resource utilization in the database system.

def split_documents(self):
    # Split on paragraphs first, then lines, then sentence boundaries, then words
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,
        chunk_overlap=100,
        separators=["\n\n", "\n", "(?<=\. )", " ", ""],
        is_separator_regex=True  # so the "(?<=\. )" sentence-boundary lookbehind is treated as a regex
    )
    self.splits = splitter.split_documents(self.sources)

Vector Stores and Embeddings

After splitting a document into smaller chunks, we need to index them for easy retrieval during question-answering. For this, LangChain utilizes embeddings and vector stores.

Embeddings convert text into numerical vectors, enabling comparison of semantically similar content. Post text splitting, these embeddings are stored in a vector store, a specialized database for finding similar vectors. When answering a question, embeddings of the question are compared with stored vectors to identify the most relevant document segments.

These segments are then analyzed by a language model to generate an answer. This efficient process hinges on the effective use of embeddings and vector storage for quick and relevant data retrieval.
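
As a quick illustration of what an embedding is, the snippet below embeds three short sentences and compares them. It is a sketch that assumes an OpenAI API key is configured in the environment; the sentences are made up for the example.

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
v1 = embeddings.embed_query("How do I install Astro?")
v2 = embeddings.embed_query("Steps to set up the Astro framework")
v3 = embeddings.embed_query("My favourite pizza topping")

# OpenAI embeddings are unit-length vectors, so a plain dot product works as a cosine similarity
dot = lambda a, b: sum(x * y for x, y in zip(a, b))
print(len(v1))       # dimensionality of the embedding vector
print(dot(v1, v2))   # semantically related questions -> higher score
print(dot(v1, v3))   # unrelated sentence -> lower score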

For this, we will use ChromaDB and OpenAI embeddings.

def store_vectors(self):
    # Embed each chunk with OpenAI and persist the vectors locally with ChromaDB
    self.vector_store = Chroma.from_documents(
        documents=self.splits,
        embedding=OpenAIEmbeddings(),
        persist_directory=PATH_VECTOR_STORE
    )
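
With the chunks persisted, the vector store can already be queried directly. A minimal sanity check, run from inside the class, might look like this (the question string is just an example):

# Returns the k chunks whose embeddings are closest to the question's embedding
results = self.vector_store.similarity_search("How do I install Astro?", k=3)
for doc in results:
    print(doc.metadata.get("title"), doc.page_content[:80])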

Content Retrieval

In the retrieval phase of LangChain’s document handling process, the primary goal is to efficiently locate and extract the document chunks most relevant to a given query, typically a question.

The query embedding is compared against the embeddings of all document chunks stored in the vector store. The comparison aims to identify which chunks have the closest semantic relationship to the query.

Two key techniques used in this retrieval process are Maximum Marginal Relevance (MMR) and similarity scoring.

Similarity scoring is straightforward: it involves calculating a numerical score that represents how closely the embeddings of the query and a document chunk match, often using methods like cosine similarity. The higher the score, the more relevant the chunk is deemed to be.

On the other hand, MMR introduces an additional layer. It not only considers the similarity between the query and the document chunks but also evaluates the uniqueness of the information each chunk provides. This technique helps in balancing the relevance of the document regarding the query while also ensuring diversity among the retrieved chunks. By preventing the retrieval of chunks that are too similar to each other, MMR enhances the comprehensiveness and richness of the information retrieved, making it particularly useful in scenarios where a broad understanding of a topic is desired.
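
The Chroma store exposes both modes directly, so the difference can be seen side by side. A small sketch, again from inside the class, with an illustrative question:

question = "How do I install Astro?"

# Pure similarity: the k nearest chunks, which may be near-duplicates of each other
similar = self.vector_store.similarity_search_with_score(question, k=3)

# MMR: fetch a larger candidate set (fetch_k), then keep k chunks that are both relevant and diverse
diverse = self.vector_store.max_marginal_relevance_search(question, k=3, fetch_k=10)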

In our case, we will use the MMR technique.

def init_qa_retrieval(self):
    # Build a question-answering chain that retrieves chunks with MMR
    # and combines them with the LLM using the "map_reduce" strategy
    self.retrieval = RetrievalQA.from_chain_type(
        llm=OpenAI(),
        chain_type="map_reduce",
        retriever=self.vector_store.as_retriever(
            search_type="mmr"
        )
    )
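
Once the chain is initialized, answering a question is a single call from inside the class; for example (the question is illustrative):

answer = self.retrieval.run("How do I start a new Astro project?")
print(answer)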

The resulting UI is a simple chat interface over the Astro documentation.
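
The front-end code is not shown in this post, but the streamlit and streamlit-chat packages from the requirements suggest how it could be wired up. A minimal sketch, assuming the pipeline class sketched earlier is importable as AstroDocsRAG; all names here are illustrative, not the actual UI code:

import streamlit as st
from streamlit_chat import message

from rag_pipeline import AstroDocsRAG  # hypothetical module holding the class sketched above

st.title("Ask the Astro docs")

# Build the pipeline once per session
if "rag" not in st.session_state:
    rag = AstroDocsRAG()
    rag.load_sources()
    rag.split_documents()
    rag.store_vectors()
    rag.init_qa_retrieval()
    st.session_state["rag"] = rag
    st.session_state["history"] = []

question = st.text_input("Your question")
if question:
    answer = st.session_state["rag"].retrieval.run(question)
    st.session_state["history"].append((question, answer))

for i, (q, a) in enumerate(st.session_state["history"]):
    message(q, is_user=True, key=f"q{i}")
    message(a, key=f"a{i}")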

Other Use Cases

Each of these use cases demonstrates the versatility of the RAG framework in processing and generating valuable insights across diverse domains:

1. Customer Support Automation: RAG can be instrumental in enhancing customer support systems. By retrieving and analyzing previous customer queries and resolutions, the framework can generate informed responses to new customer inquiries. This application improves response accuracy and speed, significantly enhancing the customer experience.

2. Product Recommendation Systems: In e-commerce, RAG can be utilized to refine product recommendation engines. By evaluating a user’s browsing and purchase history, the system can retrieve relevant product information and generate personalized product suggestions, thereby increasing the likelihood of customer satisfaction and sales.

3. Medical Diagnosis Assistance: In healthcare, RAG can assist in diagnosing diseases by analyzing patient symptoms and medical history. By retrieving and processing vast amounts of medical data and research, it can generate potential diagnoses, aiding healthcare professionals in making more informed decisions.

4. Legal Document Analysis: For legal professionals, RAG can be used to analyze and summarize lengthy legal documents. By retrieving relevant legal precedents and articles, it can generate concise summaries or highlight key legal points in complex documents, saving time and enhancing the efficiency of legal research.

Conclusion

In summary, AI’s advanced capabilities, coupled with the robust structure of the RAG framework, give businesses a new level of insight and efficiency. This combination is key to navigating the complexities of today’s data-driven landscape, enabling smarter decision-making and fostering innovation.

If this intersection of AI and the RAG framework sparks your interest, especially in how it can be applied to your data sets, don’t hesitate to get in touch with us. We’re excited to discuss how these cutting-edge tools can be custom-fitted to your business needs, transforming your data into a powerhouse of strategic value.

Reach out for a chat, and let’s explore the possibilities together.

AlamedaDev is a full life cycle software solutions company. We provide Modern Software Development and #AI integrations.

Founded in #barcelona

Website: www.alamedadev.com
