Mastering LangChain RAG: Quick Start Guide to LangChain RAG (Part 1)

Eric Vaillancourt
8 min read · May 14, 2024


All code examples mentioned can be found on my GitHub repository.

Welcome to my in-depth series on LangChain’s RAG (Retrieval-Augmented Generation) technology. Over the course of six articles, we’ll explore how you can leverage RAG to enhance your applications with state-of-the-art natural language processing techniques. Whether you’re a developer, a data scientist, or just an AI enthusiast, this series will equip you with the knowledge to implement and optimize RAG in your projects.

Overview of the Series:

1. Quick Start Guide to LangChain RAG: (This article) Jump right in with our first tutorial where we’ll cover the basics of setting up LangChain RAG. This introductory article will help you get your environment ready and run your first RAG-based application.

2. Integrating Chat History: Learn how to incorporate chat history into your RAG model to maintain context and improve interaction quality in multi-turn conversations. We will also learn how to save chat history to an SQL database.

3. Implementing Streaming Capabilities: Discover how to implement streaming with RAG to handle real-time data processing efficiently, perfect for applications requiring immediate responses.

4. Returning Sources with Results: This tutorial will teach you how to configure RAG to provide sources along with responses, adding transparency and credibility to the generated outputs.

5. Adding Citations to Your Results: Enhance your application’s trustworthiness by automatically including citations in your results, making them verifiable and more reliable.

6. Putting It All Together: In our final article, we’ll integrate all the components learned in previous tutorials to build a comprehensive RAG application, demonstrating the power and versatility of this technology.

Quick Start Guide to LangChain RAG

This article is based on a notebook published by LangChain.

Introduction: Welcome to the first installment of my series on LangChain’s Retrieval-Augmented Generation (RAG). In this article, we’ll dive into the powerful capabilities of RAG technology and how it revolutionizes question-answering (Q&A) applications. Whether you’re looking to enhance your current AI systems or build sophisticated chatbots, understanding RAG is essential.

What is RAG? Retrieval-Augmented Generation, or RAG, is an innovative technique that supplements the fixed knowledge of LLMs with dynamic, external data sources. LLMs are trained on vast datasets and can answer questions across a wide range of topics. However, their knowledge is static, frozen at the time of their last training update. RAG addresses this limitation by integrating real-time data, thus keeping the model’s responses current and contextually relevant.

Why RAG Matters: In scenarios where you need your AI to consider private databases, recent events, or any information beyond its training cut-off, RAG becomes invaluable. It allows the model to fetch and utilize the exact data needed for generating accurate and relevant responses. This capability is crucial for businesses and applications requiring up-to-date information or dealing with proprietary data sets.

LangChain’s Contribution: LangChain offers a suite of tools designed to facilitate the development and deployment of RAG-powered applications. These tools simplify the integration of external data sources with LLMs, making it easier for developers to build more capable and responsive AI applications.

Getting Started with LangChain RAG

In this guide, we will develop a question-answering (QA) application based on the “LLM Powered Autonomous Agents” blog post by Lilian Weng. This application will enable us to query the content of the post effectively.

To achieve this, we will establish a straightforward indexing pipeline and RAG chain.

To begin using LangChain RAG, here are some initial steps:

Setup Environment: Ensure your development environment is prepared with the necessary dependencies.

pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-openai langchain-chroma bs4 python-dotenv

We need to set the OPENAI_API_KEY environment variable, which both the embeddings model and the chat model require. It can be set directly or loaded from a .env file like so:

from dotenv import load_dotenv
load_dotenv()

You will have to create a file called “.env”. Here is a sample:

OPENAI_API_KEY="your-key-here"
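If you prefer not to keep the key in a file, a minimal alternative sketch is to set it directly in the process environment at runtime (getpass simply prompts for the key without echoing it to the terminal):

import getpass
import os

# Set the key directly if it is not already defined in the environment.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")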

The next code snippet imports various modules and classes for a web-based application using LangChain. It includes tools for web scraping (bs4, WebBaseLoader), AI model integration (Chroma, OpenAIEmbeddings, ChatOpenAI), and utilities to manage document processing and parsing (StrOutputParser, RunnablePassthrough, RecursiveCharacterTextSplitter), facilitating the development of an intelligent, text-based application.

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI

Configure Data Retrieval: Connect your data sources to LangChain, configuring them for access during the retrieval process.

This code snippet demonstrates how to integrate LangChain with OpenAI’s GPT-3.5 Turbo for creating a question-answering application that operates on content from a specific blog post. Here’s a detailed breakdown:

Initialize the Language Model:

llm = ChatOpenAI(model="gpt-3.5-turbo")

This line initializes a connection to OpenAI’s GPT-3.5 Turbo model, setting it up to process and generate responses.

Load and Process Blog Content:

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

The WebBaseLoader is configured to scrape and load content specifically from the blog post by Lilian Weng. It targets only the relevant parts of the web page such as the post content, title, and header for processing.
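To confirm that the loader picked up the expected content, you can inspect the returned documents. This quick check is my own addition rather than part of the original notebook:

# docs is a list of LangChain Document objects; this page yields a single document.
print(len(docs))
print(docs[0].page_content[:200])  # preview the first 200 characters of the scraped text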

Document Splitting and Embedding:

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

The content is split into smaller chunks using RecursiveCharacterTextSplitter, which helps in managing long texts by breaking them into manageable sizes with some overlap to preserve context. Chroma then creates a vector store from these chunks using embeddings provided by OpenAIEmbeddings, facilitating efficient retrieval.
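As a quick sanity check (again, my own sketch rather than part of the original notebook), you can verify how many chunks were produced and query the vector store directly:

print(len(splits))  # number of chunks produced by the splitter

# Return the two chunks most similar to a sample query.
for doc in vectorstore.similarity_search("What is task decomposition?", k=2):
    print(doc.page_content[:100])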

Retrieval-Augmented Generation Chain Setup:

retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

The vectorstore is transformed into a retriever capable of fetching the most relevant text snippets based on a query. A custom function format_docs is used to format these snippets suitably. The RAG chain integrates retrieval, prompt engineering (via hub.pull), and the initialized language model (llm) to process queries and generate responses, concluding with parsing the model's string output using StrOutputParser.

These steps outline the creation of a QA system that leverages advanced NLP techniques to understand and respond to queries regarding specific content, exemplifying the use of AI in enhancing information retrieval and interaction.

Running the Chain

rag_chain.invoke("What is Task Decomposition?")

The line rag_chain.invoke("What is Task Decomposition?") executes the Retrieval-Augmented Generation (RAG) chain to process the query "What is Task Decomposition?". Here's a detailed breakdown of what happens:

  1. Query Input: The string “What is Task Decomposition?” is passed as an input to the rag_chain.
  2. Context Retrieval: The retriever component of the chain, which is linked to the vector store, activates to find and fetch the most relevant text snippets from the indexed blog content. These snippets are those that best match the query based on their semantic similarity to the question.
  3. Formatting Retrieved Content: The format_docs function then takes these retrieved documents and formats them into a single string with each document content separated by double newlines. This formatted string provides a coherent context that encapsulates all the relevant information needed to answer the query.
  4. Generating the Answer: This formatted context string, along with the query, is fed into the ChatOpenAI model. The model uses the provided context and the specifics of the query to generate a response that is contextually relevant and accurate based on the retrieved information.
  5. Output Parsing: Finally, the generated response from the ChatOpenAI model is parsed by the StrOutputParser, which converts the model's output into a clean, user-readable format.

By invoking this chain with the question “What is Task Decomposition?”, the system utilizes advanced NLP techniques to provide a detailed and informed answer, leveraging both the power of contextual retrieval and the generative capabilities of GPT-3.5-turbo. This demonstrates the effectiveness of integrating retrieval with generative AI to enhance the quality and relevance of responses in QA applications.
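Because the chain is an ordinary LangChain runnable, it can be reused for any question about the indexed post. Here is a short usage sketch; the second question is only an illustrative example:

for question in (
    "What is Task Decomposition?",
    "What types of memory can an LLM-powered agent use?",
):
    answer = rag_chain.invoke(question)
    print(f"Q: {question}\nA: {answer}\n")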

Testing the Retriever

retriever.invoke("What is Task Decomposition?")

When you run retriever.invoke("What is Task Decomposition?") in your code, you are executing a test on the retrieval component of your question-answering system. Here’s what this specific command does:

  1. Query Processing: The command takes the query “What is Task Decomposition?” and passes it to the retriever component. The retriever is essentially a search function within your system, set up to find information within a pre-indexed dataset—here, the vector store created from your blog content.
  2. Semantic Search: The retriever uses the embeddings (vector representations) of the text snippets stored in the vector store to perform a semantic search. It compares the vector representation of the query to the vectors of the stored snippets to identify the ones that are most semantically similar to the query.
  3. Retrieving Relevant Text Snippets: Based on the similarity scores, the retriever selects and returns the text snippets from the blog that best match the query. These snippets contain information that is deemed most relevant to answering the question about task decomposition.

This invocation is particularly useful for testing the effectiveness and accuracy of the retrieval system. By running this command, you can assess how well the retriever is performing in terms of pulling the correct and contextually relevant information from your dataset without yet generating any new text or answers. This step ensures that the building blocks of your RAG system — the ability to fetch pertinent information — are working correctly before it is used to generate responses in the full RAG chain.
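Here is a minimal way to inspect what the retriever returns (my own sketch, not from the original notebook):

retrieved_docs = retriever.invoke("What is Task Decomposition?")
print(len(retrieved_docs))  # number of snippets returned (4 by default)
print(retrieved_docs[0].page_content[:300])  # preview of the top-ranked snippet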

Cleanup

vectorstore.delete_collection()

The command vectorstore.delete_collection() is used within the context of managing a vector store, which in your application is a database-like structure storing vector representations of text data. Here's a breakdown of what this command does:

  1. Collection Deletion: This command instructs the vectorstore to delete the entire collection of data it holds. A collection here refers to the set of all documents (text snippets) and their corresponding vector representations that have been indexed and stored in the vectorstore.
  2. Impact on the Application: By executing vectorstore.delete_collection(), you effectively remove all the indexed data from the vector store. This action clears the database, which might be necessary during system resets, updates, or when you want to clear out old data to make way for new data to be processed and stored.

This function is critical for maintaining the cleanliness and relevance of the data within applications that rely on dynamic datasets or require periodic updates to their underlying information structure. It ensures that the system can be refreshed or repurposed without residual data from previous operations interfering with new operations.

Conclusion

In conclusion, this first installment of my series on LangChain’s RAG technology has provided you with a practical overview of integrating and implementing advanced question-answering capabilities using cutting-edge NLP techniques. As you explore the code and implement the features discussed, remember that all code examples mentioned can be found on my GitHub repository.

To further enhance your understanding and application of RAG in your projects, make sure to read the subsequent articles in this series.

Thank you for reading!

Support my work by buying me a coffee or two!


Eric Vaillancourt

Eric Vaillancourt, an AI enthusiast, began his career in 1989, founded a tech consultancy in 1996, and has led over 1,500 trainings in IT and AI.