Rapidly Build a Powerful RAG Application with Django, LangChain, React, and ElasticSearch
Hello, Hackers! In this article, I will share my approach to quickly building a simple yet powerful RAG application.
RAG (Retrieval Augmented Generation) is an architectural approach designed to enhance the capability of LLMs by enabling them to answer queries based on provided documents. In simple terms, it focuses on retrieving relevant documents for a query and providing them as context to the LLM.
RAG can help solve many real-world problems. For instance, I’ve used it to help a friend with an organizational issue: searching for specific rules or policies within a collection of PDFs, each hundreds of pages long, which would otherwise take hours of manual searching.
High-Level Overview
This is a simple RAG application architecture. Let’s break the architecture down into three main processes:
- Document Embedding and Indexing in a Vector Database: In this step, we simply convert the content of the documents into embeddings using an Embedding Model and store the document embeddings in a Vector Database.
- Document Retrieval from the Vector Database: We convert the user’s query into an embedding and use it to retrieve relevant documents from the Vector Database.
- Querying the LLM with Retrieved Documents as Context: The query is sent to the LLM with the retrieved relevant documents as context, enabling the LLM to generate an accurate response.
Let’s Build!
In this article, I want to show you how to quickly build a simple RAG application in practice! I will provide a step-by-step guide and outline the tech stack used to build each component of your RAG application.
We will go over each component in the system design one by one, and I will share the implementation details of each component.
ERD
This simple ERD will be sufficient to build our basic RAG application (a rough Django models sketch follows the list):
- Session: Manages chat sessions, with a ‘title’ field that contains the LLM-generated title based on the user’s first message.
- Chat: Manages chats within each session. This model includes two important fields: ‘role’ (to indicate whether the message is from the user or the AI) and ‘content’ (which contains the chat content in markdown format).
- Document: Manages all documents that will be retrieved and used as context for the LLM model.
- Page: Documents will be split into pages. Instead of embedding the entire document — which isn’t feasible for large documents — we chunk the document into pages. This way, the AI refers to pages from multiple documents as context, rather than entire documents.
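As a rough sketch, the ERD maps to Django models along these lines. The field names here are my assumptions, except for the Document model, which appears in full in the implementation section below:
from django.db import models


class Session(models.Model):
    # LLM-generated title based on the user's first message
    title = models.CharField(max_length=255)
    created_at = models.DateTimeField(auto_now_add=True)


class Chat(models.Model):
    session = models.ForeignKey(Session, on_delete=models.CASCADE, related_name="chats")
    role = models.CharField(max_length=16)  # "user" or "assistant"
    content = models.TextField()            # chat content in markdown
    created_at = models.DateTimeField(auto_now_add=True)


class Document(models.Model):
    title = models.CharField(max_length=255)
    description = models.CharField(max_length=255, blank=True, null=True)
    pdf_file = models.FileField(upload_to="documents/")


class Page(models.Model):
    document = models.ForeignKey(Document, on_delete=models.CASCADE, related_name="pages")
    page_number = models.IntegerField()
    content = models.TextField()            # page content in markdown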
React
On the frontend side, I chose to use React for its simplicity and efficiency. The main reason is that I found a very useful library by Ant Design, which can be quickly utilized with minimal setup, configuration, and customization.
@ant-design/pro-chat
This library provides nearly all the features needed to cover the functionalities required for an LLM chat interface. There’s no need to build everything from scratch — check it out here. Additionally, you can use the Ant Design UI library to swiftly build extra UI components for your RAG application with ease.
Django
Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design. — Django
With Django, you can quickly build an admin panel to manage your application. Django is also backed by a strong Python community that offers many libraries, which are invaluable for quickly building a RAG application.
Django Admin
Using Django’s admin feature, you don’t need to build authentication, manage permissions, or create a UI to manage your application’s data (sessions, chats, documents).
Django REST Framework
Django REST Framework allows you to easily implement authentication for your APIs and create other endpoints that will be consumed by the React client to interact with the LLM on the server side.
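For illustration only (the endpoint, serializer, and helper names below are placeholders I made up, not part of the actual project), a chat endpoint could look roughly like this, with the retrieval and LLM calls covered later in this article:
from rest_framework import serializers, status
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response
from rest_framework.views import APIView


class ChatRequestSerializer(serializers.Serializer):
    session_id = serializers.IntegerField()
    query = serializers.CharField()


class ChatView(APIView):
    permission_classes = [IsAuthenticated]

    def post(self, request):
        serializer = ChatRequestSerializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        query = serializer.validated_data["query"]
        # get_relevant_documents() and answer_query() stand in for the
        # retrieval and LLM-querying code shown in the implementation section
        docs = get_relevant_documents(query)
        answer = answer_query(query, docs)
        return Response({"answer": answer}, status=status.HTTP_200_OK)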
Django Q
Django Q is a native Django task queue, scheduler and worker application using Python multiprocessing.
Document embedding and storing embeddings in a vector database can take a significant amount of time. By using this library, you can implement the embedding and indexing process as a task queue, which will be processed separately from your main Django application. Time-consuming tasks will be handled by workers.
Alternatively, you can use a scheduler to periodically execute tasks to fetch documents or perform other actions. Based on my experience, the UI interface provided by Django Q is very convenient, allowing you to monitor and manage task execution without altering your code. You can even seamlessly retry failed tasks directly from the Django Admin interface.
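For example, registering a daily re-indexing job takes a single call to Django Q’s scheduler. This is just a sketch; the dotted path below points to a hypothetical tasks module wrapping the indexing function shown later:
from django_q.models import Schedule
from django_q.tasks import schedule

# Run a re-indexing task once a day; Django Q workers pick it up automatically
schedule(
    "<your_app>.tasks.reindex_all_documents",
    name="daily-document-reindex",
    schedule_type=Schedule.DAILY,
)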
LangChain
LangChain is a framework for developing applications powered by large language models (LLMs).
By using LangChain, we can seamlessly build our LLM application from development to deployment. It offers many useful built-in libraries that can accelerate your development process, allowing you to avoid building everything from scratch.
ElasticSearch
There are many options for vector database technologies available, but personally, I prefer Elasticsearch because I am already familiar with it. In my opinion, it also makes additional filtering of documents easier.
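For example, because we store page metadata alongside each embedding (see the indexing code below), a similarity search can be narrowed with an ordinary Elasticsearch filter clause. A rough sketch, reusing the same store configuration as later in this article (the exact filter syntax depends on how your metadata fields are mapped):
from langchain_elasticsearch import ElasticsearchStore
from langchain_ollama import OllamaEmbeddings

elastic_vector_search = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="<your index name>",
    embedding=OllamaEmbeddings(model="llama3.1"),
    es_user="elastic",
    es_password="changeme",
)

# Only consider pages belonging to document 42 when running the vector search
docs = elastic_vector_search.similarity_search(
    query="What does the policy say about annual leave carry-over?",
    k=10,
    filter=[{"term": {"metadata.document_id": 42}}],
)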
Elastron
With the help of a tool called Elastron, I can easily manage and debug each index.
Implementation
I will not cover the entire implementation; instead, I will focus only on the key parts of the application.
Document Indexing Triggers
We can add document indexing touchpoints or triggers in various parts of the Django Admin interface:
When a user saves a document:
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver
from django_q.tasks import async_task


class Document(models.Model):
    title = models.CharField(max_length=255, blank=False)
    description = models.CharField(max_length=255, blank=True, null=True)
    pdf_file = models.FileField(upload_to='documents/', blank=False)

    def __str__(self):
        return self.title


@receiver(post_save, sender=Document, dispatch_uid="index_document_to_vector_db")
def index_document_to_vector_db(sender, instance, **kwargs):
    # Dispatch the indexing task to Django Q so it runs in a worker instead of
    # blocking the request; adjust the dotted path to wherever your
    # sync_documents_to_vector_db task lives.
    async_task("<your_app>.tasks.sync_documents_to_vector_db", [instance])
Users can also trigger indexing manually, using a Django Admin bulk action:
from django.contrib import admin
from django_q.tasks import async_task

from .models import Document


@admin.action(description="Index Document to Vector DB")
def index_document_to_vector_db(modeladmin, request, queryset):
    # Dispatch the indexing task to Django Q for the selected documents
    # (again, adjust the dotted path to your own tasks module)
    async_task("<your_app>.tasks.sync_documents_to_vector_db", list(queryset))


@admin.register(Document)
class DocumentAdmin(admin.ModelAdmin):
    list_display = (
        'title',
        'description',
    )
    actions = [index_document_to_vector_db]
Document Indexing Process
Before indexing a document, I recommend converting its content into markdown. There is a tool specifically designed to convert PDF documents into markdown for LLM use: PyMuPDF4LLM.
from django.conf import settings
from langchain_core.documents import Document
from langchain_elasticsearch import ElasticsearchStore
from langchain_ollama import OllamaEmbeddings
from pymupdf4llm import LlamaMarkdownReader

from .models import Document as DocumentModel, Page

ollama_embeddings = OllamaEmbeddings(
    model="llama3.1",
)
elastic_vector_search = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="<your index name>",
    embedding=ollama_embeddings,
    es_user="elastic",
    es_password="changeme",
)


def sync_documents_to_vector_db(documents: list[DocumentModel]):
    for document in documents:
        # Convert the PDF into markdown chunks, one chunk per page
        md_read = LlamaMarkdownReader()
        chunks = md_read.load_data(f"{settings.BASE_DIR}/media/{str(document.pdf_file)}")
        for i, chunk in enumerate(chunks):
            # Keep a copy of each page in the relational database
            existing_page = Page.objects.filter(
                document__id__exact=document.pk,
                page_number=chunk.metadata['page']
            ).first()
            if existing_page is None:
                Page.objects.create(
                    document=document,
                    page_number=chunk.metadata['page'],
                    content=chunk.text
                )
            # Embed and index the page content (you could index a summarized
            # version of the page instead; see the Enhancements section)
            elastic_vector_search.add_documents(documents=[
                Document(
                    page_content=chunk.text,
                    metadata={
                        **chunk.metadata,
                        "document_id": document.pk,
                        "title": document.title,
                        "description": document.description,
                    },
                )
            ], ids=[
                f"{document.pk}-{chunk.metadata['page']}-{i}"
            ])
You can choose any embedding model you prefer; in this example, I’m using the Ollama embeddings.
Document Retrieval from Vector Database
from langchain_core.documents import Document
from langchain_elasticsearch import ElasticsearchStore
from langchain_ollama import OllamaEmbeddings

ollama_embeddings = OllamaEmbeddings(
    model="llama3.1",
)
elastic_vector_search = ElasticsearchStore(
    es_url="http://localhost:9200",
    index_name="<your index name>",
    embedding=ollama_embeddings,
    es_user="elastic",
    es_password="changeme",
)


def get_relevant_documents(query: str) -> list[Document]:
    # Embed the query and retrieve the most similar pages from Elasticsearch
    docs: list[Document] = elastic_vector_search.similarity_search(
        query=query,
        k=50
    )
    return docs
Querying the LLM with Retrieved Documents as Context
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

# Any LangChain chat model works here; ChatOllama is just one option
llm = ChatOllama(model="llama3.1")


def format_docs(docs) -> str:
    return "\n".join([f"""========= Document Page Separator =========
--- Document Page Metadata ---
Document ID: {doc.metadata['document_id']}
Document Title: {doc.metadata['title']}
Document Description: {doc.metadata['description']}
Page: {doc.metadata['page']}
Document Total Pages: {doc.metadata['total_pages']}
--- End of Document Page Metadata ---
--- Document Content ---
{doc.page_content}
--- End of Document Content ---
========= End of Document Page Separator =========
""" for doc in docs])


chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an AI assistant that helps humans answer their queries based on the following relevant documents:\n{documents}"),
        ("human", "{query}"),
    ]
)

# "query" is the user's message (e.g. taken from the API request);
# get_relevant_documents() is the retrieval function defined above
docs = get_relevant_documents(query)
prompt = chat_template.format_messages(
    documents=format_docs(docs),
    query=query
)
answer = llm.invoke(prompt)
Enhancements
You can enhance the implementation in several ways:
- Indexing: Consider indexing only the summarized version of each page in the vector database and retrieving the full content later to compare the results.
- Tools: Explore the concept of tools to give the LLM model additional capabilities that could enhance its responses to queries.
- Agents: Utilize LangChain’s agents, where a language model acts as a reasoning engine to determine the appropriate actions to take and the order in which to take them.
- Chains: Implement the concept of chains, which refers to sequences of calls, whether to an LLM, a tool, or a data preprocessing step (see the short sketch after this list).
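To give a taste of chains, the query step from the implementation above can be composed into a single LCEL chain. This is a sketch that reuses the chat_template, llm, format_docs, and get_relevant_documents definitions from earlier; the example query is made up:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

# Retrieval -> prompt formatting -> LLM call -> plain-text output, as one chain
rag_chain = (
    {
        "documents": RunnableLambda(get_relevant_documents) | format_docs,
        "query": RunnablePassthrough(),
    }
    | chat_template
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What does the policy say about annual leave carry-over?")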
Conclusion
I hope this article has provided you with valuable insights to help you build your first simple RAG application. With the right tools and approach, you can create an effective solution for leveraging LLM technology. Good luck on your development journey!
If you found this article helpful, please don’t forget to give some claps (you can clap more than once!), leave your comments, and share this article with your friends! If you want to read more personal blog posts, please subscribe to my Substack account! Thank you!
