Full Stack Implementation to Build a RAG (Retrieval Augmented Generation) Application

8 min read · Dec 25, 2023

Introduction:

Retrieval-Augmented Generation (RAG) is an architectural approach that enhances the effectiveness of large language model (LLM) applications by incorporating custom data. It achieves this by retrieving pertinent data or documents related to a specific question or task and utilizing them as contextual information for the LLM. RAG has proven to be successful in supporting chatbots and question-answering systems that require access to current information or domain-specific knowledge. In this blog, we will present a comprehensive solution for developing an RAG application using the following technology stack.

https://rag-delta.vercel.app/

Tech Stack

  1. Database: MongoDB with Atlas Vector Search, Prisma for ORM
  2. LLM and embedding model: ChatGPT, Hugging Face sentence-transformers
  3. API: FastAPI
  4. Frontend: Next.js with React
  5. UI: Material UI & Semantic UI & Radix UI
  6. Cloud: Google Cloud Platform & Vercel
  7. Tools: LlamaIndex

Architecture Overview:

Backend Github: https://github.com/Nelsonlin0321/webdev-rag-backend-api

Frontend Github: https://github.com/Nelsonlin0321/webdev-nextjs-rag

Backend:

On the backend, ingestion and retrieval generation are the two key parts of the RAG application.

Ingestion:

The purpose of ingestion is to prepare a given PDF document for effective retrieval of relevant information by transforming its text into meaningful chunks, encoding them into numerical representations, and storing them in a vector search database for efficient retrieval.

The process includes reading a PDF, making chunks, encoding sentences to embeddings, and loading them into a vector search database. It involves the following steps:

  1. Reading a PDF and Making Chunks: We use llama-index to read the PDF and divide the extracted text into smaller, meaningful segments.
  2. Encoding sentences to embeddings: Sentence encoding is the process of converting sentences into fixed-length numerical representations called embeddings. Embeddings capture the semantic meaning of the sentences, enabling comparison and similarity calculations. By encoding sentences to embeddings, we can perform efficient and accurate similarity searches. We leverage sentence_transformers to encode sentences to embeddings.
  3. Loading embeddings into a vector search database: A vector search database is a specialized database that efficiently stores and retrieves vector representations (embeddings) based on their similarity. We use MongoDB Atlas Vector Search and create the vector index below (a sketch of creating it programmatically follows the index definition):
{
  "fields": [
    {
      "numDimensions": 384,
      "path": "embedding",
      "similarity": "dotProduct",
      "type": "vector"
    },
    {
      "path": "fileName",
      "type": "filter"
    }
  ]
}
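
This index definition can be pasted into the Atlas UI, or the index can be created programmatically. Below is a minimal sketch assuming pymongo 4.7+ (which supports SearchIndexModel with type="vectorSearch"); the database and collection names here are placeholders, not the project's actual names.

# A minimal sketch of creating the vector search index with pymongo (assumed names).
import os

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient(os.environ["MONGODB_URI"])
collection = client["rag"]["embeddings"]  # hypothetical database/collection names

index_model = SearchIndexModel(
    name="vector_index",
    type="vectorSearch",
    definition={
        "fields": [
            {
                "numDimensions": 384,  # matches the BAAI/bge-small-en-v1.5 output size
                "path": "embedding",
                "similarity": "dotProduct",
                "type": "vector",
            },
            {"path": "fileName", "type": "filter"},
        ]
    },
)
collection.create_search_index(model=index_model)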

This is the endpoint that ingests a file, processes it, and loads the embeddings into the database for vector search queries:

@app.post(f"{PREFIX}/ingest")
async def ingest_file(file: UploadFile = File(...)):
    try:
        if not mongo_db_engine.file_exist(file_name=file.filename):
            save_file_path = utils.save_file(file=file)
            doc_meta_list = embedding_generator(save_file_path)
            mongo_db_engine.insert_embedding(doc_meta_list)
            mongo_db_engine.insert_document(file_name=file.filename)
            os.remove(save_file_path)

        return {"message":
                f"The file: {file.filename} has been successfully ingested and processed!"}

    # pylint: disable=broad-exception-caught
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e)) from e
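
For reference, here is a hedged example of calling this endpoint from a script; the host, port, and /api prefix are assumptions and may differ in your deployment.

# Hypothetical client upload; host, port and "/api" prefix are assumed.
import requests

with open("example.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/api/ingest",
        files={"file": ("example.pdf", f, "application/pdf")},
    )

print(response.json()["message"])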
"""PDT to sentence embedding"""
from sentence_transformers import SentenceTransformer
from llama_index import SimpleDirectoryReader

class PDFToSentenceEmbedding():
def __init__(self):

self.model = SentenceTransformer('BAAI/bge-small-en-v1.5')

def load_document(self, file_path):
documents = SimpleDirectoryReader(
input_files=[file_path]
).load_data()

return documents

def generate_embedding(self, file_path):
documents = self.load_document(file_path)
texts = [doc.text for doc in documents]
embeddings = self.model.encode(texts, normalize_embeddings=True)
document_meta_list = [{"fileName": doc.metadata['file_name'],
"textIdx": idx,
"pageLabel": doc.metadata['page_label'],
"text": doc.text,
"embedding": embeddings[idx].tolist(),
} for idx, doc in enumerate(documents)]
return document_meta_list

def __call__(self, file_path):
document_meta_list = self.generate_embedding(file_path)
return document_meta_list

embedding_generator = PDFToSentenceEmbedding()
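
The mongo_db_engine used in the ingest endpoint lives in the backend repository and is not reproduced in full here. As a rough sketch only, its insert helpers could look like the following; the class, database, and collection names are assumptions, not the project's actual code.

# A hypothetical sketch of the MongoDB engine's insert helpers, not the author's exact code.
import os
from typing import Dict, List

from pymongo import MongoClient

EMBEDDING_COLLECTION = "embeddings"  # assumed collection names
DOCUMENT_COLLECTION = "documents"


class MongoDBEngine:
    def __init__(self):
        self.client = MongoClient(os.environ["MONGODB_URI"])
        self.db = self.client["rag"]  # assumed database name

    def file_exist(self, file_name: str) -> bool:
        # One record per ingested file lets the ingest endpoint skip duplicates.
        return self.db[DOCUMENT_COLLECTION].count_documents({"fileName": file_name}) > 0

    def insert_embedding(self, doc_meta_list: List[Dict]) -> None:
        # Each item carries fileName, textIdx, pageLabel, text and embedding.
        self.db[EMBEDDING_COLLECTION].insert_many(doc_meta_list)

    def insert_document(self, file_name: str) -> None:
        self.db[DOCUMENT_COLLECTION].insert_one({"fileName": file_name})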

Retrieval Generation

Retrieval Augmented Generation (RAG) combines the capabilities of information retrieval and language generation models to enhance the process of generating responses in a conversational or question-answering system.

The process includes encoding the user query into an embedding, information retrieval by vector search, prompt generation, and question answering, as described below.

  1. User Query Embedding: Converting a textual user query into a fixed-length numerical representation that captures its semantic meaning. This process allows for efficient comparison and similarity calculations between queries. Remember to use the same embedding model (BAAI/bge-small-en-v1.5) to encode sentences to embeddings.
query_vector = embedding_generator.model.encode(pay_load.question).tolist()

2. Information retrieval by vector searching: This process allows for efficient and accurate retrieval of similar or related items. We use MongoDB Atlas Search as the search engine. This is the function that performs a vector search once the vector search index has been created:

def vector_search(self, query_vector: List[float], file_name: str) -> List[Dict]:
    results = self.db[EMBEDDING_COLLECTION].aggregate([
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": 5,
                "limit": 5,
                "filter": {"fileName": {"$eq": file_name}}
            }
        },
        {
            '$project': {
                'embedding': 0,
                "_id": 0,
                "score": {"$meta": "vectorSearchScore"},
            }
        }
    ])

    return list(results)
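
In the $vectorSearch stage, numCandidates controls how many approximate nearest neighbours Atlas considers before returning the top limit results, and the filter clause restricts the search to chunks belonging to the selected file. The $project stage drops the raw embedding and exposes the similarity score via vectorSearchScore.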

3. Prompt Generation: A prompt is generated using the retrieved sentences from the document. The generated prompt acts as the context for the subsequent question-answering step, providing relevant information to the RAG model.

This is how we generate the prompt after retrieval.

def generate_prompt(retrieved_results):
    # pylint:disable=line-too-long
    num_retrieved = len(retrieved_results)
    retrieved_documents = "\n".join([
        f"{idx+1}. Retrieved sentences: {retrieval['text']}" for idx, retrieval in enumerate(retrieved_results)])
    prompt = f"Given the top {num_retrieved} retrieved document sentences, please generate a concise and accurate answer to the following question: \n {retrieved_documents}"
    return prompt
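
For illustration only, here is roughly what the generated prompt looks like for two hypothetical retrieved results:

# Hypothetical retrieved results, for illustration only.
retrieved_results = [
    {"text": "RAG retrieves relevant document chunks before generation.", "score": 0.92},
    {"text": "Embeddings enable semantic similarity search.", "score": 0.88},
]

print(generate_prompt(retrieved_results))
# Given the top 2 retrieved document sentences, please generate a concise and accurate answer to the following question:
#  1. Retrieved sentences: RAG retrieves relevant document chunks before generation.
# 2. Retrieved sentences: Embeddings enable semantic similarity search.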

4. Question Answering: The generated prompt and the user’s question are passed to the OpenAI GPT-3.5 Turbo model via the client.chat.completions.create() method, which generates the answer. The temperature parameter controls the randomness of the generated answer: a lower value produces more focused and deterministic answers, while a higher value introduces more randomness.

@app.post(f"{PREFIX}/retrieval_generate")
async def retrieval_generate(pay_load: PayLoad):
    query_vector = embedding_generator.model.encode(pay_load.question).tolist()
    retrieved_results = mongo_db_engine.vector_search(
        query_vector=query_vector, file_name=pay_load.file_name)

    prompt = utils.generate_prompt(retrieved_results)

    completion = client.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[
            {'role': 'system',
             'content': prompt,
             },
            {"role": "user",
             "content": pay_load.question}
        ],
        temperature=0,
        stream=False
    )

    return {"question": pay_load.question,
            "file_name": pay_load.file_name,
            "answer": completion.choices[0].message.content,
            "uuid": str(uuid4())}

Frontend

The front end is implemented with Next.js and React, using Radix UI, Semantic UI, and Material UI components. It has three major components:

  1. Upload File
  2. Search File
  3. Question Answering

1. Upload File

The upload file component calls the api/ingest API to ingest and process the file and to load the document chunk embeddings into the vector database.

"use client";
import React, { useState } from "react";
import apiClient from "../services/api-client";
import { AxiosError } from "axios";
import Spinner from "./Spinner";
import toast, { Toaster } from "react-hot-toast";
import { useRouter } from "next/navigation";
import { Button } from "semantic-ui-react";

const FileUploader = () => {
const [file, setFile] = useState<File>();
const [isSubmitting, setSubmitting] = useState(false);
const router = useRouter();

const onSubmit = async (e: React.FormEvent<HTMLFormElement>) => {
e.preventDefault();
if (!file) return;
if (!file.name.endsWith(".pdf")) {
toast.error("Only PDF document supported", { duration: 1000 });
return;
}
setSubmitting(true);
try {
const data = new FormData();
data.set("file", file);
await apiClient.post("/api/ingest", data, {
headers: {
"Content-Type": "multipart/form-data",
},
});
router.refresh();
toast.success("File uploaded successfully!", { duration: 1000 });
} catch (error) {
const response = (error as AxiosError).response?.data;
const message = (response as { message: string }).message;
const errorMessage = message || "File Uploading Failed!";
toast.error(errorMessage, { duration: 1000 });
} finally {
setSubmitting(false);
}
};

return (
<div className="w-full">
<form onSubmit={onSubmit}>
<input
className="mb-2 block w-full cursor-pointer rounded-lg border dark:border-gray-300"
type="file"
name="file"
onChange={(e) => {
if (e.target.files !== null) {
setFile(e.target.files[0]);
}
}}
/>
<Button type="submit" className="cursor-pointer" color="blue">
{isSubmitting ? "Processing File" : "Upload and Process File"}
{isSubmitting && <Spinner />}
</Button>
</form>
<Toaster />
</div>
);
};

export default FileUploader;https://github.com/Nelsonlin0321/webdev-nextjs-rag

2. Search File

Search File is an important component that allows users to select which document to ask questions about. We implement it using the Material UI Autocomplete component, which lets users find a document quickly:

"use client";
import TextField from "@mui/material/TextField";
import Autocomplete from "@mui/material/Autocomplete";
import { InputAdornment } from "@mui/material";
import SearchIcon from "@mui/icons-material/Search";
import { useState } from "react";

interface Props {
fileNames: string[];
setFileName: (fileName: string) => void;
}

const FileSearcher = ({ fileNames, setFileName }: Props) => {
const [value, setValue] = useState<string | null>(fileNames[0]);
const [inputValue, setInputValue] = useState<string | undefined>(
fileNames[0]
);

return (
<Autocomplete
value={value}
onChange={(event: any, newValue: string | null) => {
setValue(newValue);
}}
inputValue={inputValue}
onInputChange={(event: any, newInputValue) => {
setInputValue(newInputValue);
setFileName(newInputValue);
}}
className="rounded-lg border w-full"
disablePortal
id="combo-box-demo"
options={fileNames}
renderInput={(params) => (
<TextField
size="small"
InputLabelProps={params.InputLabelProps}
InputProps={{
...params.InputProps,
startAdornment: (
<InputAdornment position="start">
<SearchIcon />
Search PDF
</InputAdornment>
),
}}
id={params.id}
inputProps={params.inputProps}
fullWidth={params.fullWidth}
/>
)}
/>
);
};

export default FileSearcher;

3. Question Answering

This component involves question asking and chat history maintenance. The question-asking component triggers the api/retrieval_generate API and updates the chat records state.

"use client";
import { TextField } from "@radix-ui/themes";
import { useRef, useState } from "react";
import { Button } from "semantic-ui-react";
import toast, { Toaster } from "react-hot-toast";
import apiClient from "../services/api-client";
import { chatRecord } from "./Chatbot";
// import { AxiosError } from "axios";
import Spinner from "./Spinner";

interface Props {
fileName: string;
fileNames: string[];
chatRecords: chatRecord[];
setChatRecords: (records: chatRecord[]) => void;
}

const QuestionField = ({
fileName,
fileNames,
chatRecords,
setChatRecords,
}: Props) => {
const questionRef = useRef<HTMLInputElement>(null);
const [isLoading, setLoading] = useState(false);
const submitData = { question: "", file_name: fileName };

return (
<div className="w-full">
<form
onSubmit={async (event) => {
event.preventDefault();

if (questionRef.current != null) {
if (questionRef.current.value.split(" ").length < 3) {
toast.error("The question requires at least 3 words");
return;
}

if (!fileNames.includes(fileName)) {
toast.error("The selected PDF document doesn't exist!");
return;
}
submitData.question = questionRef.current.value;
setLoading(true);
try {
await apiClient
.post<chatRecord>("/api/retrieval_generate", submitData)
.then((res) => {
setChatRecords([res.data, ...chatRecords]);
setLoading(false);
});
} catch (error) {
// const response = (error as AxiosError).response?.data;
// const message = (response as { message: string }).message;
// const errorMessage = message || "Unexpected Error";
const errorMessage = "Unexpected Error";
toast.error(errorMessage, { duration: 1000 });
} finally {
setLoading(false);
}
}
}}
>
<div>
<span>
Please write down your question related to selected PDF Document
</span>
<TextField.Root className="mb-2">
<TextField.Input
placeholder="Example Question: What are steps to take when finding projects to build your AI experience ?"
ref={questionRef}
/>
</TextField.Root>
<Button type="submit" className="cursor-pointer" color="blue">
Ask
{isLoading && <Spinner />}
</Button>
</div>
</form>
<Toaster />
</div>
);
};

export default QuestionField;

To render the chat history, we use the Semantic UI Accordion component rather than a chatbot conversation style because answers can be very long. The Accordion lets us collapse a long answer under its question, together with a document tag.

"use client";
import { useEffect, useState } from "react";
import { Accordion, Icon, Label } from "semantic-ui-react";
import { chatRecord } from "./Chatbot";
import { Text } from "@radix-ui/themes";

interface Props {
chatRecords: chatRecord[];
}

const ChatHistory = ({ chatRecords }: Props) => {
const [activeIndex, setActiveIndex] = useState(0);

useEffect(() => {
setActiveIndex(0);
}, [chatRecords]);

return (
<Accordion fluid styled>
{chatRecords.map((message, index) => (
<div key={index}>
<Accordion.Title
active={activeIndex === index}
onClick={() => {
if (activeIndex == index) {
setActiveIndex(-1);
} else {
setActiveIndex(index);
}
}}
>
<Icon name="dropdown" />
{message.question}
<div>
<Label color="orange" ribbon="right">
<p style={{ maxWidth: 256 }} className="truncate">
{message.file_name}
</p>
</Label>
</div>
</Accordion.Title>

<Accordion.Content active={activeIndex === index}>
<Text className="text-gray-800 mb-4 whitespace-pre-line">
{message.answer}
</Text>
</Accordion.Content>
</div>
))}
</Accordion>
);
};

export default ChatHistory;
