Why your next AI product needs RAG implemented in it

Showcasing Retrieval Augmented Generation (RAG) for chatbots and a step-by-step tutorial on how to build one for yourself or others

Published in Databutton · 11 min read · Oct 26, 2023

Schematic Illustration of the key components of RAG based Chatbots

Chatbots, such as the very popular ChatGPT, use large language models (LLMs) like GPT to generate responses. Hence, they can readily answer our burning questions using the data they have been trained on. To me, they feel like digital encyclopedias, pulling information from the vast knowledge they have soaked up.

In fact, they are pretty handy for everything from generating recipes and planning trips to untangling tricky math problems.

If you are a developer already building such chatbots

Leveraging LLMs isn't that difficult either. Please refer to my earlier blog posts below for tutorials and app demos:

Build Your Own Chatbot 🤖 with openAI GPT-3 and Streamlit 🎈

How to build a Chatbot with ChatGPT API and a Conversational Memory in Python

Let’s recap an over-simplified workflow for building simple chatbots

**An over-simplified architecture of a ChatGPT-like chatbot (we call this a Simple Chatbot)**

The end-user sends their query (i.e. the prompt).
The query is passed to and processed by the LLM (i.e. the pre-trained knowledge base); under the hood, both the end-user's prompt and the system prompt are tokenized and embedded by the model.
Finally, a well-crafted response is generated and returned.
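To make the three steps above concrete, here is a minimal sketch of what such a simple chatbot assembles before each LLM call. It assumes the role/content message format that the ChatGPT API uses later in this post; `build_simple_messages` is a hypothetical helper, and no LLM is actually called:

```python
from typing import Dict, List

Message = Dict[str, str]


def build_simple_messages(
    system_prompt: str, history: List[Message], user_query: str
) -> List[Message]:
    """Assemble what a simple (non-RAG) chatbot sends to the LLM.

    Hypothetical helper: the model only ever sees the system prompt, the
    earlier conversation and the new query; no external knowledge is added.
    """
    messages: List[Message] = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # earlier user/assistant turns, oldest first
    messages.append({"role": "user", "content": user_query})
    return messages


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Hi!"},
        {"role": "assistant", "content": "Hello, how can I help?"},
    ]
    for m in build_simple_messages("You are a helpful assistant.", history, "What is RAG?"):
        print(f"{m['role']}: {m['content']}")
```

Whatever the model answers has to come from its training data plus this short list of messages, which is where the drawbacks below come from.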

Regular use of ChatGPT has highlighted the importance of crafting our prompts. A well-framed prompt yields a more accurate response from the LLM. In a previous post (linked below), I demonstrated how we can even visualise data and create plots just by writing the right prompts.

Is prompting the next programming?

AI-Powered Data Visualization

There’s an elephant in the room we can’t ignore any longer!

These simple chatbots have two major drawbacks:

Limited to trained information: Bots like ChatGPT are not trained with new information after a certain point. So, they might not be aware of the latest happenings in the world.

ChatGPT plugins, like Bing Search, can help overcome such limitations.

Makes things up (hallucinates): when unsure, LLMs can say things that sound true but aren't accurate, filling gaps with irrelevant or invented content.

To overcome these limitations, chatbots enhanced with Retrieval Augmented Generation (RAG) become super powerful!

Below is the insightful paper by Meta AI researchers that first introduced RAG.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

I think it is worth sharing the main concept of RAG from the article’s abstract to provide a clearer and wider understanding:

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures.
Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems.
Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far been only investigated for extractive downstream tasks.
We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) — models which combine pre-trained parametric and non-parametric memory for language generation.

This article motivated me to build and compare basic chatbots with those enhanced by RAG…

Comparing a Simple Chatbot with a “RAG-enabled” chatbot. When the “RAG enabled” toggle is on, the chatbot can answer the end-user's query!

In a RAG-based chatbot implementation, our workflow requires a few additional steps:

User Interaction / Prompt usage: As with a simple chatbot, the end-user of our RAG-based chatbot submits a query.
Orchestration of Prompt / Prompt template: The prompt template is assembled, for example by including the relevant conversation history or reserving room for extra context (the template is filled in later, when the prompt is augmented with retrieved context).
Retrieval / Pulling data from an external knowledge base: Before the prompt is sent to the LLM, the system consults retrieval tools. These often include knowledge bases and APIs, for instance Wikipedia or vector databases like Pinecone or Weaviate. The retrievers aim to fetch relevant context from the knowledge base.
LLM Processing: The retrieved context is added to the prompt, and this augmented prompt (user prompt + system prompt + context) is finally sent to the LLM.
Response Generation: Informed by the richer prompt, the LLM crafts a relevant, grounded response.

Thus, RAG enables LLMs to deliver more precise and up-to-date information, even though their underlying training data does not change.
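The workflow above can be sketched end to end without any external service. This toy version is an illustration under loud assumptions, not the implementation built below: it replaces embeddings with simple word-overlap scoring, uses an in-memory list as the "knowledge base", and stops right before the LLM call by returning the augmented messages. All function names are hypothetical:

```python
import re
from typing import Dict, List, Set

Message = Dict[str, str]


def tokenize(text: str) -> Set[str]:
    # Lower-cased word set: a crude stand-in for real embeddings.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, knowledge_base: List[str], k: int = 2) -> List[str]:
    # Retrieval: rank passages by the number of words they share with the query.
    q = tokenize(query)
    ranked = sorted(knowledge_base, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_rag_messages(query: str, knowledge_base: List[str], k: int = 2) -> List[Message]:
    # Prompt template + augmentation: retrieved passages go into the system prompt,
    # followed by the user's query. Generation would send these messages to an LLM.
    context = "\n".join(retrieve(query, knowledge_base, k))
    system = (
        "Answer using only the context below. "
        'Reply "Not applicable" if it is irrelevant.\n'
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": query},
    ]


if __name__ == "__main__":
    kb = [
        "Databutton is an all-in-one Python app builder.",
        "FAISS is a library for efficient similarity search.",
        "Streamlit renders chat messages in the browser.",
    ]
    print(build_rag_messages("What is FAISS used for?", kb, k=1)[0]["content"])
```

The real app swaps the word-overlap ranking for embeddings plus a vector index, but the shape of the pipeline (retrieve, augment, then generate) stays the same.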

🤖 Build a chatbot with RAG pipeline

Now that we have an overall idea of the key aspects of RAG-based chatbots, let's build and deploy one! We will be using LangChain and Databutton to build this chatbot.

A big shoutout to open-source platforms like LangChain and LlamaIndex — they have immensely simplified the orchestration layer and the integration with LLMs through the suite of tools that they offer.

🧠 Brain of our app — External Knowledge bases

Since our external data sources are PDF files uploaded by end-users, let's start by writing a few functions to ingest that data.

# Importing the modules necessary 
import databutton as db
import streamlit as st

import re
import time
from io import BytesIO
from typing import Any, Dict, List, Tuple  # Tuple is needed for parse_pdf's return type
import pickle

from langchain.docstore.document import Document
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from pypdf import PdfReader
import faiss

def parse_pdf(file: BytesIO, filename: str) -> Tuple[List[str], str]:
    # Initialize the PDF reader for the provided file.
    pdf = PdfReader(file)
    output = []
    
    # Loop through all the pages in the PDF.
    for page in pdf.pages:
        # Extract the text from the page.
        text = page.extract_text()
        
        # Replace word splits that are split by hyphens at the end of a line.
        text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
        
        # Replace single newlines with spaces, but not those flanked by spaces.
        text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())
        
        # Consolidate multiple newlines to two newlines.
        text = re.sub(r"\n\s*\n", "\n\n", text)
        
        # Append the cleaned text to the output list.
        output.append(text)
    
    # Return the list of cleaned texts and the filename.
    return output, filename

def text_to_docs(text: List[str], filename: str) -> List[Document]:
    # Ensure the input text is a list. If it's a string, convert it to a list.
    if isinstance(text, str):
        text = [text]
    
    # Convert each text (from a page) to a Document object.
    page_docs = [Document(page_content=page) for page in text]
    
    # Assign a page number to the metadata of each document.
    for i, doc in enumerate(page_docs):
        doc.metadata["page"] = i + 1

    doc_chunks = []
    
    # Initialize the text splitter once, with specific chunk sizes and delimiters.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=4000,
        separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
        chunk_overlap=0,
    )

    # Split each page's text into smaller chunks and store them as separate documents.
    for page_doc in page_docs:
        # Split the page's text into chunks.
        chunks = text_splitter.split_text(page_doc.page_content)

        # Convert each chunk into a new document, storing its page number, chunk number,
        # and source file name in its metadata.
        for i, chunk in enumerate(chunks):
            chunk_doc = Document(
                page_content=chunk,
                metadata={"page": page_doc.metadata["page"], "chunk": i},
            )
            chunk_doc.metadata["source"] = f"{chunk_doc.metadata['page']}-{chunk_doc.metadata['chunk']}"
            chunk_doc.metadata["filename"] = filename
            doc_chunks.append(chunk_doc)
    
    # Return the list of chunked documents.
    return doc_chunks

We parse each uploaded PDF, then split each page's text into chunks to create a list of documents. Note that every chunk retains its metadata (page number, chunk number, and source filename), which the prompt later relies on for citations.
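To see what the three regular expressions in `parse_pdf` actually do, here is a small self-contained sketch that applies the same substitutions to a made-up page of extracted text. The sample string and the `clean_page_text` name are illustrative only, not part of the app:

```python
import re


def clean_page_text(text: str) -> str:
    """Apply the same three substitutions as parse_pdf to one page of text."""
    # 1) Re-join words hyphenated across a line break: "aug-\nmented" -> "augmented".
    text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
    # 2) Replace a newline with a space unless it is part of a
    #    newline-whitespace-newline sequence (which marks a paragraph break).
    text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())
    # 3) Collapse each newline / whitespace / newline run into exactly "\n\n".
    text = re.sub(r"\n\s*\n", "\n\n", text)
    return text


sample = "Retrieval aug-\nmented generation\ncombines search\nwith LLMs.\n \nSecond para-\ngraph here."
print(repr(clean_page_text(sample)))
# -> 'Retrieval augmented generation combines search with LLMs.\n\nSecond paragraph here.'
```

One subtlety worth knowing: as written, a bare "\n\n" (two newlines with no whitespace between them) is not protected by step 2, so it turns into two spaces; only newline-whitespace-newline sequences survive as paragraph breaks.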

🛠️ Indexing is crucial while working with LLMs

A vector database does not store or work directly with text, so the texts must first be converted into vectors. This step is often referred to as computing embeddings: each vector captures the semantic and contextual information of its text. Here we compute the embeddings with OpenAI (via LangChain's OpenAIEmbeddings) and store them in an index built with the FAISS Python package.

def docs_to_index(docs, openai_api_key):
    index = FAISS.from_documents(docs, OpenAIEmbeddings(openai_api_key=openai_api_key))
    return index


def get_index_for_pdf(pdf_files, pdf_names, openai_api_key):
    documents = []
    for pdf_file, pdf_name in zip(pdf_files, pdf_names):
        text, filename = parse_pdf(BytesIO(pdf_file), pdf_name)
        documents = documents + text_to_docs(text, filename)
    index = docs_to_index(documents, openai_api_key)
    return index
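Real embeddings come from a trained model (here, OpenAI's embedding endpoint via `OpenAIEmbeddings`), but the core idea, turning any text into a fixed-length vector so that similar texts land close together, can be shown with a toy stand-in. The sketch below uses feature hashing over word counts; it is an illustrative assumption with no semantic understanding, not how OpenAI embeddings work, and `toy_embed` is a hypothetical name:

```python
import hashlib
import math
import re
from typing import List


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Map text to a fixed-length, unit-norm vector via feature hashing.

    Toy stand-in for an embedding model: texts sharing words get similar
    vectors, but there is no notion of meaning (synonyms are not matched).
    """
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Deterministic bucket per word (built-in hash() is salted per process).
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:8], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: List[float], b: List[float]) -> float:
    # Both vectors are unit length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = toy_embed("how do vector databases store embeddings")
    related = toy_embed("vector databases store embeddings for search")
    unrelated = toy_embed("a recipe for banana bread")
    print(f"related:   {cosine(q, related):.2f}")
    print(f"unrelated: {cosine(q, unrelated):.2f}")
```

A learned embedding model goes further: it can place "car" and "automobile" close together even though they share no characters, which is what makes semantic search work.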

Here’s where I discuss embeddings & semantic search in detail

Build a Personal Search Engine Web App using Open AI Text Embeddings

In production, the best practice is to store embeddings in a dedicated vector database, which persists them and makes working with vectorized data at scale easy. Popular vector stores include Pinecone and Weaviate.

🤩 Building the front-end

Let's now build the front-end, where the end-user can upload any PDF file, index it, and finally chat with it!

# Import necessary libraries
import databutton as db
import streamlit as st
import openai
from my_pdf_lib import get_index_for_pdf
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
import os

# Set the title for the Streamlit app
st.title("RAG enhanced Chatbot")

# Set up the OpenAI API key from databutton secrets
os.environ["OPENAI_API_KEY"] = db.secrets.get("OPENAI_API_KEY")
openai.api_key = db.secrets.get("OPENAI_API_KEY")

# Upload PDF files using Streamlit's file uploader
pdf_files = st.file_uploader("", type="pdf", accept_multiple_files=True)

Next, we need a function that creates a vector index from the content of the uploaded PDF files and stores it in the session state. For anything beyond a demo, however, I would highly recommend using a vector database to store such embeddings.


# Cached function to create a vectordb for the provided PDF files
@st.cache_data
def create_vectordb(files, filenames):
    # Show a spinner while creating the vectordb
    with st.spinner("Vector database"):
        vectordb = get_index_for_pdf(
            [file.getvalue() for file in files], filenames, openai.api_key
        )
    return vectordb


# If PDF files are uploaded, create the vectordb and store it in the session state
if pdf_files:
    pdf_file_names = [file.name for file in pdf_files]
    st.session_state["vectordb"] = create_vectordb(pdf_files, pdf_file_names)
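Caching matters here because every call to `get_index_for_pdf` re-embeds all chunks, which costs time and OpenAI API calls. `@st.cache_data` avoids this by reusing the previous return value when the function is called again with the same arguments. The sketch below shows the same idea in plain Python, keying a cache on a hash of the uploaded bytes and file names. `cached_index`, `content_key` and `build_index` are hypothetical names; the app itself keeps relying on Streamlit's decorator (or, better, a persistent vector database):

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical in-process cache: key = hash of the uploads, value = built index.
_index_cache: Dict[str, object] = {}


def content_key(files: List[bytes], names: List[str]) -> str:
    """Hash file names and bytes, so identical uploads map to the same key."""
    h = hashlib.sha256()
    for name, data in zip(names, files):
        for part in (name.encode("utf-8"), data):
            # Length prefix keeps boundaries unambiguous ("ab" + "c" != "a" + "bc").
            h.update(len(part).to_bytes(8, "big"))
            h.update(part)
    return h.hexdigest()


def cached_index(
    files: List[bytes],
    names: List[str],
    build_index: Callable[[List[bytes], List[str]], object],
) -> object:
    """Return a cached index, calling the expensive build_index only on a cache miss."""
    key = content_key(files, names)
    if key not in _index_cache:
        _index_cache[key] = build_index(files, names)  # e.g. embeds every chunk
    return _index_cache[key]
```

In the app, `build_index` could be something like `lambda f, n: get_index_for_pdf(f, n, openai.api_key)`. Unlike Streamlit's cache, this dictionary lives only as long as the Python process and never evicts entries.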

The schematic below illustrates that whenever a prompt comes from the end-user, the system first queries external databases, such as vector databases, instead of passing the prompt directly to the LLM.

A prompt will first interact with external databases instead of passing directly via the LLM

Based on the discussion so far and the schematic above, we want the system to consult external databases first. Hence, a well-crafted custom prompt designed to take in further context (i.e. we are augmenting our prompt here) is crucial!

# Define the template for the chatbot prompt
prompt_template = """
    You are a helpful Assistant who answers users' questions based on multiple contexts given to you.

    Keep your answer short and to the point.

    The evidence is the context of the PDF extract with metadata.

    Carefully focus on the metadata, especially 'filename' and 'page', whenever answering.

    Make sure to add the filename and page number at the end of each sentence you are citing.

    Reply "Not applicable" if the text is irrelevant.
     
    The PDF content is:
    {pdf_extract}
"""

Note: The above prompt is neither robust nor well-tested; it was crafted solely for this demo app. It can be further tweaked and tested (please leave your suggestions in the comments below if you have better prompts).

💬 Building the Chat UI

This is a typical Streamlit chat UI, which we will use for this chatbot.

# Get the current prompt from the session state or set a default value
prompt = st.session_state.get("prompt", [{"role": "system", "content": "none"}])

# Display previous chat messages
for message in prompt:
    if message["role"] != "system":
        with st.chat_message(message["role"]):
            st.write(message["content"])

# Get the user's question using Streamlit's chat input
question = st.chat_input("Ask anything")

# Handle the user's question
if question:
    vectordb = st.session_state.get("vectordb", None)
    if not vectordb:
        with st.chat_message("assistant"):
            st.write("You need to provide a PDF")
            st.stop()

To understand each step better, refer to my previous blog post!

Stream LangChain AI Abstractions and Responses in Your Web App — LangChain Tools in Action

⚙️ Retrieving the semantically similar contexts from Index Store

Next, we fetch relevant contexts to augment our prompt! This part is crucial in our RAG-enhanced chatbot. When the user submits a query, we want the top N semantically similar hits from our vectorized data (here N = 3, set via the k parameter).

    # Search the vectordb for the 3 chunks most similar to the user's question
    search_results = vectordb.similarity_search(question, k=3)
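Conceptually, `similarity_search(question, k=3)` embeds the question with the same embedding model and returns the three stored chunks whose vectors lie closest to it. A brute-force version of that nearest-neighbour step is sketched below over hand-made 3-dimensional vectors; the chunk texts and numbers are made up for illustration, and real embeddings have far more dimensions. FAISS does the same job much faster at scale. LangChain's FAISS wrapper ranks by L2 distance by default, which, for unit-length vectors, orders results the same way as the cosine similarity used here:

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours: score every chunk, keep the k best."""
    scored = ((text, cosine_similarity(query_vec, vec)) for text, vec in chunks)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    chunks = [
        ("page 1: pricing", [0.9, 0.1, 0.0]),
        ("page 2: refunds", [0.1, 0.9, 0.1]),
        ("page 3: shipping", [0.0, 0.2, 0.9]),
    ]
    for text, score in top_k([0.8, 0.3, 0.0], chunks, k=2):
        print(f"{score:.3f}  {text}")
```

Scoring every chunk is fine for a handful of PDFs. The point of FAISS and dedicated vector databases is to keep this step fast when there are millions of vectors.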

Retrieval adds the relevant context that a simple chatbot lacks, increasing the relevance of responses

The top N semantically relevant results are used as context to enrich our prompt!

Each search result carries both the semantically similar text and its source metadata (filename, page, and chunk).

➕ Augmenting the semantically relevant context with the prompt

We loop over search_results and concatenate their text into a single string, which is then passed into the prompt.

    # Join the retrieved chunks with newlines (the original "/n" was a typo for "\n")
    pdf_extract = "\n".join([result.page_content for result in search_results])

    # Update the system prompt with the pdf extract
    prompt[0] = {
        "role": "system",
        "content": prompt_template.format(pdf_extract=pdf_extract),
    }
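One caveat: the prompt template asks the model to cite 'filename' and 'page', yet only `page_content` is joined into `pdf_extract`, so the model never actually sees that metadata. A possible variant, sketched here under the assumption that each search result exposes `page_content` and a `metadata` dict with the `filename` and `page` keys set in `text_to_docs`, labels every chunk with its source. The `Result` dataclass is a hypothetical stand-in for LangChain's `Document` so the snippet runs on its own:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Result:
    # Stand-in for LangChain's Document: the two attributes the app reads.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_extract(search_results: List[Result]) -> str:
    """Join retrieved chunks, labelling each one with its filename and page."""
    parts = []
    for result in search_results:
        filename = result.metadata.get("filename", "unknown")
        page = result.metadata.get("page", "?")
        parts.append(f"[filename: {filename}, page: {page}]\n{result.page_content}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    results = [
        Result("RAG adds retrieval.", {"filename": "rag.pdf", "page": 2}),
        Result("FAISS indexes vectors.", {"filename": "faiss.pdf", "page": 5}),
    ]
    print(format_extract(results))
```

In the app, `pdf_extract = format_extract(search_results)` would then replace the plain join, giving the model the filename and page it is asked to cite.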

🌊 Generating Responses 🤖

Next, we pass the augmented prompt (system prompt with the retrieved contexts, plus the user's question) to the LLM to generate a relevant answer to the end-user's query. We also stream the generated response from the LLM to give a ChatGPT-like vibe!

    # Add the user's question to the prompt and display it
    prompt.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.write(question)

    # Display an empty assistant message while waiting for the response
    with st.chat_message("assistant"):
        botmsg = st.empty()

    # Call ChatGPT with streaming and display the response as it comes.
    # Note: openai.ChatCompletion is the pre-1.0 openai-python API (requires openai<1.0).
    response = []
    result = ""
    for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=prompt, stream=True
    ):
        text = chunk.choices[0].get("delta", {}).get("content")
        if text is not None:
            response.append(text)
            result = "".join(response).strip()
            botmsg.write(result)

    # Add the assistant's response to the prompt (exactly once)
    prompt.append({"role": "assistant", "content": result})

    # Store the updated prompt in the session state
    st.session_state["prompt"] = prompt
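The streaming loop works because each chunk carries only a small `delta` of new text, and the app re-renders the concatenation so far. The sketch below reproduces that accumulation logic offline with hard-coded dictionaries shaped like pre-1.0 `openai` streaming chunks. This is an assumption made so it runs without an API key; the app accesses `chunk.choices` as an attribute, whereas plain dictionaries need subscripts. The first chunk typically carries only the role and the last one an empty delta, which is why `None` content is skipped. `accumulate_stream` is a hypothetical name:

```python
from typing import Callable, Iterable, List, Mapping


def accumulate_stream(chunks: Iterable[Mapping], render: Callable[[str], None]) -> str:
    """Concatenate streamed deltas, re-rendering the partial answer each time."""
    pieces: List[str] = []
    result = ""
    for chunk in chunks:
        # Same lookup as the app: choices[0]["delta"]["content"], if present.
        text = chunk["choices"][0].get("delta", {}).get("content")
        if text is not None:
            pieces.append(text)
            result = "".join(pieces).strip()
            render(result)  # in the app: botmsg.write(result)
    return result


fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "RAG adds "}}]},
    {"choices": [{"delta": {"content": "retrieved context."}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

if __name__ == "__main__":
    print(accumulate_stream(fake_chunks, render=lambda partial: None))
```

Re-rendering the whole string (rather than appending pieces to the page) keeps the displayed answer in a single placeholder, which is exactly what `st.empty()` provides.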

Feel the RAG chatbot building vibes

☕️ Conclusion

Congratulations! Together we built a chatbot enhanced with RAG (Retrieval Augmented Generation) 🎉

In brief, the building blocks of such RAG-based chatbots typically include:

a) Data retrieval from vectorized user information

b) Context-based augmentation of prompts based on end-users queries

c) Generation of more reliable responses to end-user queries

Integrating such RAG-based approaches into customised LLM-based products increases the chance of getting context-relevant and precise information, and helps tailor the responses to specific user queries. All in all, a better chatbot experience.

Check out this informative piece by Trygve Karper, where he discusses a handful of YC startups leveraging the power of RAG — https://medium.com/databutton/some-ycombinator-rag-startups-cba3cca88274

*A happy customer is the heartbeat of every thriving product*. (Photo: Cytonn Photography)

To get started quickly, you can use the “Chat with PDF” template within Databutton 🚀

Alternatively, you can find the entire code in this repo: https://github.com/avrabyt/RAG-Chatbot/tree/main

It is also explained in detail in this video:
https://youtu.be/Yh1GEWqgkt0?si=p-gu9CBl4GTK4ESx

📖 Hungry for more? I would urge you to read the following!

LangChain RAG documentation: https://python.langchain.com/docs/expression_language/cookbook/retrieval
Deci AI blog on RAG using LangChain: https://deci.ai/blog/retrieval-augmented-generation-using-langchain/
Retrieval augmented generation: Keeping LLMs relevant and current (Stack Overflow blog): https://stackoverflow.blog/2023/10/18/retrieval-augmented-generation-keeping-llms-relevant-and-current/
LlamaIndex RAG concepts: https://gpt-index.readthedocs.io/en/latest/getting_started/concepts.html
What is RAG, from IBM: https://research.ibm.com/blog/retrieval-augmented-generation-RAG
RAG-based YC startups: https://medium.com/databutton/some-ycombinator-rag-startups-cba3cca88274

Acknowledgement

Thanks to Björn Lapakko for proofreading and candid feedback 💜

Join my weekly Newsletter

https://weekly-aistacks.beehiiv.com/subscribe

Written by Avra