Implementation of Retrieval-Augmented Generation (RAG) with Amazon Bedrock

In the AI landscape, Large Language Models (LLMs) excel at complex language tasks but often struggle to produce accurate responses when they lack relevant or up-to-date information. This challenge brings us to Retrieval-Augmented Generation (RAG), an approach that combines LLMs with external data sources to improve response accuracy. By augmenting user prompts with data retrieved from a knowledge base, RAG helps LLMs produce more reliable and better-informed outputs.
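
Conceptually, the pattern is straightforward: retrieve the passages most relevant to a question, prepend them to the prompt, and let the model answer from that context. The toy sketch below is our own illustration of the idea; the actual implementation later in this post uses LangChain, FAISS, and Amazon Bedrock rather than this hand-rolled retrieval.

def retrieve(question, documents, top_k=2):
    # Toy retrieval: rank documents by word overlap with the question.
    q_words = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def rag_answer(question, documents, llm):
    # Augment the prompt with the retrieved context before calling the model.
    context = "\n\n".join(retrieve(question, documents))
    prompt = f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return llm(prompt)  # 'llm' is any callable that maps a prompt string to a completion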

In this implementation, we’ve adapted an example from the Building with Amazon Bedrock workshop, demonstrating RAG’s capabilities with an in-memory FAISS database. This setup serves as an essential base for more advanced applications, which might employ persistent data stores such as Amazon Kendra or Amazon OpenSearch Serverless for broader real-world use.

Architecture of this demo (source: https://catalog.workshops.aws/building-with-amazon-bedrock/en-US/basic/bedrock-rag#architecture)

By extending the workshop’s example, we showcase how RAG can effectively overcome common LLM challenges like misinformation and outdated responses, thereby making LLMs more reliable and contextually aware. This implementation not only underlines the importance of RAG in AI applications but also serves as a practical guide for leveraging Amazon Bedrock’s capabilities in enhancing LLM performance.

Data Preparation: Crawling AWS Documentation for RAG

In the initial phase of our project, we focused on gathering the necessary data to feed into the RAG model. This task involved creating a custom web crawler using Python to scrape AWS official documentation. The objective was to accumulate a rich dataset that could provide a solid foundation for our RAG system to generate informed responses.

Our journey began with the “What is Amazon S3?” page, which serves as the overview page for this product’s documentation. From there we collected all related documentation URLs and saved the content of each page. Initially, we worked with over 800 text documents, but because generating the vector database took so long, we streamlined the process to focus on around a dozen key documents about S3.

import os
import requests
from bs4 import BeautifulSoup

def doc_to_txt(url):
    """
    Extracts the title, last updated timestamp, and text from the given URL and formats it as text content.
    The function fetches the content using requests and parses it using BeautifulSoup.
    It extracts text from h2, h3, h4, h5, h6, and p tags, and looks for a 'Last updated' timestamp.

    :param url: The URL to extract the text from.
    :return: Formatted text content with title, last updated timestamp, and URL.
    """
    response = requests.get(url)
    content = response.content
    soup = BeautifulSoup(content, 'html.parser')
    title = soup.title.string if soup.title else "No Title"

    # Extracting the last updated timestamp
    last_updated = ""
    for tag in soup.find_all(['p', 'div']):
        if "Last updated" in tag.get_text():
            last_updated = tag.get_text().strip()
            break

    # Collect the readable body text from headings and paragraphs
    useful_text = ""
    for tag in soup.find_all(['h2', 'h3', 'h4', 'h5', 'h6', 'p']):
        useful_text += tag.get_text() + "\n"

    meta_data = f"Title: {title}\nURL: {url}\n{last_updated}\n" if last_updated else f"Title: {title}\nURL: {url}\n"
    txt_content = f"{meta_data}\n{useful_text}"
    return txt_content

def extract_docs_urls(url, path_keyword):
    """
    Extracts all URLs that contain a specified path keyword from the given URL.
    The function fetches the content using requests and parses it using BeautifulSoup.
    It searches for all 'a' tags with href attributes and filters URLs containing the path keyword.

    :param url: The URL to search for other URLs.
    :param path_keyword: The keyword that should be present in the path of the URL.
    :return: A set of URLs that contain the specified path keyword.
    """
    try:
        response = requests.get(url)
        content = response.content
        soup = BeautifulSoup(content, 'html.parser')
        urls = {a['href'] for a in soup.find_all('a', href=True) if path_keyword in a['href']}
        return urls
    except Exception as e:
        print(f"Error processing URL: {url}\n{e}\n")
        return set()

def save_texts_to_folder(urls, folder_name):
    """
    Saves extracted text content from a list of URLs into text files within a specified folder.
    """
    if not os.path.exists(folder_name):
        os.makedirs(folder_name)

    for url in urls:
        try:
            txt_content = doc_to_txt(url)
            file_name = os.path.join(folder_name, url.split('/')[-1].split('?')[0] + ".txt")
            with open(file_name, 'w', encoding='utf-8') as file:
                file.write(txt_content)
        except Exception as e:
            print(f"Error processing URL: {url}\n{e}\n")

def main():
    main_page_url = "https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html"
    folder_name = "/home/ubuntu/environment/document_source"

    # Step 1: Collect URLs from the main page
    urls = extract_docs_urls(main_page_url, "/userguide/")

    # Step 2: Process and save texts
    save_texts_to_folder(urls, folder_name)

if __name__ == "__main__":
    main()

This approach not only made our data collection more manageable but also ensured that the RAG model was built upon highly relevant and specific content, enhancing its effectiveness and efficiency in generating responses.

Documents list for RAG

Preparing Data for LLM Retrieval

Building on the Amazon Bedrock Workshop’s RAG pattern, we adapted their example code to fit our specific use case. Our approach involved the creation of a custom retrieval and indexing system, tailored to process and handle the specific data we had collected from the AWS documentation.

In our implementation, the key differences from the original example lie in the source and format of our data. Instead of using a single PDF file as in the workshop’s example, we opted for a broader range of data sources, scraping numerous AWS documentation pages and storing them as text files. This approach allowed us to cover a wider spectrum of information, ensuring our RAG model had access to a more diverse and comprehensive set of data.

import os
from langchain.embeddings import BedrockEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms.bedrock import Bedrock
from langchain_community.document_loaders import TextLoader

def get_llm():
    model_kwargs = {
        "maxTokens": 4096,
        "temperature": 0.6,
        "topP": 0.6,
        "stopSequences": [],
        "countPenalty": {"scale": 0},
        "presencePenalty": {"scale": 0},
        "frequencyPenalty": {"scale": 0}
    }

    llm = Bedrock(
        credentials_profile_name=os.environ.get("BWB_PROFILE_NAME"),
        region_name=os.environ.get("BWB_REGION_NAME"),
        endpoint_url=os.environ.get("BWB_ENDPOINT_URL"),
        model_id="ai21.j2-ultra-v1",
        model_kwargs=model_kwargs)

    return llm

def get_index():
    embeddings = BedrockEmbeddings(
        credentials_profile_name=os.environ.get("BWB_PROFILE_NAME"),
        region_name=os.environ.get("BWB_REGION_NAME"),
        endpoint_url=os.environ.get("BWB_ENDPOINT_URL"),
    )

    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n", ".", " "],
        chunk_size=1000,
        chunk_overlap=100
    )

    index_creator = VectorstoreIndexCreator(
        vectorstore_cls=FAISS,
        embedding=embeddings,
        text_splitter=text_splitter,
    )

    # Load every scraped .txt file from the crawl step as a separate document
    text_files_path = "/home/ubuntu/environment/document_source"
    loaders = []
    for filename in os.listdir(text_files_path):
        if filename.endswith(".txt"):
            file_path = os.path.join(text_files_path, filename)
            loader = TextLoader(file_path=file_path)
            loaders.append(loader)

    # Split, embed, and index all documents into an in-memory FAISS store
    index_from_loaders = index_creator.from_loaders(loaders)

    return index_from_loaders

def get_rag_response(index, question):
    llm = get_llm()
    response_text = index.query(question=question, llm=llm)

    return response_text

Here’s a high-level overview of our code flow:

  1. LLM Initialization: We set up a Bedrock LLM with customized parameters, increasing maxTokens to 4096 and adjusting temperature and topP to 0.6 for a balance between creativity and coherence in the model's responses.
  2. Data Indexing: Using LangChain’s TextLoader, we processed the scraped text files and created an in-memory FAISS vector store. This store acts as the database from which the RAG model retrieves information.
  3. RAG Response: Upon receiving a query, our system queries this indexed data, combining the retrieved information with the LLM’s capabilities to generate a well-informed response.
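
Since the Streamlit app in the next section imports these functions from a module named rag_lib, the pipeline can also be exercised on its own. Here is a minimal smoke test, assuming the functions above live in rag_lib.py and the BWB_PROFILE_NAME, BWB_REGION_NAME, and BWB_ENDPOINT_URL environment variables are configured as in the workshop:

import rag_lib

index = rag_lib.get_index()  # builds the in-memory FAISS index from the scraped text files
answer = rag_lib.get_rag_response(index=index, question="What is Compliance mode in S3?")
print(answer)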

By modifying the original setup to accommodate a more extensive dataset, we enhanced the model’s ability to provide accurate and contextually relevant answers. This customization exemplifies the flexibility of Amazon Bedrock’s tools, showcasing their adaptability to different data sources and formats.

Creating an Interactive Interface with Streamlit for RAG

The final step in our RAG implementation involved developing an interactive interface using Streamlit. This interface is designed to allow users to interact with the RAG model, posing questions and receiving informed responses. Here’s how we achieved this:


import streamlit as st #all streamlit commands will be available through the "st" alias
import rag_lib as glib #reference to local lib script

st.set_page_config(page_title="Retrieval-Augmented Generation") #HTML title
st.title("Retrieval-Augmented Generation") #page title

if 'vector_index' not in st.session_state: #see if the vector index hasn't been created yet
    with st.spinner("Indexing document..."): #show a spinner while the code in this with block runs
        st.session_state.vector_index = glib.get_index() #retrieve the index through the supporting library and store in the app's session cache

input_text = st.text_area("Input text", label_visibility="collapsed") #display a multiline text box with no label
go_button = st.button("Go", type="primary") #display a primary button

if go_button: #code in this if block will be run when the button is clicked
    with st.spinner("Working..."): #show a spinner while the code in this with block runs
        response_content = glib.get_rag_response(index=st.session_state.vector_index, question=input_text) #call the model through the supporting library

    st.write(response_content) #display the response content

  1. Setting Up Streamlit: We initiated our Streamlit application with basic configurations, including setting the page title to “Retrieval-Augmented Generation.”
  2. Indexing Documents: On the application’s first run, we check if the vector index is already created in the session state. If not, we use a spinner to indicate the process of indexing documents using our custom library rag_lib.
  3. User Input and Response Generation: The main feature of our interface is a text area for users to input their queries. Upon clicking the “Go” button, the application interacts with the RAG model to generate responses. This is done by querying the indexed data stored in the session state and utilizing our get_rag_response function.
  4. Displaying Responses: The response from the RAG model is then displayed on the Streamlit interface, providing users with answers based on the indexed AWS documentation data.

This interactive application serves as a practical demonstration of the RAG model’s capabilities, showcasing how users can leverage it for retrieving accurate and contextually relevant information.

Demonstrating the RAG Model

In this section, we explore a live demonstration of our Retrieval-Augmented Generation model through a Streamlit interface. The demonstration offers a tangible view of how the RAG model functions in real-world scenarios.

Previewing the Application:

To begin, we preview our application in AWS Cloud9 by selecting Preview -> Preview Running Application. This launches the Streamlit interface, showcasing the initial setup of our RAG model.
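Note that the preview only works while the app is actually running in the Cloud9 terminal; we start it with Streamlit’s standard streamlit run command against the script above, on one of the ports Cloud9 can proxy (8080, 8081, or 8082), following the workshop’s instructions.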

How to preview a running application (source: https://catalog.workshops.aws/building-with-amazon-bedrock/en-US/basic/bedrock-rag)

Initialization Time Variance:

The initialization time varies with the number of documents indexed. As previously mentioned, processing over 800 documents proved time-consuming, which led us to trim the set down to around a dozen documents for efficiency.
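
One practical way to cut this start-up cost, which we did not implement here but which fits the same stack, is to persist the FAISS index to disk after the first build and reload it on later runs. A rough sketch, reusing get_index() from rag_lib and the same BWB_* environment variables (the cache folder name is hypothetical):

# Rough sketch (our own addition, not part of the workshop code): cache the
# FAISS index on disk so the documents are only embedded once.
import os
from langchain.embeddings import BedrockEmbeddings
from langchain.vectorstores import FAISS
import rag_lib

INDEX_DIR = "faiss_index"  # hypothetical local cache folder

embeddings = BedrockEmbeddings(
    credentials_profile_name=os.environ.get("BWB_PROFILE_NAME"),
    region_name=os.environ.get("BWB_REGION_NAME"),
    endpoint_url=os.environ.get("BWB_ENDPOINT_URL"),
)

if os.path.isdir(INDEX_DIR):
    # Newer LangChain releases may also require allow_dangerous_deserialization=True here.
    db = FAISS.load_local(INDEX_DIR, embeddings)
else:
    index = rag_lib.get_index()           # slow path: embeds every document
    index.vectorstore.save_local(INDEX_DIR)
    db = index.vectorstore

# The cached store can then serve retrieval queries directly:
docs = db.similarity_search("What is Compliance mode in S3?", k=4)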

Interface Walkthrough:

The user is greeted with a simple and intuitive interface. To test the versatility of the model, we start with an unrelated question, “How are you?”. The model, focused on factual data, responds with “I don’t know,” indicating its design to handle specific, information-based queries.
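Much of this behavior comes from the default question-answering prompt LangChain applies under the hood, which instructs the model to say it does not know rather than invent an answer when the retrieved context is not relevant.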

Querying Specific Information:

Next, we pose a question directly related to AWS S3: “What is Compliance mode in S3?”. The response aligns accurately with the information in our indexed documents, demonstrating the model’s ability to retrieve and utilize specific data effectively.

Identifying Areas for Improvement:

To test the limits of our model, we ask a more detailed question about S3: “Bucket names requirements and rules.” The response, while relevant, isn’t entirely comprehensive, indicating areas where the model could be enhanced for more detailed inquiries.

Conclusion

Implementing Retrieval-Augmented Generation (RAG) on Amazon Bedrock proved to be a straightforward endeavor, largely thanks to the comprehensive resources provided by Amazon’s workshops. While the initial setup was user-friendly, bridging the gap between a functional prototype and a fully practical application remains a challenge. This journey has highlighted the need for continued development and fine-tuning to fully harness the potential of RAG models in real-world scenarios.
