Building a RAG solution using Amazon Bedrock and Amazon OpenSearch Serverless

Tahir Saeed
6 min read · Feb 21, 2024


Since the introduction of ChatGPT, most organizations are keen on incorporating a ChatGPT-like experience into their solutions. However, for confidentiality reasons, it is not advisable to feed your data into public solutions. Furthermore, publicly available solutions might hallucinate and fail to provide the right answers based on your organization’s knowledge base.

Retrieval-Augmented Generation (RAG) is the answer to confining Generative AI’s scope to specific organizational domains or knowledge bases. RAG enhances the capabilities of LLMs by integrating external knowledge bases, ensuring the relevance and accuracy of generated content. This approach offers a cost-effective means of augmenting Generative AI’s output, rendering it contextually pertinent and reliable.

In this article we will explore how to build a solution that scrapes data from your website and enables natural-language search over it using AWS services like Amazon Bedrock and Amazon OpenSearch Serverless. By the end of this blog you should be able to plug in your website, scrape data from it, ask questions in natural language, and get answers back. So let’s get started.

This blog assumes you have an understanding of how to use the AWS console. If you do not have an AWS account, please create one here

First, it is important to understand how the whole solution works. The following diagram depicts the flow of the solution:

RAG Architecture

Infrastructure

1. We will use a Jupyter notebook to run the code. Instructions on how to download and set up JupyterLab on your machine are available here

2. The next step is to set up the Amazon OpenSearch Serverless vector engine

3. Log in to AWS and navigate to Amazon OpenSearch Service

4. Navigate to the ‘Collections’ tab under the ‘Serverless’ section in the left menu

5. Click on the ‘Create Collection’ button and a form should open up

6. Fill in the collection name and select ‘Vector search’. Leave the remaining fields as default and hit ‘Next’ at the bottom of the form

7. The next screen is to review and create the collection. Hit ‘Submit’ after reviewing the details

8. Once the collection is created, navigate to the newly created collection and scroll down to the Endpoint section. Copy the OpenSearch endpoint URL and save it for later use

9. Navigate to the ‘Indexes’ tab and click on ‘Create vector index’

10. Navigate to the ‘JSON’ tab on the Create vector index page

11. Enter your index name ‘rag-index’, copy the following JSON into the text box provided, and click the ‘Create’ button. The dimension of 1536 matches the size of the vectors produced by the Titan Embeddings G1 - Text model we will use later

{
  "mappings": {
    "properties": {
      "rag_vector": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "engine": "faiss",
          "name": "hnsw",
          "space_type": "l2"
        }
      }
    }
  }
}

12. Now, let’s enable Amazon Bedrock model access. Navigate to Amazon Bedrock

13. Navigate to ‘Model access’ from the left panel

14. Click on the ‘Manage model access’ button in the top right

15. Select ‘Titan Embeddings G1 - Text’ and ‘Titan Text G1 - Express’ and hit the ‘Save changes’ button at the bottom

Data Ingestion

16. Create AWS access keys for your user and configure the AWS CLI with them (for example, by running aws configure in a terminal)
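As an optional sanity check (not part of the original steps), you can run a cell like the following to confirm that boto3 picks up the credentials and default region you just configured. Both values should print as non-empty/True:

%pip install --upgrade --quiet boto3

import boto3

# Confirm the credentials and region configured via `aws configure` are visible to boto3
session = boto3.Session()
print(session.region_name)
print(session.get_credentials() is not None)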

17. Launch JupyterLab and open a new notebook

18. For this example we are going to use the Amazon Titan Embeddings G1 - Text model to create the embeddings. Add a new cell in the Jupyter notebook and run the following code to initialize the Titan embedding model from Bedrock

%pip install --upgrade --quiet langchain-community

from langchain_community.embeddings import BedrockEmbeddings

embedding = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
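As an optional check (not in the original post), you can embed a sample string and confirm that the vector length matches the dimension of 1536 used in the index mapping:

# Optional: verify the model returns 1536-dimensional vectors, matching the index mapping
sample_vector = embedding.embed_query("Hello from Bedrock")
print(len(sample_vector))  # expected: 1536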

19. The next step is to initialize the AWS credentials that will be used to sign requests to OpenSearch Serverless. Run the following code in a new cell to do that

%pip install --upgrade --quiet boto3 requests_aws4auth

import boto3
from requests_aws4auth import AWS4Auth

# "aoss" is the service name for Amazon OpenSearch Serverless, used for SigV4 request signing
service = "aoss"
credentials = boto3.Session().get_credentials()
region = boto3.Session().region_name
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

20. Next is to initialize the OpenSearch vector store. Please note that the OpenSearch endpoint and the index name come from steps 8 and 11 respectively, and are read from environment variables in the code below
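If you have not exported these environment variables outside the notebook, you can set them in a cell first. The endpoint below is a hypothetical placeholder; use the value you copied in step 8 and the index name you created in step 11:

import os

# Hypothetical placeholder values; replace with your own from steps 8 and 11
os.environ["OPENSEARCH_ENDPOINT"] = "https://<collection-id>.<region>.aoss.amazonaws.com"
os.environ["OPENSEARCH_INDEX"] = "rag-index"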

%pip install --upgrade --quiet langchain-community opensearch-py

import os
from opensearchpy import RequestsHttpConnection
from langchain_community.vectorstores import OpenSearchVectorSearch

opensearch_domain_endpoint = os.environ['OPENSEARCH_ENDPOINT']
opensearch_index = os.environ['OPENSEARCH_INDEX']

vector = OpenSearchVectorSearch(
    embedding_function=embedding,
    index_name=opensearch_index,
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    http_compress=True,  # enables gzip compression for request bodies
    connection_class=RequestsHttpConnection,
    opensearch_url=opensearch_domain_endpoint,
)
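As a quick optional check (not in the original post), you can confirm that the client can reach the collection and that the index from step 11 exists. This assumes your IAM identity has a data access policy that allows it to read the collection:

# Optional: confirm the vector index is reachable before ingesting data
print(vector.client.indices.exists(index=opensearch_index))  # expected: True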

21. The last step in the data ingestion process is to scrape the website and load it into the OpenSearch vector store

%pip install --upgrade --quiet langchain langchain-community bs4

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader

# Load the web page; WebBaseLoader parses it with BeautifulSoup (bs4) under the hood
loader = WebBaseLoader("https://medium.com/@tahir.saeed_46137/building-a-rag-solution-using-amazon-bedrock-and-amazon-opensearch-serverless-7e6f7f4f98dd")
docs = loader.load()

# Split the page into overlapping chunks before embedding
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
splits = text_splitter.split_documents(docs)

# Embed each chunk and store it in the "rag_vector" field of the index
vector.add_documents(
    documents=splits,
    vector_field="rag_vector",
)
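As a quick check, you can print how many chunks were produced and ingested:

print(f"Ingested {len(splits)} chunks into the '{opensearch_index}' index")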

Query

22. Now that the data is ingested and the vector database is ready, let’s try a similarity search on the vector store

question = "What Amazon services are used in this tutorial"

# Retrieve the chunks most similar to the question from the vector index
results = vector.similarity_search(
    question,
    vector_field="rag_vector",
    text_field="text",
    metadata_field="metadata",
)

# Concatenate the retrieved chunks into a single context string for the LLM
rr = [{"page_content": r.page_content, "metadata": r.metadata} for r in results]
data = ""
for doc in rr:
    data += doc['page_content'] + "\n\n"
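Optionally (not in the original post), you can inspect the top match before passing the context to the model; the 'source' metadata key is populated by WebBaseLoader with the page URL:

# Optional: inspect the best-matching chunk and where it came from
if results:
    print(results[0].metadata.get("source"))
    print(results[0].page_content[:200])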

23. We will pass the results from the vector store, along with the original question, to the Amazon Titan Text G1 - Express model using the Bedrock service

from langchain_community.llms import Bedrock
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Titan Text G1 - Express served through Amazon Bedrock
llm = Bedrock(model_id="amazon.titan-text-express-v1")

prompt = PromptTemplate(
    input_variables=["question", "data"],
    template="""Answer the following question based on the data provided
question: {question}
data: {data}
""",
)

chain = LLMChain(llm=llm, prompt=prompt)
llm_return_data = chain.run({'question': question, 'data': data})
print(llm_return_data)
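Note that newer LangChain releases deprecate LLMChain and chain.run in favor of the invoke style; if you are on a more recent version, a roughly equivalent call would be:

# On newer LangChain versions; invoke returns a dict with the output under 'text'
response = chain.invoke({'question': question, 'data': data})
print(response['text'])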

24. This should print the list of AWS services mentioned in this tutorial

Congratulations on finishing this tutorial. You just built your own search solution using Retrieval-Augmented Generation (RAG)! If you have any follow-up questions, please drop them in the comments section.

The opinions expressed in this article are the author’s own and do not reflect the views of AWS.


Tahir Saeed is a seasoned entrepreneur and cloud evangelist currently working as a Solutions Architect at Amazon Web Services (AWS).