Quick Start on RAG (Retrieval-Augmented Generation) for Q&A using AWS Bedrock, ChromaDB, and LangChain

thallyscostalat
4 min read · Jan 28, 2024

RAG, or Retrieval-Augmented Generation, is an approach to question answering that combines information retrieval with natural language generation. It stands out for its use of embeddings and a vector database to provide more contextual and informative responses.

Benefits of the RAG Technique:

  1. Advanced Contextualization: embeddings let the system capture the semantic context of words, improving the quality of generated responses.
  2. Efficient Retrieval: a vector database speeds up the lookup of relevant information, contributing to more accurate and contextualized answers.
  3. Integration of Language Models: RAG feeds retrieved information into a language model, delivering more informative and relevant responses.

In this post, we will walk step by step through connecting to AWS Bedrock, building a vector database with ChromaDB, and implementing a Q&A retrieval chain using the LangChain library.

1. Configuring Environment Variables

To ensure seamless integration with AWS services, it’s crucial to set up the necessary environment variables.

First, we read the environment variables that identify the AWS user allowed to access the required services; in our case, Bedrock via boto3.

import os

AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
AWS_DEFAULT_REGION = os.getenv('AWS_DEFAULT_REGION')

To configure these variables, you can use the export command:

export AWS_DEFAULT_REGION=us-east-1
export AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_ACCESS_KEY>
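
If you want to confirm that these credentials are actually being picked up, a quick sanity check (my addition, not required for the tutorial) is to ask AWS STS which identity you are authenticated as:

import boto3

# Optional: verify the exported credentials resolve to the expected IAM identity.
sts = boto3.client('sts')
print(sts.get_caller_identity()['Arn'])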

2. Importing Tools

In this section, we’re importing essential tools and modules to make the entire process smooth. These include text processing, document loading, AWS Bedrock interaction, and more. This ensures that the subsequent steps are well-supported with the required functionalities.

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.prompts import PromptTemplate
from langchain.embeddings.bedrock import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
import boto3

3. AWS Bedrock Client Connection with boto3

In most cases, we'll use the boto3 library to connect to AWS services and call them from Python. This library simplifies the process of interacting with AWS. The script below establishes a connection to the 'bedrock-runtime' service.

bedrock = boto3.client(service_name='bedrock-runtime',
                       region_name=AWS_DEFAULT_REGION,
                       aws_access_key_id=AWS_ACCESS_KEY_ID,
                       aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
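
As an optional check (my addition), the separate 'bedrock' control-plane client can list the foundation models available to your account and region, which is a quick way to confirm the connection is configured correctly:

# Optional: the control-plane client ('bedrock', not 'bedrock-runtime')
# exposes management APIs such as listing available foundation models.
bedrock_admin = boto3.client(service_name='bedrock',
                             region_name=AWS_DEFAULT_REGION,
                             aws_access_key_id=AWS_ACCESS_KEY_ID,
                             aws_secret_access_key=AWS_SECRET_ACCESS_KEY)

for model in bedrock_admin.list_foundation_models()['modelSummaries']:
    print(model['modelId'])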

4. Loading PDF Files

To illustrate the script's functionality, suppose we have one or more PDF files to analyze (a single file in this example). The script loads them with PyPDFLoader, preparing them for further processing.

In this tutorial, we use the Amazon Bedrock User Guide as the example document to ask questions about: https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf.

loaders = [PyPDFLoader("bedrock-ug.pdf"),]
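
As a quick sanity check (my addition, assuming bedrock-ug.pdf sits in the working directory), PyPDFLoader's load() returns one Document per page, each carrying its source and page number as metadata:

# Each loader yields one Document per PDF page.
pages = loaders[0].load()
print(len(pages))         # number of pages extracted
print(pages[0].metadata)  # e.g. {'source': 'bedrock-ug.pdf', 'page': 0}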

5. VectorDB Creation

This step creates a vector database (VectorDB) from the loaded documents using the ChromaDB library. The documents are split into chunks, embeddings are generated for each chunk, and the resulting vector store is persisted to a specified directory.

docs = []

for loader in loaders:
    docs.extend(loader.load())

r_splitter = RecursiveCharacterTextSplitter(chunk_size=10000,
                                            chunk_overlap=100,
                                            separators=["\n\n", "\n"])

docs_splitted = r_splitter.split_documents(docs)

vector_store = Chroma.from_documents(documents=docs_splitted,
                                     embedding=BedrockEmbeddings(),
                                     persist_directory='vector_store/chroma/')
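
Because the vector store is persisted, it can be reopened in a later session without re-embedding the documents, and queried directly before wiring up the full chain. A minimal sketch (my addition, reusing the persist_directory from above):

# Reopen the persisted store; no re-embedding needed.
vector_store = Chroma(persist_directory='vector_store/chroma/',
                      embedding_function=BedrockEmbeddings())

# Quick retrieval test: fetch the three most similar chunks.
for doc in vector_store.similarity_search('How do I invoke a model?', k=3):
    print(doc.metadata, doc.page_content[:100])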

6. LangChain Q&A Retrieval Chain

Now, imagine you have questions related to the content of these documents. The script below sets up a question-answering (QA) retrieval chain using LangChain. The chain uses the Bedrock language model (Claude 2) to answer your queries, and retrieval is backed by the vector store we just built, making the entire QA process efficient.

QUERY_PROMPT_TEMPLATE = """\
Human: Answer the question based on the provided context. Do not create false information.

{context}

Question: {question}

Assistant:"""

qa_chain = RetrievalQA.from_chain_type(
    llm=Bedrock(model_id='anthropic.claude-v2', client=bedrock),
    retriever=vector_store.as_retriever(search_kwargs={'k': 5}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PromptTemplate.from_template(QUERY_PROMPT_TEMPLATE)}
)

7. Getting the Results

Now we can ask a question about the document and get an answer grounded in the chunks retrieved by the vector store retriever.

Question:

question = 'How to use bedrock via boto3?'

Query:

response = qa_chain({"query": question})

response['result']

Answer:

Here is an example of how to use boto3 to call Amazon Bedrock models:

```python
import boto3
import json

bedrock = boto3.client('bedrock')

body = {
    "prompt": "Hello, how are you today?",
    "modelId": "meta.llama2-13b-chat-v1",
    "parameters": {
        "temperature": 0.5
    }
}

response = bedrock.invoke_model(
    accept='application/json',
    contentType='application/json',
    body=json.dumps(body)
)

print(json.loads(response['body'].read()))
```

The key steps are:

- Create a Bedrock client using `boto3.client('bedrock')`
- Construct the request body with the prompt, model ID, and any inference parameters
- Call `invoke_model()`, passing the body and setting content type and accept headers
- Parse the response body

The response will contain the model output. You can customize the request by using different models, prompts, and parameters.

The Bedrock client also provides other methods like `list_models`.
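
Because the chain was created with return_source_documents=True, the response also carries the chunks the answer was grounded in. A short sketch (my addition) for inspecting them:

# Inspect the retrieved chunks that grounded the answer.
for doc in response['source_documents']:
    print(doc.metadata.get('page'), doc.page_content[:120])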

This post walked through the RAG technique for question answering, combining information retrieval and natural language generation. The benefits of RAG include advanced contextualization, efficient retrieval using a vector database, and the integration of language models for more informative responses.

I hope the insights shared here prove valuable in enhancing your work, regardless of your specific area of focus.

👏🏽 I hope you enjoy the content! | Follow me on LinkedIn.
