Building RAG Pattern Using Bedrock, LangChain and MongoDB Atlas

Sojeong Baek
3 min read · May 20, 2024

Several databases can serve as vector stores for Bedrock, and MongoDB Atlas is one of them. The MongoDB Atlas Vector Search integration with Amazon Bedrock was officially announced recently.

As you may already know, MongoDB is a document database, and its Vector Search feature lets you store vectors alongside your documents and search them by similarity.

Following my previous article, which introduced Pinecone as a vector store, this post shows how to build a RAG pattern using Bedrock, LangChain and MongoDB Atlas.

1. Creating MongoDB Atlas Vector Search

As the first step, I created a MongoDB Atlas cluster on AWS and an Atlas Vector Search index.

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "vec_content",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}

In the MongoDB Atlas console, go to the Database page and open the Atlas Search tab. There I created an Atlas Vector Search index with the definition above. “vec_content” is the field in the collection that holds the vector data, 1536 is the number of dimensions produced by the Amazon Titan embedding model used later in this post, and cosine is the similarity metric.
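To make the "similarity": "cosine" setting concrete, here is a minimal sketch of cosine similarity in plain Python. The function name and the toy vectors are my own illustration, not part of Atlas, and Atlas may scale the score it reports differently from the raw cosine value.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; the index above expects 1536 dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors pointing in the same direction score close to 1 regardless of their length, which is why cosine is a common choice for text embeddings.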

2. Preparing data

# pymongo_get_database.py
# Requires: pip install "pymongo[srv]"

import os
from pymongo import MongoClient

def get_database():
    # The Atlas password is read from the mdb_password environment variable
    CONNECTION_STRING = f"mongodb+srv://sojeong:{os.environ['mdb_password']}@cluster0.w8cee3u.mongodb.net/"

    client = MongoClient(CONNECTION_STRING)

    # Return a handle to the 'articles' database
    return client['articles']

if __name__ == "__main__":
    dbname = get_database()

To insert vector data into MongoDB, I connected to the cluster with pymongo. The code above is the module that creates the connection and returns the articles database.
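One detail worth guarding against: if the password contains characters such as "@", ":" or "/", it has to be percent-encoded before it goes into the connection string. The sketch below uses the standard library's quote_plus for that; the helper name, credentials and host are hypothetical and only for illustration.

```python
from urllib.parse import quote_plus

def build_connection_string(user, password, host):
    """Return a mongodb+srv URI with the credentials percent-encoded."""
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"

# Hypothetical credentials and host, for illustration only.
uri = build_connection_string("app_user", "p@ss:word/1", "cluster0.example.mongodb.net")
```

The resulting string can be passed to MongoClient in place of the f-string used in get_database.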

!pip install --upgrade --quiet  boto3
!pip install langchain-community langchain-core
!pip install --upgrade langchain-aws
!pip install sentence_transformers

import os
import pandas as pd
from pymongo_get_database import get_database
from sentence_transformers import SentenceTransformer
from langchain_community.vectorstores import MongoDBAtlasVectorSearch
from langchain_community.llms import Bedrock
from langchain_community.embeddings import BedrockEmbeddings
from langchain.chains import RetrievalQA

# Placeholder values: replace them with your own, and avoid committing real credentials
os.environ['mdb_password'] = 'password'
os.environ['AWS_ACCESS_KEY_ID'] = 'AWS_ACCESS_KEY_ID'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'AWS_SECRET_ACCESS_KEY'

I then installed and imported the required libraries and set the credentials as environment variables.

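# Get a handle to the "usa" collection in the "articles" database
dbname = get_database()
collection_name = dbname["usa"]

# Load the articles and keep the first 50 rows
df = pd.read_csv('article.csv')
df = df.iloc[0:50]

# The Titan text embedding model returns 1536-dimensional vectors, matching numDimensions in the index
embeddings_model = BedrockEmbeddings(region_name="us-east-1", model_id="amazon.titan-embed-text-v1")

# Embed each article's content (one Bedrock call per row)
df['content'] = df['content'].astype(str)
df['vec_content'] = df['content'].map(lambda x: embeddings_model.embed_query(x))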

First, I connected to MongoDB using the get_database function from the pymongo_get_database module that I created. Then I loaded the first 50 rows of article.csv into a pandas data frame and converted the content column into vectors using the Amazon Titan embedding model.
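Because the index was defined with numDimensions set to 1536, every stored vector needs exactly that length. A small check like the following can catch a mismatch before anything is written; the function name is my own, and this is only a sketch of one possible guard.

```python
EXPECTED_DIMENSIONS = 1536  # numDimensions from the index definition

def check_embedding(vector, expected=EXPECTED_DIMENSIONS):
    """Raise ValueError if the vector does not fit the index definition."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    if not all(isinstance(v, (int, float)) for v in vector):
        raise ValueError("embedding must contain only numbers")
    return vector

# A dummy vector of the right length passes the check unchanged.
ok = check_embedding([0.0] * 1536)
```

With the data frame from this post, it could be applied as df['vec_content'].map(check_embedding) before inserting.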

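Here is what the data looked like: one row per article, with an ID in the first column, the article text in content, and its embedding in vec_content.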

for i in range(len(df)):
    row = df.iloc[i]
    # Columns by position: 0 = ID, 1 = content, 2 = vec_content
    temp_dic = {
        '_id': str(row.iloc[0]),
        'content': row.iloc[1],
        'vec_content': row.iloc[2],
    }
    collection_name.insert_one(temp_dic)

Lastly, I inserted the data into MongoDB.
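As an alternative to one insert_one call per row, the documents can be built first and sent in a single insert_many call. The sketch below shows only the document-building step so that it runs without a database; the helper name and the toy rows are hypothetical, and it assumes the columns come in the order ID, content, vector.

```python
def rows_to_documents(rows):
    """Convert (id, content, vector) tuples into MongoDB documents."""
    return [
        {"_id": str(row_id), "content": content, "vec_content": list(vector)}
        for row_id, content, vector in rows
    ]

# Toy rows with 2-dimensional vectors, for illustration only.
docs = rows_to_documents([
    (1, "first article", [0.1, 0.2]),
    (2, "second article", [0.3, 0.4]),
])
# With a live connection: collection_name.insert_many(docs)
```

With pandas, df.itertuples(index=False, name=None) yields plain tuples that could be passed to this function.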

3. Invoking the QA chain with the vector store

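llm = Bedrock(
    model_id="amazon.titan-text-express-v1",
    region_name="us-east-1"
)

# Defaults to amazon.titan-embed-text-v1, the same model used to embed the documents
embeddings = BedrockEmbeddings(region_name="us-east-1")

vector_search = MongoDBAtlasVectorSearch.from_connection_string(
    f"mongodb+srv://sojeong:{os.environ['mdb_password']}@cluster0.w8cee3u.mongodb.net/",
    namespace="articles.usa",
    embedding=embeddings,
    # Match the field names used when inserting the documents;
    # the defaults are "text" and "embedding"
    text_key="content",
    embedding_key="vec_content"
    # index_name defaults to "default"; pass index_name=... if the Atlas index has a different name
)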

I initialized the LLM through Bedrock and pointed a MongoDB Atlas vector search at the articles.usa collection, using the same Titan embedding model so that queries are embedded in the same vector space as the stored documents.
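Under the hood, a similarity search against Atlas is an aggregation with a $vectorSearch stage. The following is a rough sketch of such a pipeline, not the exact one LangChain builds; the function name, the index name "my_vector_index" and the parameter defaults are assumptions for illustration.

```python
def build_vector_search_pipeline(query_vector, index_name, path="vec_content",
                                 limit=4, num_candidates=100):
    """Build an aggregation pipeline with a $vectorSearch stage."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,           # name of the Atlas Vector Search index
                "path": path,                  # field that stores the vectors
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,                # number of documents to return
            }
        },
        {
            # Keep the text and expose the similarity score
            "$project": {
                "_id": 1,
                "content": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

# Toy query vector; a real one would come from embeddings.embed_query(...)
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], index_name="my_vector_index")
# With a live connection: results = list(collection_name.aggregate(pipeline))
```

Running a pipeline like this directly with pymongo is a handy way to confirm that the index name and vector field are correct before wiring up the retriever.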

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    # retrieve data from the vector store
    retriever=vector_search.as_retriever()
)

qa.invoke("What is the reason that SpaceX project failed in September?")

Last but not least, I created a RetrievalQA chain on top of the retriever and invoked it with a question.

The answer returned by the QA chain showed that the LLM had referenced the data from the vector store.
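For context, the "stuff" chain type places all retrieved documents into a single prompt together with the question. The sketch below illustrates that idea only; the prompt wording and function name are my own and are not LangChain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Join all retrieved documents into one context block above the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholder documents standing in for the retriever's results.
prompt = build_stuff_prompt(
    "What happened at the launch?",
    ["Doc one text.", "Doc two text."],
)
```

Since every retrieved document ends up in one prompt, this approach works best when the documents are short enough to fit in the model's context window.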

Sojeong Baek

A junior solutions architect who loves tech and business.