Unlock the Power of Conversational AI: RAG 101 with Gemini & LangChain

Nathaly Alarcón
Google Cloud - Community
5 min read · Jun 20, 2024

RAG (Retrieval Augmented Generation) is a technique for expanding the knowledge base of LLMs with your own data, allowing you to limit responses mainly to the specified sources. In this tutorial, we will build a Question and Answer (Q&A) application over our own documents.

Applying RAG involves 3 main phases:

  1. Preparation of additional data: Obtain and prepare the data that will be used to expand the LLM’s knowledge.
  2. Indexing and Retrieval: Create an index of the data and develop a system to retrieve relevant information from that index.
  3. Model Inference Process: Implement the LLM model that will use the retrieved information to generate accurate and coherent answers to users’ questions.

We will use the following tools:

Tools for this tutorial: Google Colab, LangChain, Gemini, Chroma DB, Hugging Face sentence-transformers, and LangSmith.

Let’s get started :)

Step 0. Obtain the following API Keys (both are free).

  • LangChain API Key (from LangSmith)
  • Google AI Studio API Key (for Gemini)

Step 1. Libraries and Utilities installation

Install the necessary libraries and import them into the notebook/script you will be using.

!pip install --upgrade -q langchain
!pip install google-generativeai langchain-google-genai
!pip install chromadb pypdf python-dotenv
!pip install -U langchain-community
!pip install sentence-transformers
!pip install langchainhub
# Generic Libraries
from google.colab import userdata
import os
from IPython.display import Markdown
# Data Preparation libraries
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import SentenceTransformerEmbeddings
# Retrieval libraries
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_google_genai import ChatGoogleGenerativeAI

We will also configure the API Keys. In this case, we will retrieve the credentials from the Google Colab Secrets section. Alternatively, you can paste the API Keys directly into the code, as shown in the sketch after the next block.

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = userdata.get('SMITH_APIKEY')
GOOGLE_API_KEY = userdata.get('GoogleAIStudio')
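
If you prefer not to use Colab Secrets, a minimal alternative is to paste the keys directly into the notebook (the placeholder strings below are not real keys; avoid this approach in shared notebooks):

# Alternative: paste the API Keys directly (placeholders, not real keys)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
GOOGLE_API_KEY = "your-google-ai-studio-api-key"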

Step 2. Initial Folder Setup

We will create two folders:

  • MyData: This will store the additional files that we will use to expand the model’s knowledge base.
  • VectorDB: This folder will store the Vector database.

!mkdir /content/MyData
!mkdir /content/VectorDB

Load the PDFs you want to use to customize the generated responses into the MyData folder.

Colab Files section. Upload your PDF(s) into the “MyData” folder.

Step 3. Data Preparation

We will read the PDF files from the MyData folder and convert them into embeddings.

source_data_folder = "/content/MyData"
# Read PDFs from the configured folder
loader = PyPDFDirectoryLoader(source_data_folder)
data_on_pdf = loader.load()
# Size of the data / documents loaded
len(data_on_pdf)
# Split the data into chunks of limited size,
# with 200 characters of overlap to preserve the context
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(data_on_pdf)
# Number of Chunks generated
len(splits)
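
If you want a quick sanity check before creating the embeddings, you can peek at the first chunk and its source metadata (the index 0 below is just an example):

# Inspect one chunk: its text and the PDF/page it came from
print(splits[0].page_content[:300])
print(splits[0].metadata)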

Now we will generate the embeddings and store them in Chroma DB.

# For the creation of the embeddings we will use Hugging Face
# https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
# You can use any other model
embeddings_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
# Database folder path
path_db = "/content/VectorDB" # @param {type:"string"}
# Store the chunks in the DataBase
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings_model, persist_directory=path_db)
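
Before wiring up the full chain, a quick similarity search against the database is a simple way to confirm that the chunks were stored correctly (the query string and k value below are just examples):

# Sanity check: fetch the 3 chunks most similar to a sample query
docs = vectorstore.similarity_search("ingredients", k=3)
for doc in docs:
    print(doc.page_content[:200])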

Step 4: Retrieval Setup

The retrieval component is responsible for finding relevant information from the knowledge base (Chroma DB) based on the user’s query.

retriever = vectorstore.as_retriever()
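
By default the retriever returns a small set of chunks; if you want more control, you can pass search parameters when creating it (the value of k below is only an example):

# Optional: limit the retriever to the 4 most relevant chunks
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})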

We will use Gemini as the LLM.

llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", google_api_key=GOOGLE_API_KEY)
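
A quick standalone call is a simple way to confirm that the API Key and model are working before building the chain (the test message is arbitrary):

# Optional: quick check that Gemini responds
print(llm.invoke("Reply with the word OK").content)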

We are also going to use one of LangChain's basic RAG prompt templates.

RAG Prompt template from LangChain
# https://smith.langchain.com/hub/rlm/rag-prompt
prompt = hub.pull("rlm/rag-prompt")
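
This template expects two input variables, context and question, which is why the chain below feeds it a dictionary with exactly those keys; you can confirm this by inspecting the pulled prompt:

# The rag-prompt template takes "context" and "question" as inputs
print(prompt.input_variables)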

Finally, we define the rag_chain variable, which takes the question, retrieves the relevant information from Chroma DB (using the retriever), and asks Gemini to generate the answer in natural language.

def format_docs(docs):
    # Format the retrieved documents for the prompt
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Step 5. RAG Execution — Q&A

Since the document I uploaded to the MyData folder is a Bolivian food recipe, I will ask a question about it.

# @title Questions to the document
question = "How to prepare a Silpancho? " # @param {type:"string"}
response = rag_chain.invoke(question)
Markdown(response)

As you can see, the syntax for invoking the model is very simple: rag_chain.invoke(question)

Q & A in Colab with LangChain and Gemini
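
If you prefer to see the answer appear progressively instead of waiting for the full response, the same chain can also be streamed (a minimal sketch using LangChain's standard streaming interface):

# Stream the answer piece by piece instead of waiting for the full text
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)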

You can also refine the responses by specifying formats, for example:

Requesting recipe ingredients in table format
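
Since my document is a recipe, a question like the following (the wording is only an illustration) asks Gemini to return the ingredients as a table:

# Ask for a specific output format
question = "List the ingredients for Silpancho in a table format"
Markdown(rag_chain.invoke(question))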

And that’s it! You can now chat with your own documents in a very simple way.

From LangSmith, you can also see the complete execution trace every time you ask a question using LangChain:

You can find the complete code at:
