Simplest Introduction to RAG and LangChain: Building a Cricket Chatbot Part 1

Yanampally Abhiram Reddy
17 min read · Jul 14, 2024


Retrieval-Augmented Generation (RAG) pipeline

This article will explore the basics of Retrieval-Augmented Generation (RAG) and LangChain, two powerful tools for creating intelligent chatbots. Our focus will be on building a Cricket Chatbot capable of answering questions and providing information about cricket matches, players, and statistics from the data available on Wikipedia.

We’ll start with a brief overview of RAG, explaining how it combines retrieval-based methods with generative models to provide accurate and contextually relevant responses. Then, we’ll introduce LangChain, a framework designed to simplify the development of such advanced language models.

Through a step-by-step guide, we’ll demonstrate how to integrate these technologies to create a functional and responsive Cricket Chatbot. By the end of this article, you’ll clearly understand RAG and LangChain and how to leverage them to build your own chatbot applications. Whether you’re a beginner or have some experience with AI, this guide will provide you with the foundational knowledge and practical skills needed to get started.

RAG systems work by retrieving relevant information from a knowledge base tailored to the task at hand and using that information to generate more accurate and contextually appropriate responses. This approach addresses some limitations of traditional generative models, which can sometimes produce plausible-sounding but factually incorrect responses.

Eager to delve deeper, I embarked on a journey to understand how RAG works and how it can be applied in real-world scenarios. My interest led me to LangChain, a powerful framework designed to simplify the development of advanced language models. LangChain provides tools, components, and a huge integration ecosystem that make building and deploying RAG-based systems easier, opening up new possibilities for creating intelligent and responsive applications.

Here is a brief flow of what we would like to implement. First, we load documents from Wikipedia using WikipediaLoader to fetch cricket-related content. These documents are then processed into smaller chunks with appropriate headers, ensuring that each chunk is manageable and contains relevant metadata. For each chunk, a Document object is created, encapsulating the content and its metadata. Next, we initialize the CohereEmbeddings model to convert these chunks into vector embeddings, which are stored in the Chroma vector store, allowing efficient storage and retrieval based on similarity searches.

Next, we prompt the user to input a question. This user input, along with the current memory, is passed through a Runnable chain. The chain first initializes passthrough for input and memory, then generates an embedding query from the user input. Using this query, it retrieves relevant context and constructs a prompt that includes the system message, user input, retrieved context, and previous memory. This prompt is then used to invoke the language model, generating a response. The memory is updated with the user’s question and the generated response, ensuring conversational context is maintained. Finally, the generated response is printed to the user, completing the interaction.


Load Wikipedia Documents
|
v
Process Documents into Chunks
|
v
Create Document Objects
|
v
Initialize Embeddings Model
|
v
Store in Chroma Vectorstore
|
v
Get User Input
|
v
Invoke Runnable Chain
| |
| v
| Initialize input and memory passthrough
| |
| v
| Generate embedding query
| |
| v
| Retrieve context
| |
| v
| Construct prompt
| |
| v
| Invoke LLM for response
| |
| v
| Update memory
|
v
Print Response

Let’s build 🔥🔥….

Prerequisites

Follow along with the code in a Jupyter Notebook or Google Colab Notebook environment.

Make sure you have the required packages installed. Run the following command to install LangChain and NVIDIA AI endpoints:

%pip install -q langchain langchain-nvidia-ai-endpoints

Obtaining an NVIDIA NIM API key

We will use NVIDIA NIM Inference APIs to access deployed machine learning models. By signing up with a personal email ID, you receive 1,000 credits, which should be sufficient for development purposes.

NVIDIA NIM homepage

Setting Up the Environment

First, import the necessary libraries and set up the console for pretty printing using rich.

from functools import partial
from rich.console import Console
from rich.style import Style
from rich.theme import Theme

console = Console()
base_style = Style(color="#76B900", bold=True)
pprint = partial(console.print, style=base_style)

Configuring the NVIDIA API Key

To use the ChatNVIDIA model, you need to configure your NVIDIA API key. Ensure you have set up an environment variable for your API key.

from langchain_nvidia_ai_endpoints import ChatNVIDIA
import getpass
import os

# Read the API key from the environment; prompt for it if it has not been set
if not os.getenv("NVIDIA_API_KEY"):
    os.environ["NVIDIA_API_KEY"] = getpass.getpass("Enter your NVIDIA API key: ")

Utility Methods for Printing Intermediate States

Create utility methods for printing intermediate states, which will help you debug and understand the processing steps.

from langchain_core.runnables import RunnableLambda
from functools import partial

def RPrint(preface="State: "):
    def print_and_return(x, preface=""):
        print(f"{preface}{x}")
        return x
    return RunnableLambda(partial(print_and_return, preface=preface))

def PPrint(preface="State: "):
    def print_and_return(x, preface=""):
        pprint(preface, x)
        return x
    return RunnableLambda(partial(print_and_return, preface=preface))
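
As a quick sanity check, you can drop one of these into any chain to inspect the value flowing through it. Below is a minimal sketch; the lambda is just a stand-in processing step, not part of the chatbot.

# Peek at the value before and after a trivial transformation step
debug_chain = RPrint("Before: ") | RunnableLambda(lambda s: s.upper()) | RPrint("After: ")
debug_chain.invoke("hello rag")
# Prints "Before: hello rag" and "After: HELLO RAG", then returns "HELLO RAG"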

Loading Documents from Wikipedia

Install the Wikipedia package and load documents related to cricket using the WikipediaLoader. Depending on the number of documents you load, this may take a while. Start with a small number of documents, and increase it once the code runs error-free.

%pip install --upgrade --quiet wikipedia
from langchain_community.document_loaders import WikipediaLoader
from datetime import datetime, timedelta
import os
import json

# Load documents from Wikipedia
docs = WikipediaLoader(query="Cricket and everything related to cricket",
                       load_max_docs=1000).load()

Splitting Text into Chunks

Define a function to split the text into manageable chunks, including metadata like title, summary, and source URL. This step lets us pass the LLM only the most contextually relevant parts of the data when answering a user query, instead of the full contents of a Wikipedia page. The chunk_overlap defines the number of characters repeated from the previous chunk at the start of the current chunk; this overlap preserves contextual continuity so the LLM can make sense of each chunk on its own.

# Function to split text into chunks with headers
def create_chunks_with_headers(doc, doc_index):
    chunk_size = 800
    chunk_overlap = 100
    chunks = []
    start = 0
    doc_content = doc.page_content
    doc_length = len(doc.page_content)

    while start < doc_length:
        end = min(start + chunk_size, doc_length)
        chunk = doc_content[start:end]

        if start != 0:
            chunk = doc_content[max(start - chunk_overlap, 0):end]

        chunk_json = {
            "meta_data": {
                "title": doc.metadata["title"],
                "summary": doc.metadata["summary"],
                "source_url": doc.metadata["source"],
            },
            "chunk_index": len(chunks) + 1,
            "content": chunk
        }
        chunks.append(chunk_json)

        start += chunk_size

    return chunks

Processing and Storing Document Chunks

Create an array to store all document chunks as JSON objects. Process each document and split it into chunks.

# Create an array to store all document chunks as JSON objects
all_chunks = []

# Create JSON objects for each document with chunks
for i, doc in enumerate(docs):
    chunks = create_chunks_with_headers(doc, i + 1)
    all_chunks.extend(chunks)
    print(f"Data for document {i + 1} has been processed.")

# If you want to write the array to a file:
# with open('wikipedia_docs_chunks.json', 'w', encoding='utf-8') as file:
#     json.dump(all_chunks, file, ensure_ascii=False, indent=4)

print("All data has been processed.")

Setting Up the Self-Query Retriever

First, import the required libraries and initialize the embedding model using the Cohere API. Cohere offers trial API keys for development purposes with rate-limited inference services, which will serve our purpose well enough in this case. Here is a resource for more information on embeddings and vector databases.

A self-querying retriever can generate and execute its own queries. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query. This structured query is then applied to its underlying VectorStore. This allows the retriever not only to use the user-input query for semantic similarity comparison with the contents of stored documents but also to extract filters from the user query based on the metadata of stored documents and to execute those filters.

Source: LangChain Self-querying Page
%pip install --upgrade --quiet lark langchain-chroma
%pip install --upgrade --quiet langchain-cohere
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_cohere import CohereEmbeddings

# Initialize the Cohere embeddings model with your API key
embeddings_model = CohereEmbeddings(cohere_api_key="your-cohere-api-key")

# Initialize an empty list to store Document objects
documents = []

# Iterate over each chunk in all_chunks and create a Document object for each chunk
for chunk in all_chunks:
    # Create a Document object with the content and metadata of each chunk
    doc = Document(
        page_content=chunk["content"],
        metadata=chunk["meta_data"],
    )
    documents.append(doc)

# Print the metadata of the first document to verify
print(documents[0].metadata)

# Initialize the Chroma vector store with the list of documents and the embeddings model
vectorstore = Chroma.from_documents(documents, embeddings_model)

Chroma is an efficient vector store for managing and retrieving document embeddings. A vector store takes care of storing embedded data and performing vector search for you. It allows for easy handling of large sets of vectorized documents and quick similarity searches. We initialize Chroma with our document objects and embeddings model. This setup ensures our self-querying retriever can efficiently access and retrieve relevant cricket-related information, providing accurate and contextually appropriate responses to user queries.
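
Before wiring the store into a retriever, you can verify it works with a direct similarity search. This is a quick sketch; the query string is only an example.

# Quick check: fetch the top-3 chunks most similar to an example query
results = vectorstore.similarity_search("Who holds the record for most ODI centuries?", k=3)
for doc in results:
    print(doc.metadata["title"], "->", doc.page_content[:100])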

Creating the Self-Querying Retriever

Next, we'll instantiate our retriever. We'll provide metadata information and a description of the document's contents.

In the code, we define metadata fields such as “title,” “summary,” and “source_url” using AttributeInfo. We also describe the document content as "Data about cricket." The SelfQueryRetriever is instantiated using a language model (from ChatNVIDIA), the vector store, the document content description, and the metadata field information. This setup enables the retriever to handle complex queries effectively by leveraging the documents' content and metadata.

from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever

metadata_field_info = [
    AttributeInfo(
        name="title",
        description="The name of the article",
        type="string",
    ),
    AttributeInfo(
        name="summary",
        description="The short summary of the article contents",
        type="string",
    ),
    AttributeInfo(
        name="source_url",
        description="The web URI link to the article webpage",
        type="string",
    ),
]
document_content_description = "Data about cricket"
llm = ChatNVIDIA(model="mistralai/mistral-7b-instruct-v0.2")
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
)
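
Once instantiated, the retriever can be invoked directly with a natural-language question and returns a list of matching Document objects. A small example call (the query is made up):

# The retriever writes its own structured query and runs it against Chroma
results = retriever.invoke("Articles about the 2011 Cricket World Cup final")
for doc in results:
    print(doc.metadata["title"], "-", doc.metadata["source_url"])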

Integrating Chat Memory

An LLM call is stateless: the model only knows what we include in the prompt. To facilitate natural conversations with our chatbot, we therefore need a way to maintain short-term (and possibly long-term) memory and feed it back into each request.

LangChain supports several short-term memory solutions out of the box, such as ConversationBufferMemory and ConversationSummaryMemory. In our case, we will implement a simple memory of our own: a queue that stores only the last five exchanges, so we do not exceed the context window of our LLM.

from collections import deque

# Initialize memory as a deque with a maximum length of 5
memory = deque(maxlen=5)

def update_memory(user_question, response):
    memory.append({
        "question": user_question,
        "response": response,
    })
    pprint(memory)
    # Return the updated memory so the chain can assign it back into the state
    return memory
# Example memory object

deque([
{
'question': 'Who is Sachin Tendulkar and his age by 2024?',
'response': 'Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the
greatest batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born
on April 24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nTendulkar has an incredible
cricketing career spanning over two decades, with numerous records to his name. He is the highest run-scorer in
both Test and ODI cricket, with 100 international centuries. He has also won numerous awards, including the Bharat
Ratna, India\'s highest civilian award.\n\nWould you like to know more about Sachin Tendulkar\'s cricketing career
or his personal life?'
},
{
'question': 'Tell me more about him',
'response': 'You already know who Sachin Tendulkar is and how old he is as of 2024! As we previously
discussed, Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the greatest
batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born on April
24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nIf you\'re interested, we could explore more
about his cricketing career, his personal life, or even compare him to other legendary cricketers like Sunil
Gavaskar or Virat Kohli.'
}
])

Setting Up the Chat Model

Define the chat model and the system message to guide the chatbot's responses. Feel free to try different prompts and models and see what works best for you.

from langchain_core.runnables import RunnableLambda, RunnableAssign, RunnablePassthrough
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

sys_msg = """
You are an intelligent assistant that answers all questions about cricket using contextual information from Wikipedia. Your responses should be conversational and informative, providing clear and concise explanations. When relevant, include the source URL of the articles to give users additional reading material.

Always aim to:
1. Answer the question directly and clearly.
2. Provide useful context and background information, but keep answers to the point and avoid irrelevant details.
3. Suggest related topics or additional points of interest.
4. Be polite and engaging in your responses.
5. Ignore any parts of the provided context that are irrelevant to the question.

Now, let's get started!
"""

# Initialize the chat model
instruct_chat = ChatNVIDIA(model="meta/llama3-70b-instruct")
llm = instruct_chat | StrOutputParser()
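
Before building the full chain, it can help to confirm the model responds at all. A minimal sanity check (the question is arbitrary):

# The StrOutputParser ensures we get back a plain string rather than a message object
print(llm.invoke("In one sentence, what is a century in cricket?"))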

Generating Embeddings

Define functions to generate embeddings and create queries from user inputs. A plain function cannot be executed directly in the chain; hence, each one is wrapped into a Runnable object using RunnableLambda. For more information about Runnables and their execution in chains, here is a great reference.

The generate_embeddings_query function uses the LLM to turn the user's input and recent conversation memory into a well-formed retrieval query; generate_embeddings then uses that query to fetch the relevant documents. This step improves the retrieval of relevant data for the LLM to form the final response to the user's question.

def generate_embeddings(input_data):
    # The chain passes the full state dict here; retrieve with the generated embedding query
    query = input_data["embedding_query"] if isinstance(input_data, dict) else input_data
    context = retriever.invoke(query)
    if context:
        return context
    else:
        return "No data available"

def generate_embeddings_query(input_data):
    # Defining a prompt template
    prompt = ChatPromptTemplate.from_template(
        f"""
        User's Question: {{input}}
        Previous conversation memory: {{memory}}
        Generate only a query sentence and nothing else from the user's question to fetch the data from the embeddings. If the user's question does not have enough context, then create a query based on the knowledge base.
        """
    )
    embedding_chain = prompt | llm
    embeddings_query = embedding_chain.invoke(input_data)
    if embeddings_query:
        return embeddings_query
    else:
        return "Process failed"

generate_embeddings_runnable = RunnableLambda(generate_embeddings)
generate_embeddings_query_runnable = RunnableLambda(generate_embeddings_query)

Creating the Runnable Chain

Combine the components into a runnable chain that integrates memory and generates responses based on user inputs.

def get_response(prompt):
    return llm.invoke(prompt)

# Create the Runnable chain with memory integration
Runnable = (
    {"input": RunnablePassthrough(), "memory": RunnablePassthrough()}
    | RunnableAssign({"embedding_query": generate_embeddings_query_runnable})
    | RunnableAssign({"context": generate_embeddings_runnable})
    | RunnableAssign({"prompt": lambda x: ChatPromptTemplate.from_template(
        f"""
        {sys_msg}

        User's Question: {{input}}

        Context Information: {{context}}

        Previous Conversation memory: {{memory}}

        Your Response:
        """
    ).format(
        input=x["input"]["input"],
        context=x["context"],
        memory=x["input"]["memory"],
    )})
    | RunnableAssign({"response": lambda x: get_response(x["prompt"])})
    | RunnableAssign({"memory": lambda x: update_memory(x["input"]["input"], x["response"])})
)

# Get user input and invoke the chain
user_input = "Who is Sachin Tendulkar and his age by 2024?"
response = Runnable.invoke({"input": user_input, "memory": memory})

pprint(response["response"])

This block creates a chain of operations (a Runnable chain) that processes user input and integrates memory to maintain conversational context.

Breakdown of the Chain:

Initial Input and Memory Passthrough:

{"input": RunnablePassthrough(), "memory": RunnablePassthrough()}
  • RunnablePassthrough allows the input and memory to pass through without any changes. It sets up the initial state for further processing.

Hence, the present state of the object at this stage would be like —

{
"input": "Tell me more about his batting records",
"memory":deque([
{
'question': 'Who is Sachin Tendulkar and his age by 2024?',
'response': 'Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the
greatest batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born
on April 24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nTendulkar has an incredible
cricketing career spanning over two decades, with numerous records to his name. He is the highest run-scorer in
both Test and ODI cricket, with 100 international centuries. He has also won numerous awards, including the Bharat
Ratna, India\'s highest civilian award.\n\nWould you like to know more about Sachin Tendulkar\'s cricketing career
or his personal life?'
},
{
'question': 'Tell me more about him',
'response': 'You already know who Sachin Tendulkar is and how old he is as of 2024! As we previously
discussed, Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the greatest
batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born on April
24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nIf you\'re interested, we could explore more
about his cricketing career, his personal life, or even compare him to other legendary cricketers like Sunil
Gavaskar or Virat Kohli.'
}
])
}

Generate Embedding Query:

| RunnableAssign({"embedding_query": generate_embeddings_query_runnable})
  • This step generates an embedding query based on the user’s input. It uses the generate_embeddings_query_runnable function to create a structured query for retrieving relevant embeddings.
{
"input": "Tell me more about his batting records",
"memory":deque([
{
'question': 'Who is Sachin Tendulkar and his age by 2024?',
'response': 'Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the
greatest batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born
on April 24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nTendulkar has an incredible
cricketing career spanning over two decades, with numerous records to his name. He is the highest run-scorer in
both Test and ODI cricket, with 100 international centuries. He has also won numerous awards, including the Bharat
Ratna, India\'s highest civilian award.\n\nWould you like to know more about Sachin Tendulkar\'s cricketing career
or his personal life?'
},
{
'question': 'Tell me more about him',
'response': 'You already know who Sachin Tendulkar is and how old he is as of 2024! As we previously
discussed, Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the greatest
batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born on April
24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nIf you\'re interested, we could explore more
about his cricketing career, his personal life, or even compare him to other legendary cricketers like Sunil
Gavaskar or Virat Kohli.'
}
]),
"embedding_query":"........",
}

Retrieve Context:

| RunnableAssign({"context": generate_embeddings_runnable})
  • This step retrieves the context using the generated embedding query. The generate_embeddings_runnable function fetches the relevant information based on the query.
{
"input": "Tell me more about his batting records",
"memory":deque([
{
'question': 'Who is Sachin Tendulkar and his age by 2024?',
'response': 'Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the
greatest batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born
on April 24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nTendulkar has an incredible
cricketing career spanning over two decades, with numerous records to his name. He is the highest run-scorer in
both Test and ODI cricket, with 100 international centuries. He has also won numerous awards, including the Bharat
Ratna, India\'s highest civilian award.\n\nWould you like to know more about Sachin Tendulkar\'s cricketing career
or his personal life?'
},
{
'question': 'Tell me more about him',
'response': 'You already know who Sachin Tendulkar is and how old he is as of 2024! As we previously
discussed, Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the greatest
batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born on April
24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nIf you\'re interested, we could explore more
about his cricketing career, his personal life, or even compare him to other legendary cricketers like Sunil
Gavaskar or Virat Kohli.'
}
]),
"embedding_query":"........",
"context":Document([{...}])
}

Create Prompt Template:

| RunnableAssign({"prompt": lambda x: ChatPromptTemplate.from_template(
f"""
{sys_msg}

User's Question: {{input}}

Context Information: {{context}}

Previous Conversation memory: {{memory}}

Your Response:
"""
)})
  • This step constructs a prompt template using the system message (sys_msg), user input, retrieved-context, and previous conversation memory. It creates a prompt that the language model will use to generate a response.

Get Response from LLM:

| RunnableAssign({"response": lambda x: get_response(x["prompt"])})
  • This step invokes the language model with the constructed prompt to generate a response. The get_response function is used here.
{
"input": "Tell me more about his batting records",
"memory":deque([
{
'question': 'Who is Sachin Tendulkar and his age by 2024?',
'response': 'Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the
greatest batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born
on April 24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nTendulkar has an incredible
cricketing career spanning over two decades, with numerous records to his name. He is the highest run-scorer in
both Test and ODI cricket, with 100 international centuries. He has also won numerous awards, including the Bharat
Ratna, India\'s highest civilian award.\n\nWould you like to know more about Sachin Tendulkar\'s cricketing career
or his personal life?'
},
{
'question': 'Tell me more about him',
'response': 'You already know who Sachin Tendulkar is and how old he is as of 2024! As we previously
discussed, Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the greatest
batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born on April
24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nIf you\'re interested, we could explore more
about his cricketing career, his personal life, or even compare him to other legendary cricketers like Sunil
Gavaskar or Virat Kohli.'
}
]),
"embedding_query":"........",
"context":Document([{...}]),
"prompt":"...",
"response":"Hailed as the world's most prolific batsman of all time, Sachin is the all-time highest run-scorer in both ODI and Test cricket with more than 18,000 runs and 15,000 runs, respectively.He also holds the record for receiving the most player of the match awards in international cricket."
}

Update Memory:

| RunnableAssign({"memory": lambda x: update_memory(x["input"]["input"], x["response"])})
  • This step updates the memory with the user’s question and the generated response. The update_memory function stores the latest interaction to maintain conversational context.
{
"input": "Tell me more about his batting records",
"memory":deque([
{
'question': 'Who is Sachin Tendulkar and his age by 2024?',
'response': 'Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the
greatest batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born
on April 24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nTendulkar has an incredible
cricketing career spanning over two decades, with numerous records to his name. He is the highest run-scorer in
both Test and ODI cricket, with 100 international centuries. He has also won numerous awards, including the Bharat
Ratna, India\'s highest civilian award.\n\nWould you like to know more about Sachin Tendulkar\'s cricketing career
or his personal life?'
},
{
'question': 'Tell me more about him',
'response': 'You already know who Sachin Tendulkar is and how old he is as of 2024! As we previously
discussed, Sachin Tendulkar is a former Indian cricketer and captain, widely regarded as one of the greatest
batsmen of all time. He is often referred to as the "God of Cricket" in India and among cricket fans. Born on April
24, 1973, Sachin Tendulkar is currently 51 years old (as of 2024).\n\nIf you\'re interested, we could explore more
about his cricketing career, his personal life, or even compare him to other legendary cricketers like Sunil
Gavaskar or Virat Kohli.'
},
{
'question': 'Tell me more about his batting records',
'response': 'Hailed as the world's most prolific batsman of all time, Sachin is the all-time highest run-scorer in both ODI and Test cricket with more than 18,000 runs and 15,000 runs, respectively. He also holds the record for receiving the most player
of the match awards in international cricket.'
}
]),
"embedding_query":"........",
"context":Document([{...}]),
"prompt":"...",
"response":"Hailed as the world's most prolific batsman of all time, Sachin is the all-time highest run-scorer in both ODI and Test cricket with more than 18,000 runs and 15,000 runs, respectively.He also holds the record for receiving the most player of the match awards in international cricket."
}

Invoking the Runnable Chain

user_input = "Who is Sachin Tendulkar and his age by 2024?"
response = Runnable.invoke({"input": user_input, "memory": memory})
pprint(response["response"])

Brief Summary of the Pipeline

This code sets up a process to handle user queries with a chatbot, integrating memory to maintain conversational context. It creates a chain of operations that:

  1. Takes user input and current memory.
  2. Generates an embedding query.
  3. Retrieves relevant context.
  4. Constructs a prompt.
  5. Invokes the language model to generate a response.
  6. Updates the memory with the latest interaction.

This setup ensures that each response considers previous interactions, allowing for coherent and contextually appropriate conversations.
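
To turn this into an interactive session, you can wrap the chain in a simple loop. Below is a minimal sketch, assuming Runnable and memory are defined as above; the exit keyword is just a convenience.

# Simple REPL-style loop around the chain; type "exit" to stop
while True:
    user_input = input("Ask a cricket question (or 'exit'): ")
    if user_input.strip().lower() == "exit":
        break
    result = Runnable.invoke({"input": user_input, "memory": memory})
    pprint(result["response"])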

GitHub link for the source code

Conclusion

This article covered the implementation of a cricket chatbot in an .ipynb environment such as a Jupyter Notebook or Google Colab. In the next article, we will convert the code into a .py file using object-oriented programming principles, incorporating classes and methods. We will also set up a FastAPI server to handle requests from a ReactJS frontend UI.

Simplest Introduction to RAG and LangChain: Building a Cricket Chatbot Part 2

Here’s a question: The code in this article is heavily synchronous and blocking. How can you improve it to minimize response times and enhance performance? Think about it :)

Until then —

This is Mr.E, signing off. May your queries always be SELECT * FROM Success!

