Simplest Introduction to RAG and LangChain: Building a Cricket Chatbot Part 2
The first article on the topic outlines the process of creating an intelligent Cricket Chatbot using Retrieval-Augmented Generation (RAG) and LangChain in a Jupyter Notebook environment. It starts by explaining RAG’s approach of combining retrieval and generative models to enhance response accuracy. LangChain is introduced as a framework that simplifies developing such models. The guide details steps including loading and processing cricket-related Wikipedia data, creating document chunks, and utilizing vector embeddings. It further covers setting up a self-querying retriever, integrating chat memory, and configuring the chatbot’s response generation. By following these steps, users can build a functional chatbot capable of answering cricket-related queries using Wikipedia data.
In this tutorial, we will build a cricket chatbot using various tools from the LangChain ecosystem and FastAPI. The chatbot will be capable of answering cricket-related questions by retrieving relevant information from Wikipedia and generating responses using the NVIDIA NIM Inference models.
Project Structure
The project consists of two main Python files:
- cricket_bot_data.py: Handles data processing, embedding generation, and chatbot logic.
- cricket_bot_api.py: Implements the FastAPI server that exposes an API for the chatbot.
cricket_bot_data.py
This script handles a series of tasks: it retrieves cricket-related documents from Wikipedia, splits them into smaller, manageable chunks, generates embeddings for each chunk using Cohere’s embedding model, and stores the results in a Chroma vector store.
Here’s a step-by-step breakdown of the code:
Import Required Libraries
import os
from tqdm import tqdm
from rich.style import Style
from collections import deque
from functools import partial
from dotenv import load_dotenv
from rich.console import Console
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_cohere import CohereEmbeddings
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import WikipediaLoader
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_core.runnables import RunnableLambda, RunnableAssign, RunnablePassthrough
Load Environment Variables
To use the ChatNVIDIA and Cohere models, you need to configure the corresponding API keys. Ensure you have set up a .env file containing NVIDIA_API_KEY and COHERE_API_KEY.
load_dotenv()
os.environ["NVIDIA_API_KEY"] = os.getenv('NVIDIA_API_KEY')
os.environ["COHERE_API_KEY"]=os.getenv('COHERE_API_KEY')
Initialize Console for Pretty Printing
First, import the necessary libraries and set up the console for pretty printing using rich.
console = Console()
base_style = Style(color="#76B900", bold=True)
pprint = partial(console.print, style=base_style)
def PPrint(preface="State: "):
def print_and_return(x, preface=""):
pprint(preface, x)
return x
return RunnableLambda(partial(print_and_return, preface=preface))
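The PPrint helper returns a RunnableLambda, so it can be dropped between stages of any LangChain pipeline to print the intermediate state and pass it through unchanged. A minimal usage sketch (illustrative only, not part of the chatbot logic):
# Print the state flowing through a chain without modifying it
debug_chain = RunnablePassthrough() | PPrint("Current state: ")
debug_chain.invoke({"input": "Who invented cricket?"})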
Wikipedia Document Processing
Define the WikipediaDocumentProcessor Class
class WikipediaDocumentProcessor:
def __init__(self, query, load_max_docs=10, chunk_size=800, chunk_overlap=100):
self.query = query
self.load_max_docs = load_max_docs
self.chunk_size = chunk_size
self.chunk_overlap = chunk_overlap
self.docs = []
self.all_chunks = []
self.embeddings_model = CohereEmbeddings(
cohere_api_key=os.getenv('COHERE_API_KEY'))
self.vectorstore = None
def load_documents(self):
self.docs = list(tqdm(WikipediaLoader(
query=self.query, load_max_docs=self.load_max_docs).load(), desc="Loading docs"))
print(f"{len(self.docs)} documents loaded from Wikipedia.")
def create_chunks_with_headers(self, doc, doc_index):
chunks = []
start = 0
doc_content = doc.page_content
doc_length = len(doc.page_content)
while start < doc_length:
end = min(start + self.chunk_size, doc_length)
chunk = doc_content[start:end]
if start != 0:
chunk = doc_content[max(start - self.chunk_overlap, 0):end]
chunk_json = {
"meta_data": {
"title": doc.metadata["title"],
"summary": doc.metadata['summary'],
"source_url": doc.metadata['source'],
},
"chunk_index": len(chunks) + 1,
"doc_index": doc_index,
"content": chunk
}
chunks.append(chunk_json)
start += self.chunk_size
return chunks
def process_and_create_embeddings(self):
self.all_chunks = []
for i, doc in enumerate(self.docs):
chunks = self.create_chunks_with_headers(doc, i + 1)
self.all_chunks.extend(chunks)
documents = [Document(page_content=chunk["content"],
metadata=chunk["meta_data"]) for chunk in self.all_chunks]
self.vectorstore = Chroma.from_documents(
documents, self.embeddings_model)
print("All data has been processed and stored in Chroma.")
def run(self):
self.load_documents()
self.process_and_create_embeddings()
print("All data has been processed.")
The WikipediaDocumentProcessor class automates document handling in an AI system by fetching relevant Wikipedia articles with the load_documents() method, chunking these documents with metadata using create_chunks_with_headers(), and converting them into embeddings with process_and_create_embeddings(). The embeddings are stored in a Chroma vector store for efficient retrieval. The run() method orchestrates this entire process, ensuring sequential execution and status updates, effectively preparing the data for AI models or other applications.
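Before walking through each method, here is a hedged quick-start sketch showing how the class might be exercised with a small document count (the query string, document count, and test search below are arbitrary choices for illustration):
# Process a handful of articles first to validate the pipeline end to end
processor = WikipediaDocumentProcessor(query="Cricket", load_max_docs=5)
processor.run()

# Sanity check: fetch the two chunks most similar to a test query
results = processor.vectorstore.similarity_search("laws of cricket", k=2)
print([doc.metadata["title"] for doc in results])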
Here is a more elaborate breakdown of the WikipediaDocumentProcessor class:
__init__
def __init__(self, query, load_max_docs=10, chunk_size=800, chunk_overlap=100):
self.query = query
self.load_max_docs = load_max_docs
self.chunk_size = chunk_size
self.chunk_overlap = chunk_overlap
self.docs = []
self.all_chunks = []
self.embeddings_model = CohereEmbeddings(
cohere_api_key=os.getenv('COHERE_API_KEY'))
self.vectorstore = None
Purpose: Initializes an instance of WikipediaDocumentProcessor.
Parameters:
- query (str): The search query for Wikipedia articles.
- load_max_docs (int, default=10): Maximum number of documents to load. Start with a small number of documents; once the code is stable, increase it, since loading many documents can take tens of minutes.
- chunk_size (int, default=800): The size of each document chunk.
- chunk_overlap (int, default=100): The overlap between consecutive chunks.
Functionality:
- Sets instance variables with the provided parameters.
- Initializes empty lists for storing documents and chunks.
- Creates an instance of CohereEmbeddings for generating document embeddings.
- Initializes vectorstore as None.
load_documents
def load_documents(self):
self.docs = list(tqdm(WikipediaLoader(
query=self.query, load_max_docs=self.load_max_docs).load(), desc="Loading docs"))
print(f"{len(self.docs)} documents loaded from Wikipedia.")
Purpose: Loads documents from Wikipedia based on the query.
Functionality:
- Uses WikipediaLoader to load documents matching the query.
- Limits the number of documents to load_max_docs.
- Displays a progress bar using tqdm while loading.
- Prints the number of documents loaded.
create_chunks_with_headers
def create_chunks_with_headers(self, doc, doc_index):
chunks = []
start = 0
doc_content = doc.page_content
doc_length = len(doc.page_content)
while start < doc_length:
end = min(start + self.chunk_size, doc_length)
chunk = doc_content[start:end]
if start != 0:
chunk = doc_content[max(start - self.chunk_overlap, 0):end]
chunk_json = {
"meta_data": {
"title": doc.metadata["title"],
"summary": doc.metadata['summary'],
"source_url": doc.metadata['source'],
},
"chunk_index": len(chunks) + 1,
"doc_index": doc_index,
"content": chunk
}
chunks.append(chunk_json)
start += self.chunk_size
return chunks
Purpose: Creates chunks of text from a document with metadata.
Parameters:
- doc (Document): The document to chunk.
- doc_index (int): The index of the document.
Functionality:
- Splits the document’s content into chunks of size chunk_size with an overlap of chunk_overlap.
- Adds metadata (title, summary, source URL) to each chunk.
- Returns a list of chunks with their metadata.
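As a standalone sketch of the same sliding-window logic (not part of the class), the snippet below shows which character spans are produced for a 2,000-character document with the default chunk_size of 800 and chunk_overlap of 100:
text = "x" * 2000  # stand-in for doc.page_content
chunk_size, chunk_overlap = 800, 100
spans, start = [], 0
while start < len(text):
    end = min(start + chunk_size, len(text))
    # every chunk after the first is extended backwards by the overlap
    begin = 0 if start == 0 else max(start - chunk_overlap, 0)
    spans.append((begin, end))
    start += chunk_size
print(spans)  # [(0, 800), (700, 1600), (1500, 2000)]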
process_and_create_embeddings
def process_and_create_embeddings(self):
self.all_chunks = []
for i, doc in enumerate(self.docs):
chunks = self.create_chunks_with_headers(doc, i + 1)
self.all_chunks.extend(chunks)
documents = [Document(page_content=chunk["content"],
metadata=chunk["meta_data"]) for chunk in self.all_chunks]
self.vectorstore = Chroma.from_documents(
documents, self.embeddings_model)
print("All data has been processed and stored in Chroma.")
Purpose: Processes the loaded documents into chunks and creates embeddings for these chunks.
Functionality:
- Iterates through the loaded documents and creates chunks using create_chunks_with_headers.
- Converts the chunks into Document objects with metadata.
- Uses Chroma.from_documents to create a vector store from these Document objects using the CohereEmbeddings model.
- Prints a message indicating that processing and storage are complete.
run
def run(self):
self.load_documents()
self.process_and_create_embeddings()
print("All data has been processed.")
Purpose: Orchestrates the entire document processing and embedding creation workflow.
Functionality:
- Calls load_documents to load the documents from Wikipedia.
- Calls process_and_create_embeddings to process the documents into chunks and store the embeddings.
- Prints a message indicating that all data has been processed.
This breakdown outlines the purpose and functionality of each method within the WikipediaDocumentProcessor class.
Cricket Assistant Implementation
Now that the WikipediaDocumentProcessor class has processed and stored the relevant information in our datastore, we need to implement a self-query pipeline. This pipeline will enable the LLM to retrieve and utilize context from the stored document chunks based on the user's input, effectively leveraging the embeddings and metadata generated during document processing. For this, we define another class named CricketAssistant.
Define the CricketAssistant Class
class CricketAssistant:
def __init__(self, vectorstore):
self.embeddings_model = CohereEmbeddings(
cohere_api_key=os.getenv('COHERE_API_KEY'))
self.vectorstore = vectorstore
self.initialize_retriever()
self.memory = deque(maxlen=5)
self.llm = ChatNVIDIA(
model="mistralai/mistral-7b-instruct-v0.2") | StrOutputParser()
def initialize_retriever(self):
metadata_field_info = [
AttributeInfo(
name="title", description="The name of the article", type="string"),
AttributeInfo(
name="summary", description="The short summary of the article contents", type="integer"),
AttributeInfo(
name="source_url", description="The web uri link to the article webpage", type="string"),
]
document_content_description = "Data about cricket"
llm = ChatNVIDIA(
model="mistralai/mistral-7b-instruct-v0.2") | StrOutputParser()
self.retriever = SelfQueryRetriever.from_llm(
llm, self.vectorstore, document_content_description, metadata_field_info)
def update_memory(self, user_question, response):
self.memory.append({"question": user_question, "response": response})
def generate_embeddings(self, input_data):
embeddings = self.retriever.invoke(input_data)
if embeddings:
return embeddings
else:
return "No data available"
def generate_embeddings_query(self, input_data):
prompt = ChatPromptTemplate.from_template("""
User's Question: {input}
Previous conversation memory: {memory}
Generate only a query sentence and nothing else from the user's question to fetch from the data from embeddings. If the user's question does not have enough context then create a query based on the Knowledge Base.
""")
embedding_chain = prompt | self.llm
embeddings_query = embedding_chain.invoke(input_data)
if embeddings_query:
return embeddings_query
else:
return "Process failed"
def get_response(self, prompt):
return self.llm.invoke(prompt)
def handle_user_input(self, user_input):
sys_msg = """
You are an intelligent assistant that answers all questions about cricket using contextual information from Wikipedia. Your responses should be conversational and informative, providing clear and concise explanations. When relevant, include the source URL of the articles to give users additional reading material.
Always aim to:
1. Answer the question directly and clearly.
2. Provide context and background information when useful but do not give irrelevant information and answer to the point.
3. Suggest related topics or additional points of interest.
4. Be polite and engaging in your responses.
5. Remove the unnecessary context from the context provided if irrelevant to the question
Now, let's get started!
"""
Runnable = (
{"input": RunnablePassthrough(), "memory": RunnablePassthrough()}
| RunnableAssign({"embedding_query": RunnableLambda(self.generate_embeddings_query)})
| RunnableAssign({"context": RunnableLambda(self.generate_embeddings)})
| RunnableAssign({"prompt": lambda x: ChatPromptTemplate.from_template(
f"""
{sys_msg}
User's Question: {{input}}
Context Information: {{context}}
Previous Conversation memory: {{memory}}
Your Response:
"""
).invoke(x)})  # render the prompt with the current pipeline state
| RunnableAssign({"response": lambda x: self.get_response(x["prompt"])})
| RunnableAssign({"memory": lambda x: self.update_memory(x["input"]["input"], x["response"])})
)
response = Runnable.invoke(
{"input": user_input, "memory": self.memory})
return response["response"]
__init__ (self, vectorstore)
def __init__(self, vectorstore):
self.embeddings_model = CohereEmbeddings(
cohere_api_key=os.getenv('COHERE_API_KEY'))
self.vectorstore = vectorstore
self.initialize_retriever()
self.memory = deque(maxlen=5)
self.llm = ChatNVIDIA(
model="mistralai/mixtral-8x22b-instruct-v0.1") | StrOutputParser()
- Embeddings Model: Initializes a CohereEmbeddings instance using an API key retrieved from the environment variables. This model is used for generating embeddings from text data.
- Vector Store: Accepts a vectorstore object passed during initialization, which is used to store and retrieve embeddings.
- Retriever Initialization: Calls the initialize_retriever() method to set up a retriever component.
- Memory Management: Utilizes a deque with a maximum length of 5 to keep track of recent interactions, allowing the assistant to maintain context over a series of exchanges.
- Large Language Model (LLM): Initializes a ChatNVIDIA instance with a specific model and chains it with a StrOutputParser, indicating that this LLM will be used for processing natural language inputs and outputs.
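For orientation, a minimal construction sketch, assuming processor is the WikipediaDocumentProcessor instance built earlier so its vector store is already populated:
assistant = CricketAssistant(processor.vectorstore)
print(assistant.handle_user_input("What is a googly in cricket?"))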
initialize_retriever
def initialize_retriever(self):
metadata_field_info = [
AttributeInfo(
name="title", description="The name of the article", type="string"),
AttributeInfo(
name="summary", description="The short summary of the article contents", type="integer"),
AttributeInfo(
name="source_url", description="The web uri link to the article webpage", type="string"),
]
document_content_description = "Data about cricket"
llm = ChatNVIDIA(
model="mistralai/mistral-7b-instruct-v0.2") | StrOutputParser()
self.retriever = SelfQueryRetriever.from_llm(
llm, self.vectorstore, document_content_description, metadata_field_info)
Configures a retriever component for querying relevant information based on embeddings.
- Defines metadata field information for documents, including title, summary, and source URL.
- Initializes a SelfQueryRetriever with the LLM, vector store, document content description, and metadata field information. This setup enables the assistant to query its knowledge base effectively.
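Once initialize_retriever has run, the retriever can also be queried directly, which is handy for debugging. A hedged usage sketch, assuming assistant is the CricketAssistant instance from the sketch above:
docs = assistant.retriever.invoke("2019 Cricket World Cup final")
for doc in docs[:3]:
    print(doc.metadata.get("title"), "-", doc.page_content[:80])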
update_memory
def update_memory(self, user_question, response):
self.memory.append({"question": user_question, "response": response})
Updates the assistant’s memory with the latest interaction.
Parameters:
- user_question: The question asked by the user.
- response: The assistant’s response to the question.
Appends a dictionary containing the user’s question and the assistant’s response to the memory deque. This keeps track of recent interactions, allowing the assistant to reference past queries and responses.
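A quick illustration of this bounded-memory behaviour: with maxlen=5, the oldest turns are silently discarded as new ones arrive.
from collections import deque

memory = deque(maxlen=5)
for i in range(7):
    memory.append({"question": f"q{i}", "response": f"r{i}"})
print([m["question"] for m in memory])  # ['q2', 'q3', 'q4', 'q5', 'q6']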
generate_embeddings_query
def generate_embeddings_query(self, input_data):
prompt = ChatPromptTemplate.from_template("""
User's Question: {input}
Previous conversation memory: {memory}
Generate only a query sentence and nothing else from the user's question to fetch from the data from embeddings. If the user's question does not have enough context then create a query based on the Knowledge Base.
""")
embedding_chain = prompt | self.llm
embeddings_query = embedding_chain.invoke(input_data)
if embeddings_query:
return embeddings_query
else:
return "Process failed"
Constructs a query for generating embeddings based on the user’s question and previous memory.
Parameters:
- input_data: Data used to generate a query for embeddings.
Functionality:
Creates a prompt template incorporating the user’s question and previous memory. This prompt is then processed by the LLM to generate a query sentence for fetching data from the embeddings. Returns the generated query or indicates that the process failed.
generate_embeddings
def generate_embeddings(self, input_data):
embeddings = self.retriever.invoke(input_data)
if embeddings:
return embeddings[:4]
return "No data available"
Retrieves relevant document chunks for the given input query. Despite its name, the method returns retrieved documents rather than raw embedding vectors.
Parameters:
- input_data: The query used to retrieve matching document chunks.
Functionality:
Invokes the retriever with the input data to fetch matching chunks. Returns up to four chunks if available; otherwise, indicates that no data is available.
get_response
def get_response(self, prompt):
return self.llm.invoke(prompt)
Generates a response based on a given prompt.
Parameters:
- prompt: A structured input for generating a response.
Functionality:
Invokes the LLM with the provided prompt to generate a natural language response.
handle_user_input
def handle_user_input(self, user_input):
sys_msg = """
You are an intelligent assistant that answers all questions about cricket using contextual information from Wikipedia. Your responses should be conversational and informative, providing clear and concise explanations. When relevant, include the source URL of the articles to give users additional reading material.
Always aim to:
1. Answer the question directly and clearly.
2. Provide context and background information when useful but do not give irrelevant information and answer to the point.
3. Suggest related topics or additional points of interest.
4. Be polite and engaging in your responses.
5. Remove the unnecessary context from the context provided if irrelevant to the question
Now, let's get started!
"""
Runnable = (
{"input": RunnablePassthrough(), "memory": RunnablePassthrough()}
| RunnableAssign({"embedding_query": RunnableLambda(self.generate_embeddings_query)})
| RunnableAssign({"context": RunnableLambda(self.generate_embeddings)})
| RunnableAssign({"prompt": lambda x: ChatPromptTemplate.from_template(
f"""
{sys_msg}
User's Question: {{input}}
Context Information: {{context}}
Previous Conversation memory: {{memory}}
Your Response:
"""
).invoke(x)})  # render the prompt with the current pipeline state
| RunnableAssign({"response": lambda x: self.get_response(x["prompt"])})
| RunnableAssign({"memory": lambda x: self.update_memory(x["input"]["input"], x["response"])})
)
response = Runnable.invoke(
{"input": user_input, "memory": self.memory})
return response["response"]
Orchestrates the handling of user input to generate a relevant response.
Parameters:
- user_input: The query or statement provided by the user.
Functionality:
- Defines a system message outlining the assistant’s capabilities and guidelines.
- Constructs a series of operations (“runnables”) that:
- Pass through the input and memory,
- Generate an embeddings query,
- Retrieve context information based on embeddings,
- Construct a prompt incorporating system message, user’s question, context, and memory,
- Invoke the LLM to generate a response,
- Update the memory with the new interaction.
- Executes these operations sequentially to produce a response to the user input.
Key Components and Libraries
- CohereEmbeddings: Used for generating text embeddings.
- ChatNVIDIA & StrOutputParser: A large language model and its output parser for natural language processing tasks.
- SelfQueryRetriever: A retriever that queries its own knowledge base based on generated embeddings or queries.
- RunnableLambda, RunnableAssign, RunnablePassthrough: Utilities for defining a sequence of operations or “runnables” that process the user input and generate a response.
For further elaboration on the topic, refer to the first article in the series — Simplest Introduction to RAG and LangChain: Building a Cricket Chatbot Part 1.
The CricketAssistant class is designed to be part of a larger system that includes document processing and embedding generation, as shown in the usage example at the end of the file. This system involves preprocessing cricket-related documents from sources like Wikipedia, creating embeddings for these documents, and then using the CricketAssistant to answer questions about cricket by leveraging these embeddings and the knowledge encoded in them.
Usage Example
To run the processor and assistant from the terminal as a simple simulation of a chat interface, add the following lines at the end of the file.
processor = WikipediaDocumentProcessor(
query="Cricket and everything related to cricket")
processor.run()
assistant = CricketAssistant(processor.vectorstore)
while True:
user_input = input("You: ")
if user_input.lower() == "exit":
print("Exiting chat...")
break
print("Bot:", assistant.handle_user_input(user_input))
Run the following command in the terminal to start the script for testing:
python cricket_bot_data.py
cricket_bot_api.py
from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from contextlib import asynccontextmanager
from pydantic import BaseModel
import uvicorn
from cricket_bot_data import WikipediaDocumentProcessor, CricketAssistant
# Define your CORS policy
# Replace "*" with your specific origin(s) in production
CORS_POLICY = {
"allow_origins": ["*"],
"allow_credentials": True,
"allow_methods": ["*"],
"allow_headers": ["*"],
}
@asynccontextmanager
async def lifespan(app: FastAPI):
global vector_store
print("Running initialization tasks before server starts.")
processor = WikipediaDocumentProcessor(
query="Cricket and everything related to cricket")
processor.run()
vector_store = processor.vectorstore
print("Initialization complete.")
yield
print("Shutting Down.... Adios!!")
app = FastAPI(lifespan=lifespan)
# Add the CORSMiddleware to your FastAPI application
app.add_middleware(
CORSMiddleware,
**CORS_POLICY
)
class UserInput(BaseModel):
text: str
@app.get("/api/v1/health")
async def health():
return {"response": "Alive and well my friend !"}
@app.post("/chat")
async def chat_endpoint(user_input: UserInput):
assistant = CricketAssistant(vector_store)
response = assistant.handle_user_input(user_input.text)
return {"response": response}
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
This script sets up a FastAPI server that provides an API for the cricket chatbot. Here is a breakdown of the code.
Import Required Libraries
import uvicorn
from pydantic import BaseModel
from fastapi import FastAPI, Request
from contextlib import asynccontextmanager
from fastapi.middleware.cors import CORSMiddleware
from cricket_bot_data import WikipediaDocumentProcessor, CricketAssistant
- uvicorn: ASGI server used to serve the FastAPI application.
- pydantic: Provides data validation and settings management using Python type annotations.
- fastapi: A modern, high-performance web framework for building APIs with Python, based on standard type hints.
- asynccontextmanager: A decorator for defining asynchronous context managers.
- CORSMiddleware: Middleware to handle Cross-Origin Resource Sharing (CORS), enabling cross-origin HTTP requests.
- cricket_bot_data: The custom module defined above, containing WikipediaDocumentProcessor and CricketAssistant.
Define CORS Policy
CORS_POLICY = {
"allow_origins": ["*"],
"allow_credentials": True,
"allow_methods": ["*"],
"allow_headers": ["*"],
}
The CORS policy is defined to allow requests from any origin ("*"), with credentials allowed, and permitting all methods and headers. This configuration is suitable for development, but it should be restricted to specific origins in production environments for security reasons.
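A hedged sketch of a tighter production policy (the frontend origin below is a hypothetical placeholder; replace it with your actual domain):
CORS_POLICY = {
    "allow_origins": ["https://your-frontend.example.com"],  # hypothetical origin
    "allow_credentials": True,
    "allow_methods": ["GET", "POST"],
    "allow_headers": ["Content-Type", "Authorization"],
}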
Define Lifespan for Initialization Tasks
@asynccontextmanager
async def lifespan(app: FastAPI):
global vector_store
print("Running initialization tasks before server starts.")
processor = WikipediaDocumentProcessor(
query="Cricket and everything related to cricket")
processor.run()
vector_store = processor.vectorstore
print("Initialization complete.")
yield
print("Shutting Down.... Adios!!")
An asynchronous context manager named lifespan is defined using @asynccontextmanager. This manager performs initialization tasks before the server starts and cleanup after the server shuts down.
Initializing FastAPI:
We initialize the FastAPI app with the lifespan context manager and add CORS middleware to the app using the settings defined in CORS_POLICY.
app = FastAPI(lifespan=lifespan)
# Add the CORSMiddleware to your FastAPI application
app.add_middleware(
CORSMiddleware,
**CORS_POLICY
)
During the initialization phase, the context manager prints a message indicating the start of initialization tasks. It then creates an instance of WikipediaDocumentProcessor with a cricket-related query. The processor runs, loading documents from Wikipedia, processing them into chunks, and generating embeddings. These embeddings are stored in a vector store (vector_store), which is made globally accessible for use throughout the application. After completing these tasks, it prints a message indicating that initialization is complete.
The yield statement is crucial, as it pauses the execution of the context manager and allows the FastAPI application to run. This means that after the initialization code has executed, control is handed over to the FastAPI app to serve requests. The code following the yield statement won't execute until the application is shutting down.
Upon shutdown, execution resumes after the yield statement, allowing the context manager to perform any necessary cleanup actions. In this script, it simply prints a farewell message indicating that the application is shutting down. In more complex applications, this phase could involve closing database connections, releasing resources, or saving state.
By passing the lifespan function to the FastAPI constructor via the lifespan parameter, FastAPI knows to execute the initialization part of lifespan before starting the server and the cleanup part after stopping it.
Defining User Input Model:
The UserInput class is a Pydantic model that validates the structure of incoming JSON payloads for the chat endpoint. It contains a single field, text, which represents the user’s input text.
class UserInput(BaseModel):
text: str
Health Check Endpoint:
The /api/v1/health endpoint is a simple GET endpoint that returns a JSON response indicating that the server is running. This can be useful for monitoring and ensuring the server is operational.
@app.get("/api/v1/health")
async def health():
return {"response": "Alive and well my friend !"}
Chat Endpoint:
- The /chat endpoint is a POST endpoint that accepts user input, processes it using the CricketAssistant, and returns the generated response.
- The user_input parameter is an instance of the UserInput model, ensuring that the input data is correctly structured.
- An instance of CricketAssistant is created using the global vector_store, and the handle_user_input method is called with the user input text to generate a response.
@app.post("/chat")
async def chat_endpoint(user_input: UserInput):
assistant = CricketAssistant(vector_store)
response = assistant.handle_user_input(user_input.text)
return {"response": response}
Starting the FastAPI Application:
The uvicorn.run method starts the FastAPI application. The app will be available on http://0.0.0.0:8000, ready to accept requests.
Running the Server
To run the server, execute the following command in your terminal:
uvicorn cricket_bot_api:app --reload
This will start the FastAPI server on port 8000, and you can interact with the chatbot via the /chat endpoint.
Testing the Chatbot
You can test the chatbot using tools like curl, Postman, or directly from a frontend application by sending a POST request to http://localhost:8000/chat with a JSON payload, for example:
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
--data '{"text": "Who won the cricket world cup in 2019?"}'
The server will respond with a JSON object containing the chatbot’s response.
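The exact wording will vary from run to run, but the response will have this shape:
{"response": "England won the 2019 Cricket World Cup, beating New Zealand in the final at Lord's."}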
Setting up a ReactJS Frontend
To set up the ReactJS frontend for the Cricket Bot project, follow these steps. These instructions assume you have Node.js and npm installed on your system; if not, please install them first. It is also assumed that the FastAPI server is up and running on http://localhost:8000.
Step 1: Clone the Repository
First, clone the repository to your local machine. Open a terminal and run:
git clone https://github.com/AbhiRam162105/Cricket_Bot.git
cd Cricket_Bot/frontend
Step 2: Install Dependencies
Before starting the project, you need to install the necessary dependencies listed in the package.json file. Run the following commands in the terminal:
nvm install 16.13.2
npm install
The npm install command reads the package.json file and installs all the required packages.
Step 3: Start the Development Server
Once the installation is complete, you can start the development server to see the application in action. Execute the following command:
npm start
This command compiles the React app and starts a development server. By default, the app will be served at http://localhost:3000.
Additional Notes
- Building for Production: When you’re ready to build the project for production, use the npm run build command instead of npm start. This command creates a build directory with a production build of your app.
- Troubleshooting: If you encounter any issues, ensure that all dependencies are correctly installed and that there are no errors in the console output. Common issues include missing dependencies or conflicts between package versions.
These steps should help you set up and run the ReactJS frontend for the Cricket Bot project successfully.
And there you have it: your very own CricketGPT! You can use a database to store the chats and retrieve them when loading the application to make it a truly full-stack application.
Conclusion
In this tutorial, we have built a cricket chatbot using LangChain and FastAPI. The chatbot processes cricket-related documents from Wikipedia, generates embeddings, and answers user queries using the NVIDIA AI model. The FastAPI server provides an API for interacting with the chatbot, allowing you to easily integrate it into various applications.
GitHub link for the source code
Feel free to customize and expand this chatbot to include more features and handle a wider range of queries.
Until then —
This is Mr.E, signing off. Time to git commit and call it a day! Keep calm and code on!
References
- https://medium.com/tech-iiitg/simplest-introduction-to-rag-and-langchain-building-a-cricket-chatbot-part-i-0b98a658ee6f
- https://medium.com/thoughts-on-machine-learning/building-a-simple-rag-system-with-fastapi-1-382e15a6ae2a
- https://docs.hatchet.run/home/tutorials/fastapi-react/project-setup
- https://techcommunity.microsoft.com/t5/apps-on-azure-blog/create-a-retrieval-augmented-generation-rag-app-instantly-with/ba-p/4166678