Simplest Introduction to RAG and LangChain: Building a Cricket Chatbot Part 2

Yanampally Abhiram Reddy
16 min read · Jul 20, 2024


The first article on the topic outlines the process of creating an intelligent Cricket Chatbot using Retrieval-Augmented Generation (RAG) and LangChain in a Jupyter Notebook environment. It starts by explaining RAG’s approach of combining retrieval and generative models to enhance response accuracy. LangChain is introduced as a framework that simplifies developing such models. The guide details steps including loading and processing cricket-related Wikipedia data, creating document chunks, and utilizing vector embeddings. It further covers setting up a self-querying retriever, integrating chat memory, and configuring the chatbot’s response generation. By following these steps, users can build a functional chatbot capable of answering cricket-related queries using Wikipedia data.

In this tutorial, we will build a cricket chatbot using various tools from the LangChain ecosystem and FastAPI. The chatbot will be capable of answering cricket-related questions by retrieving relevant information from Wikipedia and generating responses using the NVIDIA NIM Inference models.

Project Structure

The project consists of two main Python files:

  1. cricket_bot_data.py: Handles data processing, embedding generation, and chatbot logic.
  2. cricket_bot_api.py: Implements the FastAPI server to provide an API for the chatbot.

cricket_bot_data.py

This script performs a series of tasks: it retrieves cricket-related documents from Wikipedia, splits them into smaller, manageable chunks, generates embeddings for each chunk using Cohere’s embedding model, and stores the results in a Chroma vector store.

Here’s a step-by-step breakdown of the code:

Import Required Libraries

import os
from tqdm import tqdm
from rich.style import Style
from collections import deque
from functools import partial
from dotenv import load_dotenv
from rich.console import Console
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_cohere import CohereEmbeddings
from langchain_nvidia_ai_endpoints import ChatNVIDIA
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import WikipediaLoader
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_core.runnables import RunnableLambda, RunnableAssign, RunnablePassthrough

Load Environment Variables

To use the ChatNVIDIA and Cohere embedding models, you need to configure your NVIDIA and Cohere API keys. Ensure you have set them up in a .env file.

load_dotenv()
os.environ["NVIDIA_API_KEY"] = os.getenv('NVIDIA_API_KEY')
os.environ["COHERE_API_KEY"]=os.getenv('COHERE_API_KEY')

Initialize Console for Pretty Printing

Next, set up the console for pretty printing using rich, along with a small helper for printing intermediate chain state.

console = Console()
base_style = Style(color="#76B900", bold=True)
pprint = partial(console.print, style=base_style)

def PPrint(preface="State: "):
    def print_and_return(x, preface=""):
        pprint(preface, x)
        return x
    return RunnableLambda(partial(print_and_return, preface=preface))
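
PPrint is a small debugging helper: wrapped in a RunnableLambda, it prints whatever passes through it and returns it unchanged, so it can be dropped between steps of a chain. A minimal illustration (not part of the original pipeline) might look like this:

double = RunnableLambda(lambda x: x * 2)
debug_chain = double | PPrint("After doubling: ") | double
print(debug_chain.invoke(3))  # prints the intermediate value 6, then returns 12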

Wikipedia Document Processing

Define the WikipediaDocumentProcessor Class

class WikipediaDocumentProcessor:
    def __init__(self, query, load_max_docs=10, chunk_size=800, chunk_overlap=100):
        self.query = query
        self.load_max_docs = load_max_docs
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.docs = []
        self.all_chunks = []
        self.embeddings_model = CohereEmbeddings(
            cohere_api_key=os.getenv('COHERE_API_KEY'))
        self.vectorstore = None

    def load_documents(self):
        self.docs = list(tqdm(WikipediaLoader(
            query=self.query, load_max_docs=self.load_max_docs).load(), desc="Loading docs"))
        print(f"{len(self.docs)} documents loaded from Wikipedia.")

    def create_chunks_with_headers(self, doc, doc_index):
        chunks = []
        start = 0
        doc_content = doc.page_content
        doc_length = len(doc.page_content)

        while start < doc_length:
            end = min(start + self.chunk_size, doc_length)
            chunk = doc_content[start:end]

            if start != 0:
                chunk = doc_content[max(start - self.chunk_overlap, 0):end]

            chunk_json = {
                "meta_data": {
                    "title": doc.metadata["title"],
                    "summary": doc.metadata['summary'],
                    "source_url": doc.metadata['source'],
                },
                "chunk_index": len(chunks) + 1,
                "doc_index": doc_index,
                "content": chunk
            }
            chunks.append(chunk_json)

            start += self.chunk_size

        return chunks

    def process_and_create_embeddings(self):
        self.all_chunks = []
        for i, doc in enumerate(self.docs):
            chunks = self.create_chunks_with_headers(doc, i + 1)
            self.all_chunks.extend(chunks)

        documents = [Document(page_content=chunk["content"],
                              metadata=chunk["meta_data"]) for chunk in self.all_chunks]

        self.vectorstore = Chroma.from_documents(
            documents, self.embeddings_model)
        print("All data has been processed and stored in Chroma.")

    def run(self):
        self.load_documents()
        self.process_and_create_embeddings()
        print("All data has been processed.")
The WikipediaDocumentProcessor class automates document handling in an AI system by fetching relevant Wikipedia articles with the load_documents() method, chunking these documents with metadata using create_chunks_with_headers(), and converting them into embeddings with process_and_create_embeddings(). The embeddings are stored in a Chroma vector store for efficient retrieval. The run() method orchestrates this entire process, ensuring sequential execution and status updates, effectively preparing data for AI models or other applications.

Here is a more detailed breakdown of the WikipediaDocumentProcessor class:

__init__ Method

def __init__(self, query, load_max_docs=10, chunk_size=800, chunk_overlap=100):
    self.query = query
    self.load_max_docs = load_max_docs
    self.chunk_size = chunk_size
    self.chunk_overlap = chunk_overlap
    self.docs = []
    self.all_chunks = []
    self.embeddings_model = CohereEmbeddings(
        cohere_api_key=os.getenv('COHERE_API_KEY'))
    self.vectorstore = None

Purpose: Initializes the instance of WikipediaDocumentProcessor.

Parameters:

  • query (str): The search query for Wikipedia articles.
  • load_max_docs (int, default=10): Maximum number of documents to load. Start with a small number of documents; once the code runs reliably, increase it, since loading many documents can take tens of minutes.
  • chunk_size (int, default=800): The size of each document chunk.
  • chunk_overlap (int, default=100): The overlap between consecutive chunks.

Functionality:

  • Sets instance variables with the provided parameters.
  • Initializes empty lists for storing documents and chunks.
  • Creates an instance of CohereEmbeddings for generating document embeddings.
  • Initializes vectorstore as None.

load_documents

def load_documents(self):
    self.docs = list(tqdm(WikipediaLoader(
        query=self.query, load_max_docs=self.load_max_docs).load(), desc="Loading docs"))
    print(f"{len(self.docs)} documents loaded from Wikipedia.")

Purpose: Loads documents from Wikipedia based on the query.

Functionality:

  • Uses WikipediaLoader to load documents matching the query.
  • Limits the number of documents to load_max_docs.
  • Displays a progress bar using tqdm while loading.
  • Prints the number of documents loaded (a quick way to inspect the result is sketched below).
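
Assuming processor is a WikipediaDocumentProcessor instance that has already called load_documents(), a quick inspection might look like this (the metadata keys come from WikipediaLoader and are the same ones used by the chunking code below):

first_doc = processor.docs[0]
print(first_doc.metadata["title"])
print(first_doc.metadata["summary"][:200])
print(first_doc.metadata["source"])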

create_chunks_with_headers

def create_chunks_with_headers(self, doc, doc_index):
    chunks = []
    start = 0
    doc_content = doc.page_content
    doc_length = len(doc.page_content)

    while start < doc_length:
        end = min(start + self.chunk_size, doc_length)
        chunk = doc_content[start:end]

        if start != 0:
            chunk = doc_content[max(start - self.chunk_overlap, 0):end]

        chunk_json = {
            "meta_data": {
                "title": doc.metadata["title"],
                "summary": doc.metadata['summary'],
                "source_url": doc.metadata['source'],
            },
            "chunk_index": len(chunks) + 1,
            "doc_index": doc_index,
            "content": chunk
        }
        chunks.append(chunk_json)

        start += self.chunk_size

    return chunks

Purpose: Creates chunks of text from a document with metadata.

Parameters:

  • doc (Document): The document to chunk.
  • doc_index (int): The index of the document.

Functionality:

  • Splits the document’s content into chunks of size chunk_size with an overlap of chunk_overlap (see the worked example after this list).
  • Adds metadata (title, summary, source URL) to each chunk.
  • Returns a list of chunks with their metadata.
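
To make the boundary arithmetic concrete, here is a tiny standalone sketch of the same loop (illustration only), using the default chunk_size=800 and chunk_overlap=100 on a hypothetical 2,000-character document:

chunk_size, chunk_overlap = 800, 100
doc_length = 2000
start = 0
while start < doc_length:
    end = min(start + chunk_size, doc_length)
    lo = 0 if start == 0 else max(start - chunk_overlap, 0)
    print(f"chunk spans characters [{lo}:{end}]")
    start += chunk_size
# Prints: [0:800], [700:1600], [1500:2000]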

process_and_create_embeddings

def process_and_create_embeddings(self):
    self.all_chunks = []
    for i, doc in enumerate(self.docs):
        chunks = self.create_chunks_with_headers(doc, i + 1)
        self.all_chunks.extend(chunks)

    documents = [Document(page_content=chunk["content"],
                          metadata=chunk["meta_data"]) for chunk in self.all_chunks]

    self.vectorstore = Chroma.from_documents(
        documents, self.embeddings_model)
    print("All data has been processed and stored in Chroma.")

Purpose: Processes the loaded documents into chunks and creates embeddings for these chunks.

Functionality:

  • Iterates through the loaded documents and creates chunks using create_chunks_with_headers.
  • Converts the chunks into Document objects with metadata.
  • Uses Chroma.from_documents to create a vector store from these Document objects using the CohereEmbeddings model (a quick sanity check is sketched below).
  • Prints a message indicating that processing and storage are complete.
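
Once the chunks are stored, you can sanity-check the vector store with a direct similarity search. A minimal sketch, assuming processor is the WikipediaDocumentProcessor instance created in the usage example later in this article:

results = processor.vectorstore.similarity_search("rules of test cricket", k=3)
for doc in results:
    print(doc.metadata["title"], "->", doc.page_content[:100])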

run

def run(self):
    self.load_documents()
    self.process_and_create_embeddings()
    print("All data has been processed.")

Purpose: Orchestrates the entire document processing and embedding creation workflow.

Functionality:

  • Calls load_documents to load the documents from Wikipedia.
  • Calls process_and_create_embeddings to process the documents into chunks and store the embeddings.
  • Prints a message indicating that all data has been processed.

This breakdown outlines the purpose and functionality of each method within the WikipediaDocumentProcessor class.

Cricket Assistant Implementation

Now that the WikipediaDocumentProcessor class has processed and stored the relevant information in our datastore, we need to implement a self-query pipeline. This pipeline enables the LLM to retrieve and use context from the stored document chunks based on the user's input, effectively leveraging the embeddings and metadata generated during document processing. For this, we define another class named CricketAssistant.

Define the CricketAssistant Class

class CricketAssistant:
    def __init__(self, vectorstore):
        self.embeddings_model = CohereEmbeddings(
            cohere_api_key=os.getenv('COHERE_API_KEY'))
        self.vectorstore = vectorstore
        self.initialize_retriever()
        self.memory = deque(maxlen=5)
        self.llm = ChatNVIDIA(
            model="mistralai/mistral-7b-instruct-v0.2") | StrOutputParser()

    def initialize_retriever(self):
        metadata_field_info = [
            AttributeInfo(
                name="title", description="The name of the article", type="string"),
            AttributeInfo(
                name="summary", description="The short summary of the article contents", type="string"),
            AttributeInfo(
                name="source_url", description="The web uri link to the article webpage", type="string"),
        ]
        document_content_description = "Data about cricket"
        llm = ChatNVIDIA(
            model="mistralai/mistral-7b-instruct-v0.2") | StrOutputParser()
        self.retriever = SelfQueryRetriever.from_llm(
            llm, self.vectorstore, document_content_description, metadata_field_info)

    def update_memory(self, user_question, response):
        self.memory.append({"question": user_question, "response": response})

    def generate_embeddings(self, input_data):
        embeddings = self.retriever.invoke(input_data)
        if embeddings:
            return embeddings
        else:
            return "No data available"

    def generate_embeddings_query(self, input_data):
        prompt = ChatPromptTemplate.from_template("""
User's Question: {input}
Previous conversation memory: {memory}
Generate only a query sentence and nothing else from the user's question to fetch from the data from embeddings. If the user's question does not have enough context then create a query based on the Knowledge Base.
""")
        embedding_chain = prompt | self.llm
        embeddings_query = embedding_chain.invoke(input_data)
        if embeddings_query:
            return embeddings_query
        else:
            return "Process failed"

    def get_response(self, prompt):
        return self.llm.invoke(prompt)

    def handle_user_input(self, user_input):
        sys_msg = """
You are an intelligent assistant that answers all questions about cricket using contextual information from Wikipedia. Your responses should be conversational and informative, providing clear and concise explanations. When relevant, include the source URL of the articles to give users additional reading material.

Always aim to:
1. Answer the question directly and clearly.
2. Provide context and background information when useful but do not give irrelevant information and answer to the point.
3. Suggest related topics or additional points of interest.
4. Be polite and engaging in your responses.
5. Remove the unnecessary context from the context provided if irrelevant to the question

Now, let's get started!
"""

        Runnable = (
            {"input": RunnablePassthrough(), "memory": RunnablePassthrough()}
            | RunnableAssign({"embedding_query": RunnableLambda(self.generate_embeddings_query)})
            | RunnableAssign({"context": RunnableLambda(self.generate_embeddings)})
            | RunnableAssign({"prompt": lambda x: ChatPromptTemplate.from_template(
                f"""
{sys_msg}

User's Question: {{input}}

Context Information: {{context}}

Previous Conversation memory: {{memory}}

Your Response:
"""
            )})
            | RunnableAssign({"response": lambda x: self.get_response(x["prompt"])})
            | RunnableAssign({"memory": lambda x: self.update_memory(x["input"]["input"], x["response"])})
        )

        response = Runnable.invoke(
            {"input": user_input, "memory": self.memory})
        return response["response"]

__init__

def __init__(self, vectorstore):
    self.embeddings_model = CohereEmbeddings(
        cohere_api_key=os.getenv('COHERE_API_KEY'))
    self.vectorstore = vectorstore
    self.initialize_retriever()
    self.memory = deque(maxlen=5)
    self.llm = ChatNVIDIA(
        model="mistralai/mistral-7b-instruct-v0.2") | StrOutputParser()

  • Embeddings Model: Initializes a CohereEmbeddings instance using an API key retrieved from environment variables; this model generates embeddings from text data.
  • Vector Store: Accepts the vectorstore object passed during initialization, which is used to store and retrieve embeddings.
  • Retriever Initialization: Calls initialize_retriever() method to set up a retriever component.
  • Memory Management: Utilizes a deque with a maximum length of 5 to keep track of recent interactions, allowing the assistant to maintain context over a series of exchanges.
  • Large Language Model (LLM): Initializes a ChatNVIDIA instance with a specific model and chains it with a StrOutputParser, indicating that this LLM will be used for processing natural language inputs and outputs.

initialize_retriever

def initialize_retriever(self):
    metadata_field_info = [
        AttributeInfo(
            name="title", description="The name of the article", type="string"),
        AttributeInfo(
            name="summary", description="The short summary of the article contents", type="string"),
        AttributeInfo(
            name="source_url", description="The web uri link to the article webpage", type="string"),
    ]
    document_content_description = "Data about cricket"
    llm = ChatNVIDIA(
        model="mistralai/mistral-7b-instruct-v0.2") | StrOutputParser()
    self.retriever = SelfQueryRetriever.from_llm(
        llm, self.vectorstore, document_content_description, metadata_field_info)

Configures a retriever component for querying relevant information based on embeddings.

  • Defines metadata field information for documents, including title, summary, and source URL.
  • Initializes a SelfQueryRetriever with the LLM, vector store, document content description, and metadata field information. This setup enables the assistant to query its knowledge base effectively (a small usage sketch follows).
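
A small usage sketch, assuming assistant is a CricketAssistant instance (created in the usage example later in this article); each returned item is a LangChain Document containing a chunk's text and metadata:

docs = assistant.retriever.invoke("When was the first Cricket World Cup held?")
for doc in docs:
    print(doc.metadata.get("title"), "->", doc.page_content[:80])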

update_memory

def update_memory(self, user_question, response):
    self.memory.append({"question": user_question, "response": response})

Updates the assistant’s memory with the latest interaction.

Parameters:

  • user_question: The question asked by the user.
  • response: The assistant’s response to the question.

Appends a dictionary containing the user’s question and the assistant’s response to the memory deque. This keeps track of recent interactions, allowing the assistant to reference past queries and responses.

generate_embeddings_query

def generate_embeddings_query(self, input_data):
    prompt = ChatPromptTemplate.from_template("""
User's Question: {input}
Previous conversation memory: {memory}
Generate only a query sentence and nothing else from the user's question to fetch from the data from embeddings. If the user's question does not have enough context then create a query based on the Knowledge Base.
""")
    embedding_chain = prompt | self.llm
    embeddings_query = embedding_chain.invoke(input_data)
    if embeddings_query:
        return embeddings_query
    else:
        return "Process failed"

Constructs a query for generating embeddings based on the user’s question and previous memory.

Parameters:

  • input_data: Data used to generate a query for embeddings.

Functionality:

Creates a prompt template incorporating the user’s question and previous memory. This prompt is then processed by the LLM to generate a query sentence for fetching data from embeddings. Returns the generated query or indicates process failure.

generate_embeddings

def generate_embeddings(self, input_data):
    embeddings = self.retriever.invoke(input_data)
    if embeddings:
        return embeddings[:4]
    return "No data available"

Generates embeddings for given input data.

Parameters:

  • input_data: Data for which embeddings are to be generated.

Functionality:

Invokes the retriever with the input data to obtain embeddings. Returns these embeddings if available; otherwise, indicates no data availability.

get_response

def get_response(self, prompt):
    return self.llm.invoke(prompt)

Generates a response based on a given prompt.

Parameters:

  • prompt: A structured input for generating a response.

Functionality:

Invokes the LLM with the provided prompt to generate a natural language response.

handle_user_input

def handle_user_input(self, user_input):
    sys_msg = """
You are an intelligent assistant that answers all questions about cricket using contextual information from Wikipedia. Your responses should be conversational and informative, providing clear and concise explanations. When relevant, include the source URL of the articles to give users additional reading material.

Always aim to:
1. Answer the question directly and clearly.
2. Provide context and background information when useful but do not give irrelevant information and answer to the point.
3. Suggest related topics or additional points of interest.
4. Be polite and engaging in your responses.
5. Remove the unnecessary context from the context provided if irrelevant to the question

Now, let's get started!
"""

    Runnable = (
        {"input": RunnablePassthrough(), "memory": RunnablePassthrough()}
        | RunnableAssign({"embedding_query": RunnableLambda(self.generate_embeddings_query)})
        | RunnableAssign({"context": RunnableLambda(self.generate_embeddings)})
        | RunnableAssign({"prompt": lambda x: ChatPromptTemplate.from_template(
            f"""
{sys_msg}

User's Question: {{input}}

Context Information: {{context}}

Previous Conversation memory: {{memory}}

Your Response:
"""
        )})
        | RunnableAssign({"response": lambda x: self.get_response(x["prompt"])})
        | RunnableAssign({"memory": lambda x: self.update_memory(x["input"]["input"], x["response"])})
    )

    response = Runnable.invoke(
        {"input": user_input, "memory": self.memory})
    return response["response"]

Orchestrates the handling of user input to generate a relevant response.

Parameters:

  • user_input: The query or statement provided by the user.

Functionality:

  • Defines a system message outlining the assistant’s capabilities and guidelines.
  • Constructs a series of operations (“runnables”) that:
      • pass through the input and memory,
      • generate an embeddings query,
      • retrieve context information based on embeddings,
      • construct a prompt incorporating the system message, the user’s question, the context, and the memory,
      • invoke the LLM to generate a response,
      • update the memory with the new interaction.
  • Executes these operations sequentially to produce a response to the user input.

Key Components and Libraries

  • CohereEmbeddings: Used for generating text embeddings.
  • ChatNVIDIA & StrOutputParser: A large language model and its output parser for natural language processing tasks.
  • SelfQueryRetriever: A retriever that queries its own knowledge base based on generated embeddings or queries.
  • RunnableLambda, RunnableAssign, RunnablePassthrough: Utilities for defining a sequence of operations or “runnables” that process the user input and generate a response.

For further elaboration on the topic, refer to the first article in the series — Simplest Introduction to RAG and LangChain: Building a Cricket Chatbot Part 1.

The CricketAssistant class is designed to be part of a larger system that includes document processing and embedding generation. This system preprocesses cricket-related documents from sources like Wikipedia, creates embeddings for them, and then uses the CricketAssistant to answer questions about cricket by leveraging these embeddings and the knowledge encoded in them.

Usage Example

To run the processor and assistant in the terminal as a simple simulated chat interface, add the following lines at the end of the file.

processor = WikipediaDocumentProcessor(
    query="Cricket and everything related to cricket")
processor.run()

assistant = CricketAssistant(processor.vectorstore)

while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        print("Exiting chat...")
        break
    print("Bot:", assistant.handle_user_input(user_input))

Run the following command in the terminal to start the project for testing:

python cricket_bot_data.py

cricket_bot_api.py

from fastapi import FastAPI, Request
from fastapi.middleware.cors import CORSMiddleware
from contextlib import asynccontextmanager
from pydantic import BaseModel
import uvicorn
from cricket_bot_data import WikipediaDocumentProcessor, CricketAssistant

# Define your CORS policy
# Replace "*" with your specific origin(s) in production
CORS_POLICY = {
    "allow_origins": ["*"],
    "allow_credentials": True,
    "allow_methods": ["*"],
    "allow_headers": ["*"],
}


@asynccontextmanager
async def lifespan(app: FastAPI):
    global vector_store
    print("Running initialization tasks before server starts.")
    processor = WikipediaDocumentProcessor(
        query="Cricket and everything related to cricket")
    processor.run()
    vector_store = processor.vectorstore
    print("Initialization complete.")
    yield
    print("Shutting Down.... Adios!!")

app = FastAPI(lifespan=lifespan)

# Add the CORSMiddleware to your FastAPI application
app.add_middleware(
    CORSMiddleware,
    **CORS_POLICY
)


class UserInput(BaseModel):
    text: str


@app.get("/api/v1/health")
async def health():
    return {"response": "Alive and well my friend !"}


@app.post("/chat")
async def chat_endpoint(user_input: UserInput):
    assistant = CricketAssistant(vector_store)
    response = assistant.handle_user_input(user_input.text)
    return {"response": response}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

This script sets up a FastAPI server to provide an API for the cricket chatbot. Here is a breakdown of the code.

Import Required Libraries

import uvicorn
from pydantic import BaseModel
from fastapi import FastAPI, Request
from contextlib import asynccontextmanager
from fastapi.middleware.cors import CORSMiddleware
from cricket_bot_data import WikipediaDocumentProcessor, CricketAssistant

  • uvicorn: ASGI server used to serve the FastAPI application.
  • pydantic: Provides data validation and settings management using Python type annotations.
  • fastapi: A modern, fast (high-performance) web framework for building APIs with Python based on standard type hints.
  • asynccontextmanager: A decorator for defining asynchronous context managers.
  • CORSMiddleware: Middleware to handle Cross-Origin Resource Sharing (CORS), enabling cross-origin HTTP requests.
  • cricket_bot_data: Custom module assumed to contain definitions for WikipediaDocumentProcessor and CricketAssistant.

Define CORS Policy

CORS_POLICY = {
    "allow_origins": ["*"],
    "allow_credentials": True,
    "allow_methods": ["*"],
    "allow_headers": ["*"],
}

The CORS policy is defined to allow requests from any origin ("*"), with credentials allowed, and permitting all methods and headers. This configuration is suitable for development but should be restricted in production environments to specific origins for security reasons.
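
For example, a more restrictive policy for production might look like the following (illustrative sketch; the origin assumes the React frontend described later in this article running on port 3000):

CORS_POLICY = {
    "allow_origins": ["http://localhost:3000"],
    "allow_credentials": True,
    "allow_methods": ["GET", "POST"],
    "allow_headers": ["Content-Type"],
}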

Define Lifespan for Initialization Tasks

@asynccontextmanager
async def lifespan(app: FastAPI):
    global vector_store
    print("Running initialization tasks before server starts.")
    processor = WikipediaDocumentProcessor(
        query="Cricket and everything related to cricket")
    processor.run()
    vector_store = processor.vectorstore
    print("Initialization complete.")
    yield
    print("Shutting Down.... Adios!!")

An asynchronous context manager named lifespan is defined using @asynccontextmanager. This manager performs initialization tasks before the server starts and cleanup after the server shuts down.

Initializing FastAPI:

We initialize the FastAPI app with the lifespan context manager and add CORS middleware to the FastAPI app using the settings defined in CORS_POLICY.

app = FastAPI(lifespan=lifespan)

# Add the CORSMiddleware to your FastAPI application
app.add_middleware(
    CORSMiddleware,
    **CORS_POLICY
)

During the initialization phase, the context manager prints a message indicating the start of initialization tasks. It then creates an instance of WikipediaDocumentProcessor with a specific query related to cricket. The processor runs, presumably loading documents from Wikipedia, processing them, and generating embeddings. These embeddings are stored in a vector store (vector_store), which is made globally accessible for use throughout the application. After completing these tasks, it prints a message indicating that initialization is complete.

The yield statement is crucial as it pauses the execution of the context manager, allowing the FastAPI application to run. This means that after the initialization code has executed, control is handed over to the FastAPI app to serve requests. The code following the yield statement won't execute until the application is shutting down.

Upon shutdown, the execution resumes after the yield statement, allowing the context manager to perform any necessary cleanup actions. In this script, it simply prints a farewell message, indicating that the application is shutting down. However, in more complex applications, this phase could involve closing database connections, releasing resources, or saving state.

By passing the lifespan function to the FastAPI constructor via the lifespan parameter, FastAPI knows to execute the initialization part of the lifespan before starting the server and to execute the cleanup part after stopping the server.

Defining User Input Model:

The UserInput class is a Pydantic model that validates the structure of incoming JSON payloads for the chat endpoint. It contains a single field text which represents the user’s input text.

class UserInput(BaseModel):
    text: str

Health Check Endpoint:

The /api/v1/health endpoint is a simple GET endpoint that returns a JSON response indicating that the server is running. This can be useful for monitoring and ensuring the server is operational.

@app.get("/api/v1/health")
async def health():
    return {"response": "Alive and well my friend !"}
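
Once the server is running, you can verify this endpoint from the terminal:

curl http://localhost:8000/api/v1/health
# {"response":"Alive and well my friend !"}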

Chat Endpoint:

  • The /chat endpoint is a POST endpoint that accepts user input, processes it using the CricketAssistant, and returns the generated response.
  • The user_input parameter is an instance of the UserInput model, ensuring that the input data is correctly structured.
  • An instance of CricketAssistant is created using the global vector_store, and the handle_user_input method is called with the user input text to generate a response.

@app.post("/chat")
async def chat_endpoint(user_input: UserInput):
    assistant = CricketAssistant(vector_store)
    response = assistant.handle_user_input(user_input.text)
    return {"response": response}

Starting the FastAPI Application:

The uvicorn.run method starts the FastAPI application. The app will be available on http://0.0.0.0:8000, ready to accept requests.

Running the Server

To run the server, execute the following command in your terminal:

uvicorn cricket_bot_api:app --reload

This will start the FastAPI server on port 8000, and you can interact with the chatbot via the /chat endpoint.

Testing the Chatbot

You can test the chatbot using tools like curl, Postman, or directly from a frontend application by sending a POST request to http://localhost:8000/chat. For example, with curl:

curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
--data '{"text": "Who won the cricket world cup in 2019?"}'

The server will respond with a JSON object containing the chatbot’s response.
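
Mirroring the return statement of the /chat endpoint, the reply has the shape:

{"response": "<the assistant's generated answer>"}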

Setting up a ReactJS Frontend

To set up the ReactJS frontend for the Cricket Bot project, follow these steps. These instructions assume you have Node.js and npm installed on your system; if not, please install them first. It is also assumed that the FastAPI server is up and running on http://localhost:8000.

Frontend user interface of the Cricket Chatbot

Step 1: Clone the Repository

First, clone the repository to your local machine. Open a terminal and run:

git clone https://github.com/AbhiRam162105/Cricket_Bot.git
cd Cricket_Bot/frontend

Step 2: Install Dependencies

Before starting the project, you need to install the necessary dependencies listed in the package.json file. Run the following command in the terminal:

nvm install 16.13.2
npm install

These commands install a compatible Node.js version (via nvm) and then read the package.json file to install all the required packages.

Step 3: Start the Development Server

Once the installation is complete, you can start the development server to see the application in action. Execute the following command:

npm start

This command compiles the React app and starts a development server. By default, the app will be served at http://localhost:3000.

Additional Notes

  • Building for Production: When you’re ready to build the project for production, use the npm run build command instead of npm start. This command creates a build directory with a production build of your app.
  • Troubleshooting: If you encounter any issues, ensure that all dependencies are correctly installed and that there are no errors in the console output. Common issues include missing dependencies or conflicts between package versions.

These steps should help you set up and run the ReactJS frontend for the Cricket Bot project successfully.

And there you have it: your very own CricketGPT! You can use a database to store the chats and retrieve them when loading the application, making it a truly full-stack application.

Conclusion

In this tutorial, we have built a cricket chatbot using LangChain and FastAPI. The chatbot processes cricket-related documents from Wikipedia, generates embeddings, and answers user queries using the NVIDIA AI model. The FastAPI server provides an API for interacting with the chatbot, allowing you to easily integrate it into various applications.

GitHub link for the source code: https://github.com/AbhiRam162105/Cricket_Bot

Feel free to customize and expand this chatbot to include more features and handle a wider range of queries.

Until then —

This is Mr.E, signing off. Time to git commit and call it a day! Keep calm and code on!

