Building a CPU-Powered IT Help Desk Chatbot with fine-tuned LLAMA2–7B LLM and Chainlit

Gaurav Desurakar
12 min read · Oct 21, 2023


Learn the step-by-step process to build a chatbot using LLAMA2–7B (quantized model), Chainlit, LangChain, and the FAISS vector DB.

Introduction

Welcome to this article, where we’ll explore the fascinating world of chatbots. We’ll learn how to create a chatbot powered by the language model “LLAMA2–7B”, designed to answer IT-related questions. What makes this setup exceptional is that we use a quantized version of the model, allowing it to run smoothly on a standard CPU with 16GB of RAM. To build this smart chatbot, we harness the Chainlit library for the chat interface and the FAISS vector database by Meta. The chatbot’s knowledge comes from PDF files, which are indexed into a vector store and retrieved at question time to ground the model’s answers.

I’ve done my best to explain the development process in a step-by-step manner. I hope it’s helpful to you.

Table of Contents:

  1. High-level Process Flow
  2. Download LLM Model
  3. Environment Setup
  4. Script
  5. How to execute on local

1. High-level Process Flow

The flow has two stages: ingest.py loads the HP troubleshooting PDF, splits it into chunks, embeds the chunks with a sentence-transformers model, and saves them in a FAISS vector store; at chat time, app.py retrieves the most relevant chunk for the user’s question and passes it, together with the question, to the quantized LLAMA2–7B model, which generates the answer shown in the Chainlit interface.

2. Download LLM Model

In this step, we will download the language model from Hugging Face. The model we will be using is “llama-2-7b.ggmlv3.q8_0.bin,” and it can be found at the following link.

Special thanks to “TheBloke” (https://huggingface.co/TheBloke) for their role in converting the Llama2–7B model into GGML format, making it compatible with CPU usage.
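If you prefer to script the download, the huggingface_hub library can fetch the file directly. The snippet below is a minimal sketch, assuming huggingface_hub is installed and TheBloke’s repo layout (TheBloke/Llama-2-7B-GGML) is unchanged; place the file in the “llm_model” folder used by the scripts later in this article.

from huggingface_hub import hf_hub_download

# Download the quantized GGML file into the llm_model/ folder
hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGML",
    filename="llama-2-7b.ggmlv3.q8_0.bin",
    local_dir="llm_model"
)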

3. Environment Setup

A. Folder Structure

Please use the folder structure you see in the image below. I have highlighted the important folders and files you need to create. The rest of the folders and files in the image will be made automatically when you run the code.

You can find the folder structure on my GitHub: https://github.com/GauravDesurakar/it-support-bot-llama2/tree/main
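In case the image does not render, the layout looks roughly like this (reconstructed from the paths used in the scripts; the vectorstores folder is generated automatically by ingest.py):

it-support-bot-llama2/
├── data/                      # place the HP troubleshooting PDF here
├── llm_model/
│   └── llama-2-7b.ggmlv3.q8_0.bin
├── vectorstores/
│   └── db_faiss/              # created by ingest.py
├── ingest.py
├── model.py
├── app.py
└── requirements.txt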

Make sure to download the HP troubleshooting file from the link below and place it in the “data” folder.

http://h10032.www1.hp.com/ctg/Manual/c00757358.pdf

B. Create an Environment and Activate

conda create --name myenv python=3.9
conda activate myenv

C. Install Dependencies

pip install -r requirements.txt

Below is the screenshot of the requirements.txt file.
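In case the screenshot does not display, the dependencies can be inferred from the imports used in the scripts; a representative requirements.txt would look roughly like this (version pins omitted, so treat it as a sketch rather than the exact file):

langchain
chainlit
ctransformers
sentence-transformers
faiss-cpu
pypdf
loguru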

Up to this point, we’ve downloaded what we need and prepared the environment. Now, let’s move on to the exciting part: “Coding.”

4. Script

A. ingest.py

The purpose of this script is to create a text vector database from a PDF document (the HP troubleshooting guide).

# Import necessary classes from modules
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

"""
Defining Paths:
DATA_PATH: This constant specifies the path to the directory containing the PDF documents.
VECTOR_DB_PATH: This constant specifies the path where the FAISS vector database will be saved.
"""
# Define paths for data and vector database
DATA_PATH = "data/"
VECTOR_DB_PATH = "vectorstores/db_faiss"


# Define a function to build the vector database
def func_build_vector_db():
    # Initialize a directory loader to load PDF documents from the specified path
    loader = DirectoryLoader(DATA_PATH, glob="*.pdf", loader_cls=PyPDFLoader)
    # Load the documents using the loader
    documents = loader.load()
    # Initialize a text splitter to split documents into smaller chunks to prepare them for embedding
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    # Split the documents into chunks of text
    texts = text_splitter.split_documents(documents)

    # Initialize HuggingFace embeddings using the 'sentence-transformers/all-MiniLM-L6-v2' model
    # from Hugging Face's Transformers library. The embeddings are computed on CPU (as specified by model_kwargs).
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2',
                                       model_kwargs={'device': 'cpu'})

    # Create vectors from the text chunks using the embeddings and store them in the vector database (FAISS)
    db = FAISS.from_documents(texts, embeddings)
    db.save_local(VECTOR_DB_PATH)


# Entry point of the script
if __name__ == '__main__':
    # Call the function to build the vector database
    func_build_vector_db()

Import classes from Langchain:

  • RecursiveCharacterTextSplitter: A LangChain class that splits text documents into smaller, overlapping chunks to prepare them for embedding.
  • PyPDFLoader: A class used to load PDF documents.
  • DirectoryLoader: A class used to load documents from a directory.
  • HuggingFaceEmbeddings: This class is used to generate text embeddings using models from Hugging Face’s Transformers library.
  • FAISS: A class used for managing and performing similarity searches on high-dimensional vectors.

Define Directory Path:

  • DATA_PATH: The directory containing the PDF documents you want to process.
  • VECTOR_DB_PATH: The path where the vector database, built using FAISS, will be saved.

Function — func_build_vector_db():

  • We start by initializing a directory loader, which is responsible for loading PDF documents from the specified directory.
  • The loader loads the PDF documents, making them ready for processing.
  • A text splitter is used to break the documents into smaller, manageable chunks. This step is essential for preparing the text for embedding.
  • The text chunks are then embedded using Hugging Face’s ‘sentence-transformers/all-MiniLM-L6-v2’ model. These embeddings are computed on the CPU, as specified by model_kwargs.
  • Finally, we create vectors from the text chunks using these embeddings and store them in the FAISS vector database, which is a highly efficient similarity search tool.
  • The vector database is saved locally at the specified VECTOR_DB_PATH.
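Once ingest.py has run, you can verify the store with a quick similarity search. This is an illustrative check, not part of the original scripts:

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2',
                                   model_kwargs={'device': 'cpu'})
db = FAISS.load_local("vectorstores/db_faiss", embeddings)

# Print the two chunks most similar to a sample question
for doc in db.similarity_search("The notebook does not power on", k=2):
    print(doc.metadata.get("page"), doc.page_content[:100])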

B. model.py

This Python file serves the purpose of creating a question-answering (QA) bot with the ability to answer questions related to computer technology and troubleshooting.

Import classes from Langchain

from langchain import PromptTemplate
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import CTransformers
from langchain.chains import RetrievalQA
from loguru import logger

Importing necessary modules and classes, such as PromptTemplate, HuggingFaceEmbeddings, FAISS, CTransformers, RetrievalQA, and the logger from the Loguru library. These components will be used to build and operate the QA system.

VECTOR_DB_PATH

VECTOR_DB_PATH = "vectorstores/db_faiss"

The VECTOR_DB_PATH constant specifies the file path where the vector database (FAISS) is stored. This database will contain text embeddings for documents from which the QA system will retrieve information.

Custom prompt template

custom_prompt_template = """
You are a professional Computer Technician. You have expertise in maintaining computer systems, troubleshooting errors, and repairing the organization's hardware.
Use the following pieces of information to answer the user's question. Always try to provide answers in bullet points.
If you don't know the answer, say that you are not sure about the answer.
Use the provided source to answer the question. Don't try to make up an answer.

Context: {context}
Question: {question}
"""

The custom_prompt_template is a structured text template that sets the context and instructions for the QA system. It instructs the model to act as a professional computer technician with expertise in computer system maintenance, error troubleshooting, and hardware repair, to provide answers in bullet points, and to be transparent when it does not know an answer. The placeholders {context} and {question} accept the retrieved context and the user’s question as inputs.

Function — func_load_llm():

# Load the LLM model
def func_load_llm():
    """
    :return:
    Returns the created Large Language Model (LLM) instance from the function.
    """
    logger.debug("Into def func_load_llm")
    llm = CTransformers(
        model="llm_model/llama-2-7b.ggmlv3.q8_0.bin",  # Path to the pre-trained quantized LLM binary file
        model_type="llama",   # Indicates the type of model being loaded
        max_new_tokens=512,   # Maximum number of new tokens to generate; controls the length of the response
        temperature=0.6       # Controls the randomness of the output
    )
    return llm

The core of the function initializes an instance of the Large Language Model (LLM) using the CTransformers class. Several parameters are provided to configure this model (a quick sanity check follows the list below):

  • model: Specifies the path to the pre-trained quantized LLM binary file to be loaded. In this case, it’s “llm_model/llama-2-7b.ggmlv3.q8_0.bin.”
  • model_type: Indicates the type of model being loaded, which is set to “llama.”
  • max_new_tokens: Sets the maximum number of new tokens that can be generated by the model. This parameter controls the length of the generated responses.
  • temperature: Controls the level of randomness in the model’s output. A higher value makes the responses more random, while a lower value makes them more deterministic.
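As a quick sanity check (illustrative only, not part of the original scripts), the loaded model can be called directly with a plain string prompt:

from model import func_load_llm

llm = func_load_llm()
# A LangChain LLM instance can be called directly with a prompt string
print(llm("List three common reasons a laptop will not power on."))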

Function — func_set_custom_prompt():

# Define function to set custom prompt
def func_set_custom_prompt():
    """
    PromptTemplate: Prompts are input texts or instructions given to a language model to guide its text generation.
    The PromptTemplate class provides a way to define the structure of prompts using placeholders like {context} and {question}.

    :return:
    Returns the created PromptTemplate instance from the function.
    """
    logger.debug("Into def func_set_custom_prompt")
    prompt = PromptTemplate(template=custom_prompt_template, input_variables=['context', 'question'])
    logger.info(f"prompt: {prompt}")
    return prompt

This code block defines a Python function named func_set_custom_prompt(). The function's purpose is to create a custom prompt template for guiding a language model's text generation. Here's a breakdown of what this code accomplishes:

The core of the function creates a custom prompt template using the PromptTemplate class. This template is defined by the custom_prompt_template string, which contains specific instructions and placeholders. The placeholders, like {context} and {question}, are used to structure the prompts for the language model.
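To see what the model will actually receive, the template can be rendered with sample values. The context and question below are made up for illustration:

from model import func_set_custom_prompt

prompt = func_set_custom_prompt()
rendered = prompt.format(
    context="To perform a hard reset, disconnect the AC adapter and hold the power button for 15 seconds.",
    question="How do I perform a hard reset on the notebook?"
)
print(rendered)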

Function — func_retrieval_qa_chain():

# Define a function to build the RetrievalQA chain using the specified LLM, prompt, and vector store (db)
def func_retrieval_qa_chain(llm, prompt, db):
    """
    llm: Large Language Model (LLM) instance, used for text generation.
    prompt: A structured prompt template to guide the generation process.
    db: Vector store (database) instance, used for retrieval.
    :return:
    Returns the configured RetrievalQA chain.
    """
    logger.debug("Into def func_retrieval_qa_chain")
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type='stuff',  # 'stuff' puts all retrieved chunks into a single prompt
        retriever=db.as_retriever(search_kwargs={'k': 1}),  # Number of chunks to retrieve
        return_source_documents=True,  # Include the retrieved source documents in the response
        chain_type_kwargs={'prompt': prompt}
    )
    return qa_chain

This code block defines a Python function called func_retrieval_qa_chain(), which is responsible for configuring and returning a "RetrievalQA" chain. Here's a detailed explanation of what this code does:

  • llm=llm: It specifies the Large Language Model instance to be used for text generation.
  • chain_type=’stuff’: The chain type is set to ‘stuff’, meaning all retrieved document chunks are stuffed into a single prompt that is passed to the model (see the sketch after this list).
  • retriever=db.as_retriever(search_kwargs={‘k’: 1}): This parameter configures the retrieval component of the QA system. It specifies the retrieval database (vector store) and search settings, with ‘k’: 1 indicating that the single most similar chunk should be returned.
  • return_source_documents=True: This setting ensures that the retrieved source documents are included in the output alongside the generated answer.
  • chain_type_kwargs={‘prompt’: prompt}: The structured prompt is passed as a keyword argument to guide the chain’s behavior.
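Conceptually, the ‘stuff’ chain fills the {context} placeholder with the retrieved chunks before calling the model. The sketch below illustrates that assumed behaviour in plain Python; it is not LangChain’s actual internal code:

# Simplified illustration of the 'stuff' chain (assumed behaviour, not LangChain internals)
retrieved_chunks = [
    "Disconnect the AC adapter, remove the battery, and hold the power button for 15 seconds.",
]
context = "\n\n".join(retrieved_chunks)
final_prompt = custom_prompt_template.format(
    context=context,
    question="How do I perform a hard reset on the notebook?"
)
# final_prompt is then sent to the LLM in a single call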

Function — func_qa_bot():

# QA Model Function
def func_qa_bot():
    logger.debug("Into def func_qa_bot")
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2",  # Model used for generating embeddings
                                       model_kwargs={'device': 'cpu'})  # Load the embedding model on the CPU
    db = FAISS.load_local(VECTOR_DB_PATH, embeddings)  # Load the local FAISS vector store
    llm = func_load_llm()  # Load the quantized LLM
    qa_prompt = func_set_custom_prompt()  # Build the custom prompt template
    qa = func_retrieval_qa_chain(llm, qa_prompt, db)  # Assemble the RetrievalQA chain
    logger.info(f"func_qa_bot: {qa}")
    return qa

This code block defines a function named func_qa_bot(), which serves as the central component for setting up and configuring the question-answering (QA) model, incorporating text embeddings, a vector database, the Large Language Model, and a custom prompt template. Here's a detailed explanation of what this code does:

  • embeddings: This variable is assigned the result of initializing the HuggingFaceEmbeddings class. It specifies a pre-trained model, “sentence-transformers/all-MiniLM-L6-v2,” for generating text embeddings. Additionally, it sets the model to load on the CPU for efficient processing.
  • db: This variable is assigned the result of loading a local FAISS vector store using the FAISS.load_local() method. It loads the vector store from the specified VECTOR_DB_PATH and associates it with the embeddings.
  • llm: The function func_load_llm() is called to create and return an instance of the Large Language Model (LLM).
  • qa_prompt: The function func_set_custom_prompt() is called to generate and return a structured prompt template for guiding text generation.
  • qa: The function func_retrieval_qa_chain() is called to configure and return a specialized RetrievalQA chain. This chain combines the LLM, prompt, and vector database, setting up the QA system for question answering.

Function — func_final_result():

# Output function
def func_final_result(query):
    logger.debug("Into def func_final_result")
    qa_result = func_qa_bot()  # Build the QA chain
    response = qa_result({'query': query})
    logger.info(f"func_final_result: {response}")
    return response

This code block defines a Python function named func_final_result(query), which serves as an output function for the question-answering (QA) system. Here’s a detailed explanation of what this code does:

  • It calls the function func_qa_bot() to initialize and configure the QA model. This function sets up all the necessary components, including the Large Language Model, text embeddings, a custom prompt, and the vector database.
  • The initialized QA model, represented by the variable qa_result, is then used to generate a response to the input query. The {‘query’: query} dictionary is passed to the qa_result to provide the query for which a response is sought.
  • The response generated by the QA model is stored in the variable response.
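For example, the function can be called directly to inspect both the generated answer and the retrieved sources (an illustrative call, with a made-up query):

from model import func_final_result

response = func_final_result("My printer is not printing. What should I check first?")
print(response["result"])              # the generated answer
for doc in response["source_documents"]:
    print(doc.metadata.get("page"))    # page numbers of the retrieved chunks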

C. app.py

This file is the entry point of the IT Support Chatbot. It uses the Chainlit framework to define and control the behavior of the chatbot during a conversation, through two event handlers: one that runs when a chat session starts and one that runs whenever the user sends a message.

import chainlit as cl
from model import func_qa_bot
from loguru import logger

# Chainlit
# Decorator for a function called start(), which is triggered when a chat session starts
@cl.on_chat_start
async def start():
    logger.debug("Into decorator async def start")
    chain = func_qa_bot()  # Build the QA chain
    msg = cl.Message(content="Bot is initiating...")  # Initial message object
    await msg.send()
    msg.content = "Greetings for the day! Welcome to the IT Support Bot. How can I help?"
    await msg.update()
    cl.user_session.set("chain", chain)


# Decorator for a function named main() that is triggered when a message is received in the chat
@cl.on_message
async def main(message):
    logger.debug("Into decorator async def main")
    chain = cl.user_session.get("chain")
    logger.info(f"Chain :: {chain} ")
    # Callback handler
    cb = cl.AsyncLangchainCallbackHandler(stream_final_answer=True,
                                          answer_prefix_tokens=["FINAL", "ANSWER"])
    cb.answer_reached = True
    res = await chain.acall(message, callbacks=[cb])
    answer = res["result"]
    # sources = res["source_documents"]  # Page/source info of the retrieved documents

    answer += "\n\nIf the problem continues, please reach out to the IT Support Desk."

    # if sources:
    #     answer += f"\n\nSources:\n" + str(sources)
    # else:
    #     answer += "\nNo sources found"

    logger.info(f"async def main, res:: {answer} ")
    await cl.Message(content=answer).send()

Function — start():

This code block defines an asynchronous function named start() and uses the @cl.on_chat_start decorator so that Chainlit triggers it when a chat session starts.

Here’s a step-by-step explanation of what this code does:

  • chain = func_qa_bot(): This line calls the func_qa_bot() function, which is responsible for setting up and configuring a question-answering (QA) model for the chatbot. It initializes the QA model, including components like text embeddings and a vector database, and returns it.
  • msg = cl.Message(content=”Bot is initiating…”): This line creates a cl.Message object with the content “Bot is initiating…”. This message is intended to be sent to the user to provide some initial feedback.
  • await msg.send(): The await keyword is used to send the message to the user. It informs the user that the bot is initializing and ready to interact.
  • msg.content = “Greetings for the day! Welcome to the IT Support Bot. How can I help?”: This line updates the content of the message to a more welcoming and informative greeting. It lets the user know that they are interacting with an IT Support Bot and invites them to ask questions or seek assistance.
  • await msg.update(): The await keyword is used again to update the message with the new content. This step ensures that the user receives the updated greeting.
  • cl.user_session.set(“chain”, chain): This line sets a value in the user’s session data. In this case, it associates the key “chain” with the configured QA model instance (chain). This allows the chatbot to store and access information related to the QA model for the duration of the conversation.

Function — main():

This code block defines an asynchronous function named main(message) and uses the @cl.on_message decorator so that Chainlit triggers it whenever a message is received. Here's an explanation of what this code does:

  • chain = cl.user_session.get(“chain”): This line retrieves the previously stored QA model instance from the user’s session data using the key “chain.” This model is used to process and generate responses to the user’s message.
  • logger.info(f”Chain :: {chain} “): This line logs information about the retrieved QA model, providing visibility into the chatbot’s configuration.
  • cb = cl.AsyncLangchainCallbackHandler(stream_final_answer=True, answer_prefix_tokens=[“FINAL”, “ANSWER”]): A callback handler named cb is set up to handle the processing of the user’s message. It is configured to handle a potential “FINAL ANSWER” in the response.
  • cb.answer_reached = True: This line tells the callback handler to treat the model’s output as the final answer and stream it to the user.
  • res = await chain.acall(message, callbacks=[cb]): This line invokes the QA model (chain) to process the user’s message asynchronously. It provides the message as input and includes the callback handler cb to manage the response.
  • answer = res[“result”]: The response generated by the QA model is extracted from the result, and it is assigned to the variable answer.
  • answer += "\n\nIf the problem continues, please reach out to the IT Support Desk.": This line appends an additional message to the answer, suggesting that if the user’s issue persists, they should contact the IT Support Desk.
  • logger.info(f”async def main, res:: {answer} “): The augmented answer is logged for monitoring and debugging purposes.
  • await cl.Message(content=answer).send(): Finally, the answer is sent back to the user as a message. The content of the message is set to the answer, and it is sent as a response to the user’s message.

5. How to execute on local

Open a command prompt and execute the command below. Make sure you have first created the vector database by running ingest.py.

chainlit run app.py -w

