ChatBot for Transactional Data with LLM (Beginner’s Approach) Part 1

Noman Anjum
5 min read · Jun 1, 2024


What are Large Language Models?

A large language model (LLM) is a type of artificial intelligence (AI) designed to recognize and generate text, among other tasks. These models are termed “large” because they are trained on vast amounts of data (generally drawn from all over the internet). LLMs are based on machine learning, specifically a type of neural network known as a transformer model.

Simply put, an LLM is a computer program that has been provided with enough examples to understand and interpret human language or other complex data. Many LLMs are trained on massive datasets gathered from the Internet, comprising thousands or millions of gigabytes of text. However, the quality of these examples affects how well LLMs learn natural language, so their programmers may opt for more carefully selected datasets.

Fine-Tuning vs. Prompt Engineering

LLMs can be used for a wide array of tasks, including classification, summarization, question answering over documents, and answering general everyday questions. For companies, LLMs are most useful when they can perform a specialized task related to the business. This specialization leads us down two distinct paths: fine-tuning an LLM and prompt engineering. We will discuss prompt engineering first and then touch on fine-tuning briefly.

Prompt Engineering:

Prompt engineering refers to intelligently crafting a template or set of instructions to feed to an LLM so that it does its best to solve a specialized problem. Say you want an LLM to act as an agent that collects pizza orders from a customer through back-and-forth chat. This task requires a well-designed template with precise instructions so the LLM talks to the customer in a friendly way and gathers the relevant information. In other words, prompt engineering is all about intelligently and efficiently instructing an LLM on how to perform a specific task. An example of an engineered prompt for a pizza bot is as follows:

You are a Pizza OrderBot, an automated service to collect orders for a pizza restaurant.
You first greet the customer, then collect the order,
and then ask if it's a pickup or delivery.
You wait to collect the entire order, then summarize it and check for a final
time if the customer wants to add anything else.
If it's a delivery, you ask for an address.
Finally, you collect the payment.
Make sure to clarify all options, extras, and sizes to uniquely
identify the item from the menu.
You respond in a short, very conversational, friendly style.
The menu includes
pepperoni pizza $12.95, $10.00, $7.00
cheese pizza $10.95, $9.25, $6.50
eggplant pizza $11.95, $9.75, $6.75
fries $4.50, $3.50
greek salad $7.25
Toppings:
extra cheese $2.00,
mushrooms $1.50
sausage $3.00
canadian bacon $3.50
AI sauce $1.50
peppers $1.00
Drinks:
coke $3.00, $2.00, $1.00
sprite $3.00, $2.00, $1.00
bottled water $5.00

Fine-Tuning

Fine-tuning is a process in machine learning where a pre-trained model is further trained on a smaller, more specific dataset to adapt it to a particular task or domain. This method leverages the knowledge the model has already acquired during its initial training on a large, general dataset, and refines it to improve performance on a specific task.

Papers and expert opinion suggest that, to make an LLM perform a specialized task, the first approach to try should be prompt engineering. But what if you cannot solve your problem with prompt engineering? Don’t worry: fine-tuning options exist for most paid and open-source LLMs. For this blog, however, we will stick to prompt engineering.

Chatbot For Transactional Data

In this blog, we will use prompt engineering to build a chatbot that answers transaction-related queries from a fintech customer. We will solve this problem in a beginner-friendly way so that we understand, at the root level, the steps for developing a chatbot system for any domain.

The code for this project is here

Let’s dive into coding YAY!!!

Dataset

We have a fake dataset that contains transactions for various customers across various platforms, for different categories and merchants. The dataset consists of 257063 entries with the following columns.


clnt_id int64
bank_id int64
acc_id int64
txn_id int64
txn_date object
amt float64
cat object
merchant object
[257063 rows x 9 columns]

Possible Queries To Resolve:

If we look into the data, we find transaction date, amount, category, and merchant columns, these are critical columns on which a customer can ask different questions. For example, a user may ask how much did he spend on a particular merchant or a particular category. Or how much did he spend/deposit during a certain time period?
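Before involving an LLM at all, it is worth seeing how such a query maps onto the data. The sketch below (with a tiny made-up DataFrame that mirrors the dataset’s columns; the values and the `spend_by_merchant` helper are illustrative, not part of the project code) shows how a “how much did I spend at merchant X in this period?” question reduces to a filter and a sum:

```python
import pandas as pd

# Tiny illustrative frame with the same columns as the dataset (values are made up).
df = pd.DataFrame({
    "clnt_id": [1, 1, 2],
    "txn_date": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-01-20"]),
    "amt": [25.0, 40.0, 15.0],
    "cat": ["food", "travel", "food"],
    "merchant": ["uber eats", "uber", "uber eats"],
})

def spend_by_merchant(df, clnt_id, merchant, start, end):
    """Total amount a client spent at a merchant within a date range."""
    mask = (
        (df["clnt_id"] == clnt_id)
        & (df["merchant"] == merchant)
        & df["txn_date"].between(start, end)
    )
    return df.loc[mask, "amt"].sum()

print(spend_by_merchant(df, 1, "uber eats", "2024-01-01", "2024-03-31"))  # 25.0
```

The chatbot’s job, then, is to translate a natural-language question into exactly this kind of filter.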

Data Preprocessing

First, we need to convert txn_date to datetime format to effectively perform date comparisons.

df["txn_date"] = pd.to_datetime(df["txn_date"])
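Why this matters: once the column holds real datetimes rather than strings, range filters behave as expected. A minimal sketch (with made-up dates):

```python
import pandas as pd

df = pd.DataFrame({"txn_date": ["2024-03-01", "2023-12-15"]})
df["txn_date"] = pd.to_datetime(df["txn_date"])

# Comparisons now operate on actual dates, not raw strings.
recent = df[df["txn_date"] >= pd.Timestamp("2024-01-01")]
print(len(recent))  # 1
```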

The second step is lemmatization of the category and merchant columns. Lemmatization reduces words to their base form, or lemma, which helps us shrink the vocabulary significantly. We will also convert these columns to lowercase for a consistent structure.

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
df["lemm_merchant"] = df["merchant"].apply(lambda x: lemmatizer.lemmatize(str(x).lower()))
df["lemm_cat"] = df["cat"].apply(lambda x: lemmatizer.lemmatize(str(x).lower()))

We have created new columns for the lemmatized merchant and category because we will still need the original data when replying to a query.

LLM In Action

We will be using OpenAI’s GPT-3.5. For this, we first need to create an API key in our OpenAI account. We will use LangChain to chain our templates with user queries and the LLM, for effectiveness and simplicity.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(api_key=API_KEY)
output_parser = StrOutputParser()

def chat(template, message):
    prompt = ChatPromptTemplate.from_messages([
        ("system", template),
        ("user", "{input}"),
    ])
    chain = prompt | llm | output_parser

    response = chain.invoke({"input": message})
    return response

We first initialize the OpenAI LLM along with an output parser to parse the response from the LLM’s output. Then we declare a chat function that takes a template, which is our engineered prompt, and a message, which is the user’s query. We specify the input structure for the prompt: the template as the system message (instructions on how to process the query) and the user’s query as the input. We then create a chain of the prompt, the LLM, and the output parser to perform all of the I/O operations, and finally invoke the chain and return the response.
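Conceptually, the LangChain pipeline `prompt | llm | output_parser` is just function composition: each stage’s output feeds the next. A minimal pure-Python sketch of the same flow, with a stubbed `fake_llm` standing in for the real model call (all names here are illustrative, not LangChain APIs):

```python
def prompt(inputs):
    # Render system + user messages, like ChatPromptTemplate does.
    return [("system", inputs["template"]), ("user", inputs["input"])]

def fake_llm(messages):
    # Stand-in for the real model: just echo the user message.
    return "echo: " + dict(messages)["user"]

def output_parser(raw):
    # StrOutputParser-style stage: return the text content as-is.
    return raw

def chain(inputs):
    return output_parser(fake_llm(prompt(inputs)))

print(chain({"template": "You are a helpful bot.", "input": "hi"}))  # echo: hi
```

Swapping `fake_llm` for a real model call is all LangChain’s `|` operator abstracts away here.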

Conclusion

In this article, we learned about LLMs, prompt engineering, and fine-tuning of LLMs. We also learned some basics of developing a chatbot.

I think that’s a lot to digest for one blog. We will continue chatbot development in the second part here
