ChatBot for Transactional Data with LLM(Beginner’s Approach) Part Two

Noman Anjum
7 min readJun 1, 2024

--

In the last blog, we learned about LLMs, Fine-tuning, and prompt Engineering. We then jumped into coding a chatbot for a financial transactions dataset. In this blog, we will continue the core development of the chatbot.

We did some preprocessing in the previous blog and defined a primary chat function with langchain to get a response from LLM.

Let’s start coding the next part of our chatbot for financial transaction data.

You can find the complete code of the project here

Prompt Engineering

In order to respond to customer queries we need two types of information from the query

  1. Intent of the user for a given query
  2. Related information for that particular intent.

Intent classification is a crucial task while building a chatbot. For some user cases people train a separate language model ie: BERT just for intent classification from the queries. Also, we need to know the parameters needed to call a function for a particular intent. Let’s suppose the user inputs a query “How much did I spend on shops during the first week of September?” So the intent of customer is to get the amount spent and, related information is, shops and the first week of September as time period.

So we’ll need our LLMs to perform two tasks to get results for the user. The first thing is to find intent and the second is to find related information to get information from the dataset. For now, we will try to tackle 4 types of intents from user queries.

  1. GetTransactions (when a user wants to list his transactions)
  2. GetTransactionsAmount (when a user wants to get the amount spent/deposited)
  3. Get Category/MerchantTransactions (list transactions for a particular category or a particular merchant)
  4. GetCategory/MerchantTransactionsAmount (get the amount spent/deposited for a particular merchant/ category)

We also need to know whether for above mentioned intents a user wants to get a list or amount for all of the transactions or it should be for a particular time period.

These 4 operations along with date intervals will help us resolve around 24 types of queries mentioned in the readme of the project.

We will engineer two prompts to collect this information. One prompt will try to extract intent and relative info and the second will try to extract if any date or time period is mentioned in the query.

Template to Find Intent and Related Information

intent_find_template = f"""
You are a Natural Language Understanding agent. Given a query from user you need to find out the intent of user. Some examples are given below.

1) GetTransactions
ie1: list transaction I have done this week.
response1 : "intent" : "GetTransactions"
ie2: list transaction I have done during february.
response2 : "intent" : "GetTransactions"

2) GetTransactionsAmount
ie1: How much did I spent today?
reponse1: "intent" : "GetTransactionAmount", "type" : "credit"
ie2: What were my total ex dpenses during last year?
response2: "intent" : "GetTransactionAmount", "type" : "credit"
ie3: How much amount I deposited during last month of 2023?
response3 : "intent" : "GetTransactionAmount" , "type" : "debit"

3) GetCategoryTransactions : categoryname
ie1: List my transactions I have done on groceries?
response1 : "intent" : "GetCategoryTransactions " , "categoty" : " groceries"
ie2: List my transactions for Starbucks?
response2 : "intent" : "GetCategoryTransactions", "category" : "Starbucks"

4) GetCategoryTransactionsAmount: categoryname
ie1: How much I have spent on shops so far?
response1: "intent" : "GetCategoryTransactionsAmount" , "type" : "credit", "category" : "shops"
ie2: How much I have spent on Starbucks so far?
response2 : "intent" : "GetCategoryTransactionsAmount " , "type" : "credit ", "category" : "Starbucks"
ie3: How much I have deposited via zelle this month.
reponse3: "intent" : "GetCategoryTransactionsAmount " , "type" : " debit ", "category" : " zelle"
ie 4: How much amount I have deposited using cash app?
response4: "intent" : "GetCategoryTransactionsAmount " , "type" : "debit , "category" : " cash app"

------------------------------------------------------------------------------------
Now reply with intent for following query. A cash or any kind of inflow is considered as debit while cash or other kind of outflow credit. Output result in json as mentioned in examples

Query:
"""

We are using few-shot prompting technique that presents some examples of input and possible output to llm. LLM with these examples tries to replicate results for a new query.

Template to extract correct Date Interval

date_find_prompt = f"""
You are a very precise date finding agent. Some examples of your work are as follow. You generate a json as output:
Today is {datetime.now()} and Day is {datetime.now().strftime("%A")}

Query 1: How much did I spend on Starbuck yesterday.
response: "date:2024-04-24"

Query: List transactions I have done during last week.
response: "date:2024-04-15, 2024-04-21"

Query: How much did I spent this month.
response: "date: 2024-04-01 , 2024-04-25"

Query: How much did I spend last year
response: "date:2023-01-01, 2023-12-30"

Note: 1) last year means from from 1 jan 2023 to 31 Dec of 2023, similarly last 2 years
means 2022-01-01, 2023-01-01. Last two months means 2024-02-01,2024-03-31

2) A week ends on Sunday, so during this week mean starting from Monday Till Sunday
which will be 2024-04-22, 2024-04-26.

3) If first Day of a month starts from Wednesday then first week means from Wednesday till
Sunday

4) last week or last month excludes current week or month. similarly last two weeks doesn't include current week.
-------------------------------------------------------------------------------
Now answer the following query: Only output json with keys start_date,end_date if there is interval, otherwise with date key. No text or comments. If there is no date for query just leave the field empty
Query: """

Here we use few-shot prompting along with some explicit information. You may change these instructions as it suits you.

Let’s test the response of LLM for some queries.

import json
result = []
queries = ["How much did I spend on Empowerment in during of August 2023?",
"What transactions I have made on RoarCurrency",
"List transactions I have made during July 2023",
"What were my expenses during 2023?",
"I need to see transactions I have done on 2023-09-01"
]

for query in queries:
intent,date = chat(intent_find_template,query),chat(date_find_prompt,query)
print(intent,date)
result.append([json.loads(intent),json.loads(date)])
{
"intent": "GetCategoryTransactionsAmount",
"type": "credit",
"category": "Empowerment"
} {
"start_date": "2023-08-01",
"end_date": "2023-08-31"
}
{
"intent": "GetCategoryTransactions",
"category": "RoarCurrency"
} {}
{
"intent": "GetTransactions",
"month": "July",
"year": "2023"
} {
"start_date": "2023-07-01",
"end_date": "2023-07-31"
}
{
"intent": "GetTransactionsAmount",
"type": "credit"
} {
"start_date": "2023-01-01",
"end_date": "2023-12-31"
}
{
"intent": "GetTransactions",
"date": "2023-09-01"
} {
"date": "2023-09-01"
}

Pretty Impressive. All of the intents are correct along with the dates :)

Finally, let’s define four helper functions to handle four intents.

def GetTransactions(self,dates):
listing = self._get_transactions(dates)
listing = self._clean_listing(listing)
dates = [date for date in dates.values()]

if len(dates)==1:
return f"Following are your Transactions for the date : {dates[0]}\n {listing}"
else:
return f"Following are your Transactions between : {dates[0]} and {dates[1]}.\n {listing}"



def GetTransactionsAmount(self,trn_type,dates):
dates = [date for date in dates.values()]
if trn_type == "credit":
if len(dates)==1:
listing = self.data[self.data["txn_date"] == pd.to_datetime(dates[0])]
return f"You spent {str(listing[listing['amt']<0]['amt'].sum()).lstrip('-')} on {dates[0]}\n "

else:
listing = self.data[(self.data["txn_date"] > pd.to_datetime(dates[0])) & (self.data["txn_date"] < pd.to_datetime(dates[1]))]
return f"You spent {str(listing[listing['amt']<0]['amt'].sum()).lstrip('-')} between {dates[0]} and {dates[1]}\n "
else:
if len(dates)==1:
listing = self.data[self.data["txn_date"] == pd.to_datetime(dates[0])]
return f"You spent {str(listing[listing['amt']>0]['amt'].sum()).lstrip('-')} on {dates[0]}\n "

else:
listing = self.data[(self.data["txn_date"] > pd.to_datetime(dates[0])) & (self.data["txn_date"] < pd.to_datetime(dates[1]))]
return f"You spent {str(listing[listing['amt']>0]['amt'].sum()).lstrip('-')} between {dates[0]} and {dates[1]}\n "


def GetCategoryTransactions(self,merchant,dates=None):
filtered_data = self.data
if self._is_valid_date(dates):
filtered_data = self._get_transactions(dates)
merchant = self.lemmatizer.lemmatize(str(merchant).lower())
listing = None
if merchant in self.merchants:
listing = filtered_data[filtered_data['lemm_merchant'] == merchant]
elif merchant in self.categories:
listing = filtered_data[filtered_data['lemm_cat'] == merchant]
if listing is not None and len(listing) > 0:
listing = self._clean_listing(listing)
return f"Following are your transactions for {merchant}.\n{listing}"
else:
return "No transaction found"

def GetCategoryTransactionsAmount(self,trn_type,merchant,dates=None):
filtered_data = self.data
if self._is_valid_date(dates):
filtered_data = self._get_transactions(dates)

amount = 0
listing = None
merchant = self.lemmatizer.lemmatize(str(merchant).lower())
if merchant in self.merchants:
listing = filtered_data[filtered_data['lemm_merchant'] == merchant]
elif merchant in self.categories:
listing = filtered_data[filtered_data['lemm_cat'] == merchant]
if listing is not None and len(listing) > 0:
if trn_type == "credit":
listing = listing[listing["amt"]<0]
else:
listing = listing[listing["amt"]>0]
amount = listing["amt"].sum()
return f"You have spent {amount} for {merchant}"

And the code that connects everything together.

for definition,date in chat(query):
func = definition["intent"]
method = get_function(data_fetcher,func)
category = definition.get("category",None)

type = definition.get("type",None)
if category:
if type:
answer = method(type,category,date)
else:
answer = method(category,date)
else:
if type:
answer = method(type,date)
else:
answer = method(date)
print(answer)

And here are our final results for the above-mentioned queries.

You have spent 7966.933999999999 for empowerment

-----------------------------------------------------
Following are your transactions for roarcurrency.
Unnamed: 0 txn_date amt cat merchant
23580 23580 2023-08-15 -13.598 Service RoarCurrency
23789 23789 2023-08-29 -14.598 Service RoarCurrency
24217 24217 2023-07-05 -13.598 Service RoarCurrency
... ... ... ... ... ...
235487 235487 2023-09-01 -0.920 Transfer RoarCurrency
235602 235602 2023-06-16 -1.998 Transfer RoarCurrency

-------------------------------------------------------
Following are your Transactions between : 2023-07-01 and 2023-07-31.
Unnamed: 0 txn_date amt cat merchant
641 641 2023-07-18 00:00:00 -20.000 Shops Apple Cash
642 642 2023-07-11 00:00:00 -15.000 Shops Apple Cash
... ... ... ... ... ...
257056 257056 2023-07-03 00:00:00 -40.500 ATM NaN
257061 257061 2023-07-18 00:00:00 -4.000 ATM NaN

Conclusion

In this article, we learned how can we use LLMs to extract intents, and supportive information from user queries to trigger a correct function and respond to the customer. There is a more advanced way to cater to this use case however I choose this naive approach so a beginner can understand the basics of developing a chatbot. The advanced way is to use SQL agents to directly find a relative SQL query for a given user query and execute it to get the results. Yes! no need to find intents and other information from the query. But that’s a story of another article. For now, I hope you understand what are some basic operations to develop a chatbot. I have also included a running code in the project folder of this repo. Just add your API key and enjoy. Happy Learning.

--

--