ChatBot for Transactional Data with LLM(Beginner’s Approach) Part Two

7 min readJun 1, 2024

In the last blog, we learned about LLMs, Fine-tuning, and prompt Engineering. We then jumped into coding a chatbot for a financial transactions dataset. In this blog, we will continue the core development of the chatbot.

We did some preprocessing in the previous blog and defined a primary chat function with langchain to get a response from LLM.

Let’s start coding the next part of our chatbot for financial transaction data.

You can find the complete code of the project here

Prompt Engineering

In order to respond to customer queries we need two types of information from the query

Intent of the user for a given query
Related information for that particular intent.

Intent classification is a crucial task while building a chatbot. For some user cases people train a separate language model ie: BERT just for intent classification from the queries. Also, we need to know the parameters needed to call a function for a particular intent. Let’s suppose the user inputs a query “How much did I spend on shops during the first week of September?” So the intent of customer is to get the amount spent and, related information is, shops and the first week of September as time period.

So we’ll need our LLMs to perform two tasks to get results for the user. The first thing is to find intent and the second is to find related information to get information from the dataset. For now, we will try to tackle 4 types of intents from user queries.

GetTransactions (when a user wants to list his transactions)
GetTransactionsAmount (when a user wants to get the amount spent/deposited)
Get Category/MerchantTransactions (list transactions for a particular category or a particular merchant)
GetCategory/MerchantTransactionsAmount (get the amount spent/deposited for a particular merchant/ category)

We also need to know whether for above mentioned intents a user wants to get a list or amount for all of the transactions or it should be for a particular time period.

These 4 operations along with date intervals will help us resolve around 24 types of queries mentioned in the readme of the project.

We will engineer two prompts to collect this information. One prompt will try to extract intent and relative info and the second will try to extract if any date or time period is mentioned in the query.

Template to Find Intent and Related Information

intent_find_template = f"""
You are a Natural Language Understanding  agent. Given a query from user you need to find out the intent of user. Some examples are given below.

1) GetTransactions
    ie1: list transaction I have done this week.
    response1 : "intent" : "GetTransactions"
    ie2: list transaction I have done during february.
    response2 : "intent" : "GetTransactions" 

2) GetTransactionsAmount 
    ie1: How much did I spent today?
    reponse1: "intent" : "GetTransactionAmount", "type"  :  "credit" 
    ie2: What were my total ex dpenses during last year?
    response2: "intent" : "GetTransactionAmount", "type"  :  "credit"
    ie3: How much amount I deposited during last month of 2023?
    response3 : "intent"  :  "GetTransactionAmount" , "type" : "debit"

3) GetCategoryTransactions : categoryname
    ie1: List my transactions I have done on groceries?
    response1 : "intent"  :  "GetCategoryTransactions " , "categoty" : " groceries"
    ie2: List my transactions for Starbucks?
    response2 : "intent"  :  "GetCategoryTransactions", "category" : "Starbucks"
        
4) GetCategoryTransactionsAmount: categoryname
    ie1: How much I have spent on shops so far?
    response1: "intent"  :  "GetCategoryTransactionsAmount" , "type" : "credit", "category" : "shops"
    ie2: How much I have spent on Starbucks so far?
    response2 : "intent"  :  "GetCategoryTransactionsAmount " , "type" : "credit ", "category" : "Starbucks"
    ie3: How much I have deposited via zelle this month.
    reponse3: "intent"  :  "GetCategoryTransactionsAmount " , "type" : " debit ", "category" : " zelle"
   ie 4: How much amount I have deposited using cash app?
    response4: "intent"  :  "GetCategoryTransactionsAmount  " , "type" : "debit , "category" : " cash app"

------------------------------------------------------------------------------------
Now reply with intent for following query. A cash or any kind of  inflow is considered as debit while cash or other kind of outflow credit. Output result in json as mentioned in examples

Query: 
"""

We are using few-shot prompting technique that presents some examples of input and possible output to llm. LLM with these examples tries to replicate results for a new query.

Template to extract correct Date Interval

date_find_prompt = f"""
You are a very precise date finding agent. Some examples of your work are as follow. You generate a json as output:
    Today is {datetime.now()} and Day is {datetime.now().strftime("%A")}

    Query 1: How much did I spend on Starbuck yesterday.
    response: "date:2024-04-24"

    Query: List transactions I have done during last week.
    response: "date:2024-04-15, 2024-04-21"

    Query: How much did I spent this month.
    response: "date: 2024-04-01 , 2024-04-25"

    Query: How much did I spend last year
    response: "date:2023-01-01, 2023-12-30"

    Note: 1) last year means from from 1 jan 2023 to 31 Dec of 2023, similarly last 2 years 
          means 2022-01-01, 2023-01-01. Last two months means 2024-02-01,2024-03-31
          
          2) A week ends on Sunday, so during this week mean starting from Monday Till Sunday 
            which  will be 2024-04-22, 2024-04-26.
            
          3) If first Day of a month starts from Wednesday then first week means from Wednesday till 
            Sunday
            
          4) last week or last month excludes current week or month. similarly last two weeks doesn't include current week.
    -------------------------------------------------------------------------------
    Now answer the following query: Only output json with keys start_date,end_date if there is interval, otherwise with date key. No text or comments. If there is no date for query just leave the field empty
    Query: """

Here we use few-shot prompting along with some explicit information. You may change these instructions as it suits you.

Let’s test the response of LLM for some queries.

import json
result = []
queries = ["How much did I spend on Empowerment in during of August 2023?",
          "What transactions I have made on RoarCurrency",
          "List transactions I have made during July 2023",
           "What were my expenses during 2023?",
            "I need to see transactions I have done on 2023-09-01"
          ]
          
for query in queries:  
    intent,date = chat(intent_find_template,query),chat(date_find_prompt,query)
    print(intent,date)
    result.append([json.loads(intent),json.loads(date)])

{
  "intent": "GetCategoryTransactionsAmount",
  "type": "credit",
  "category": "Empowerment"
} {
    "start_date": "2023-08-01",
    "end_date": "2023-08-31"
}
{
    "intent": "GetCategoryTransactions",
    "category": "RoarCurrency"
} {}
{
    "intent": "GetTransactions",
    "month": "July",
    "year": "2023"
} {
    "start_date": "2023-07-01",
    "end_date": "2023-07-31"
}
{
    "intent": "GetTransactionsAmount",
    "type": "credit"
} {
    "start_date": "2023-01-01",
    "end_date": "2023-12-31"
}
{
    "intent": "GetTransactions",
    "date": "2023-09-01"
} {
    "date": "2023-09-01"
}

Pretty Impressive. All of the intents are correct along with the dates :)

Finally, let’s define four helper functions to handle four intents.

def GetTransactions(self,dates):
        listing = self._get_transactions(dates)
        listing = self._clean_listing(listing)
        dates = [date for date in dates.values()]
        
        if len(dates)==1:
            return f"Following are your Transactions for the date : {dates[0]}\n {listing}"
        else:
            return f"Following are your Transactions between : {dates[0]} and {dates[1]}.\n {listing}"
    
   
    
    def GetTransactionsAmount(self,trn_type,dates):
        dates = [date for date in dates.values()]
        if trn_type == "credit":
            if len(dates)==1:
                listing = self.data[self.data["txn_date"] == pd.to_datetime(dates[0])]
                return f"You spent {str(listing[listing['amt']<0]['amt'].sum()).lstrip('-')} on {dates[0]}\n "

            else:
                listing = self.data[(self.data["txn_date"] > pd.to_datetime(dates[0])) & (self.data["txn_date"] < pd.to_datetime(dates[1]))]
                return f"You spent {str(listing[listing['amt']<0]['amt'].sum()).lstrip('-')} between {dates[0]} and {dates[1]}\n "
        else:
            if len(dates)==1:
                listing = self.data[self.data["txn_date"] == pd.to_datetime(dates[0])]
                return f"You spent {str(listing[listing['amt']>0]['amt'].sum()).lstrip('-')} on {dates[0]}\n "

            else:
                listing = self.data[(self.data["txn_date"] > pd.to_datetime(dates[0])) & (self.data["txn_date"] < pd.to_datetime(dates[1]))]
                return f"You spent {str(listing[listing['amt']>0]['amt'].sum()).lstrip('-')} between {dates[0]} and {dates[1]}\n "
                
    
    def GetCategoryTransactions(self,merchant,dates=None):
        filtered_data = self.data
        if self._is_valid_date(dates):
            filtered_data = self._get_transactions(dates)
        merchant = self.lemmatizer.lemmatize(str(merchant).lower())
        listing = None
        if merchant in self.merchants:
            listing = filtered_data[filtered_data['lemm_merchant'] == merchant]
        elif merchant in self.categories:
            listing = filtered_data[filtered_data['lemm_cat'] == merchant]
        if listing is not None and len(listing) > 0:
            listing = self._clean_listing(listing)
            return f"Following are your transactions for {merchant}.\n{listing}"
        else:
            return "No transaction found"
    
    def GetCategoryTransactionsAmount(self,trn_type,merchant,dates=None):
        filtered_data = self.data
        if self._is_valid_date(dates):
            filtered_data = self._get_transactions(dates)
        
        amount = 0
        listing = None
        merchant = self.lemmatizer.lemmatize(str(merchant).lower())
        if merchant in self.merchants:
            listing = filtered_data[filtered_data['lemm_merchant'] == merchant]
        elif merchant in self.categories:
            listing = filtered_data[filtered_data['lemm_cat'] == merchant]
        if listing is not None and len(listing) > 0:
            if trn_type == "credit":
                listing = listing[listing["amt"]<0]
            else:
                listing = listing[listing["amt"]>0]
            amount = listing["amt"].sum()
        return f"You have spent {amount} for {merchant}"

And the code that connects everything together.

for definition,date in chat(query):
    func = definition["intent"]
    method = get_function(data_fetcher,func)
    category = definition.get("category",None)
        
    type = definition.get("type",None)
    if category:
        if type:    
            answer = method(type,category,date)
        else:
            answer = method(category,date)
    else:
        if type:
            answer = method(type,date)
        else:
            answer = method(date)
    print(answer)

And here are our final results for the above-mentioned queries.

You have spent 7966.933999999999 for empowerment

-----------------------------------------------------
Following are your transactions for roarcurrency.
        Unnamed: 0   txn_date     amt       cat      merchant
23580        23580 2023-08-15 -13.598   Service  RoarCurrency
23789        23789 2023-08-29 -14.598   Service  RoarCurrency
24217        24217 2023-07-05 -13.598   Service  RoarCurrency
...            ...        ...     ...       ...           ...
235487      235487 2023-09-01  -0.920  Transfer  RoarCurrency
235602      235602 2023-06-16  -1.998  Transfer  RoarCurrency

-------------------------------------------------------
Following are your Transactions between : 2023-07-01 and 2023-07-31.
         Unnamed: 0            txn_date     amt    cat        merchant
641            641 2023-07-18 00:00:00 -20.000  Shops      Apple Cash
642            642 2023-07-11 00:00:00 -15.000  Shops      Apple Cash
...            ...                 ...     ...    ...             ...
257056      257056 2023-07-03 00:00:00 -40.500    ATM             NaN
257061      257061 2023-07-18 00:00:00  -4.000    ATM             NaN

Conclusion

In this article, we learned how can we use LLMs to extract intents, and supportive information from user queries to trigger a correct function and respond to the customer. There is a more advanced way to cater to this use case however I choose this naive approach so a beginner can understand the basics of developing a chatbot. The advanced way is to use SQL agents to directly find a relative SQL query for a given user query and execute it to get the results. Yes! no need to find intents and other information from the query. But that’s a story of another article. For now, I hope you understand what are some basic operations to develop a chatbot. I have also included a running code in the project folder of this repo. Just add your API key and enjoy. Happy Learning.

ChatBot for Transactional Data with LLM(Beginner’s Approach) Part Two

Prompt Engineering

Template to Find Intent and Related Information

Template to extract correct Date Interval

Conclusion

Written by Noman Anjum