Banking Query Intent Detector

Winston Fernandes · Published in Geek Culture · 7 min read · Mar 8, 2021

This project is an end-to-end case study of how advanced deep learning techniques can classify the intent of a query asked by a bank customer without any human intervention. This scenario is useful for a chatbot that must understand the intention of a message before deciding how to reply to the customer.

(Image source: https://blog.talla.com/how-to-implement-a-support-chatbot-the-right-way)

Table of Contents:

  1. Business Problem
  2. The need for data science
  3. Source of Data
  4. Exploratory Data Analysis
  5. First cut Solution using GloVe
  6. Deep Learning Models using State of the art techniques
  7. Model Comparison
  8. Deployment using Streamlit
  9. Conclusion
  10. Future Work
  11. Profile
  12. References

1. Business Problem

The business problem we are looking into: given a bank customer's query in text form, we have to classify it into one of the intent classes. We also need a model that performs well on this task while remaining lightweight in size.

2. The need for data science

Why do we need data science for this problem?

  • The existing approach is to have a human work out the intention, which is what call-center staff do today.
  • For simple queries that only need a standard answer, a machine can respond instead; this spares the call-center staff from repetitive work so they can focus on high-level queries.
  • A machine is also available 24 x 7.

Which metric to use to validate our model performance?

  • The banking data has 77 intent categories.
  • The data is imbalanced because the number of training samples differs across the 77 classes.
  • Accuracy is misleading on imbalanced data, so it is not helpful in our case.
  • Precision tells us, of all the points predicted to be positive, how many are actually positive, which is important in our case.
  • Recall is important since it tells us, of all the points that are actually positive, how many are predicted positive.
  • F1-score is the harmonic mean of precision and recall, and it is high only if both precision and recall are high.
  • Since both recall and precision should be high in our scenario, we use the F1-score as our metric.
  • Because our data is imbalanced, we use the weighted F1-score, which weights each class's F1 by its support (see the sketch after this list).
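As a concrete illustration, the weighted F1-score can be computed with scikit-learn; the label arrays below are placeholders, not the actual banking data:

```python
from sklearn.metrics import f1_score

# Placeholder labels; in the project these would be the 77 intent classes.
y_true = [0, 2, 2, 1, 0, 2]
y_pred = [0, 2, 1, 1, 0, 2]

# average="weighted" averages the per-class F1 scores using each class's
# support as the weight, which accounts for the class imbalance.
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
print(weighted_f1)
```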

3. Source of Data

The data scripts were provided by the company PolyAI; the data contains a text column and a category column indicating the intent the text belongs to.

banking data

4. Exploratory Data Analysis

count of words present in each text

This plot helps us decide the maximum input length to feed to the model; a common choice is the length that covers 90–95% of the sentences.
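A minimal sketch of how such a cut-off can be picked, assuming the training data sits in a DataFrame with a text column (the file name is hypothetical):

```python
import numpy as np
import pandas as pd

train = pd.read_csv("train.csv")  # hypothetical path to the banking train split
word_counts = train["text"].str.split().str.len()

# Choose a max sequence length that covers ~95% of the queries, so only
# rare, very long outliers get truncated.
max_len = int(np.percentile(word_counts, 95))
print(max_len)
```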

Word Clouds

i. Category: card_payment_fee_charged

Conclusion

  • The card_payment_fee_charged intention is recognized commonly by the words “fee”, “charged”, “extra”.
  • This happens mostly when a credit or debit card is used.

ii. Category: lost_or_stolen_card

Conclusion

  • The lost_or_stolen_card intention is recognized commonly by the words “card”, “lost”, “stolen”, “help”.

iii. Category: wrong_amount_of_cash_received

Conclusion

  • The wrong_amount_of_cash_received intention commonly arises when an ATM does not function properly.
  • The common words are “atm”, “cash”, “money”.
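The word clouds above can be reproduced per category with the wordcloud library; a small sketch, assuming the train DataFrame has the text and category columns described earlier:

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

def plot_wordcloud(df, category):
    """Join all queries of one intent and render a word cloud for it."""
    text = " ".join(df.loc[df["category"] == category, "text"])
    cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(category)
    plt.show()

plot_wordcloud(train, "card_payment_fee_charged")
```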

5. First cut solution

As a first cut solution, we tried using the GloVe algorithm.

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

Creating data pipeline:
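A sketch of the pipeline (not the exact project code): tokenize the queries, pad them to a fixed length, and build an embedding matrix from pre-trained GloVe vectors. MAX_LEN, EMBED_DIM, and the GloVe file name are assumptions.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 25      # assumed, chosen from the word-count analysis above
EMBED_DIM = 300   # assumed GloVe vector size

# Fit a word-level tokenizer on the training queries and pad to a fixed length.
tokenizer = Tokenizer(oov_token="<unk>")
tokenizer.fit_on_texts(train["text"])
X_train = pad_sequences(tokenizer.texts_to_sequences(train["text"]),
                        maxlen=MAX_LEN, padding="post")

# Build an embedding matrix from pre-trained GloVe vectors
# (glove.6B.300d.txt is an assumed file name).
glove = {}
with open("glove.6B.300d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        glove[parts[0]] = np.asarray(parts[1:], dtype="float32")

vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, idx in tokenizer.word_index.items():
    if word in glove:
        embedding_matrix[idx] = glove[word]
```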

Model Architecture:
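The exact layers of the first-cut model are not reproduced here, so the following is a minimal sketch of a GloVe-based classifier: frozen GloVe embeddings feeding a BiLSTM, with the dropout 0.5 and early-stopping patience 5 mentioned below. It reuses vocab_size, embedding_matrix, MAX_LEN, and EMBED_DIM from the pipeline sketch above.

```python
import tensorflow as tf

NUM_CLASSES = 77

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        vocab_size, EMBED_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        input_length=MAX_LEN, trainable=False),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(patience=5,
                                              restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2, epochs=50, callbacks=[early_stop])
```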

Some combinations tried:

The least over-fitting was seen in Model 4 (dropout 0.5 with early-stopping patience 5): train F1-score 0.8793 and validation F1-score 0.8755.

6. Deep Learning Models using State of the Art Techniques

  1. BERT (Bidirectional Encoder Representations from Transformers):

Refer to this site for more information on BERT.

We used the pre-trained BERT model (bert_en_wwm_cased_L-24_H-1024_A-16) to get the embeddings:

Model Architecture:
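A hedged sketch of what a BERT-based classifier looks like with TensorFlow Hub; the encoder handle matches the checkpoint named above, but the preprocessing handle, version numbers, and the classification head are assumptions:

```python
import tensorflow as tf
import tensorflow_hub as hub

preprocess = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_cased_preprocess/3")
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/4",
    trainable=False)

text_input = tf.keras.Input(shape=(), dtype=tf.string)
encoder_inputs = preprocess(text_input)
outputs = encoder(encoder_inputs)

# Use the pooled [CLS] representation as the sentence embedding.
x = tf.keras.layers.Dropout(0.3)(outputs["pooled_output"])
logits = tf.keras.layers.Dense(77, activation="softmax")(x)

bert_model = tf.keras.Model(text_input, logits)
bert_model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
```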

The final results with the BERT model were a train F1-score of 0.8452 and a validation F1-score of 0.7802. The BERT model seemed to over-fit the data, and the model size was very large. It was not able to perform better than GloVe.

2. Universal Sentence Encoder:

The Universal Sentence Encoder is a dual encoder model that encodes text into high-dimensional vectors that can be used for text classification, semantic similarity, clustering, and other natural language tasks.

Model Architecture:
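A minimal sketch of a USE-based classifier, assuming the public universal-sentence-encoder/4 module on TensorFlow Hub and an assumed dense head:

```python
import tensorflow as tf
import tensorflow_hub as hub

use_layer = hub.KerasLayer(
    "https://tfhub.dev/google/universal-sentence-encoder/4",
    input_shape=[], dtype=tf.string, trainable=False)

model = tf.keras.Sequential([
    use_layer,                                     # 512-dim sentence embedding
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(77, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```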

The final results with the USE model were a train F1-score of 0.9350 and a validation F1-score of 0.9314. These were the best results obtained compared to the GloVe and BERT models: it over-fit the least, and the model size was smaller than both GloVe and BERT.

3. ConveRT (Conversational Representations from Transformers):

ConveRT is a dual sentence encoder; it is effective, affordable, and quick to train, and the ConveRT model is smaller than the BERT model. According to PolyAI, the company that developed it, ConveRT was trained on Reddit conversational data (context, response pairs). Unfortunately, ConveRT has no available TensorFlow implementation, so I tried building ConveRT from scratch in TensorFlow and training it with Reddit conversational data.

Model Architecture:

i. Positional encoding

Please refer to this blog for a detailed understanding of positional encoding.

Positional Encoding
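The standard sinusoidal positional encoding from the Transformer paper can be written as follows (a generic sketch, not the exact project code):

```python
import numpy as np
import tensorflow as tf

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encoding: sine on even dims, cosine on odd dims."""
    positions = np.arange(max_len)[:, np.newaxis]            # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]                  # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / np.float32(d_model))
    angles = positions * angle_rates

    angles[:, 0::2] = np.sin(angles[:, 0::2])
    angles[:, 1::2] = np.cos(angles[:, 1::2])
    return tf.cast(angles[np.newaxis, ...], tf.float32)       # (1, max_len, d_model)
```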

ii. Encoder layer
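A generic Transformer encoder block in Keras is sketched below (self-attention plus a feed-forward network, each with a residual connection and layer normalization); the real ConveRT block differs in details such as attention heads and parameter sharing:

```python
import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    """One Transformer encoder block: self-attention + feed-forward sub-layers."""

    def __init__(self, d_model, num_heads, dff, rate=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = tf.keras.layers.Dropout(rate)
        self.drop2 = tf.keras.layers.Dropout(rate)

    def call(self, x, training=False, mask=None):
        attn = self.mha(x, x, attention_mask=mask)
        x = self.norm1(x + self.drop1(attn, training=training))
        ffn_out = self.ffn(x)
        return self.norm2(x + self.drop2(ffn_out, training=training))
```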

iii. Complete Encoder with the encoder layers

iv. Final ConveRT model
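Putting the pieces together, a simplified dual-encoder sketch (using the positional_encoding and EncoderLayer defined above): the context and response share the embedding and encoder stack and get separate projection heads. The layer sizes are assumptions, not the paper's exact configuration.

```python
import tensorflow as tf

def build_dual_encoder(vocab_size, max_len, d_model=512, num_layers=4,
                       num_heads=8, dff=1024, out_dim=512):
    """Shared encoder with separate projection heads for context and response."""
    tokens = tf.keras.Input(shape=(max_len,), dtype=tf.int32)
    emb = tf.keras.layers.Embedding(vocab_size, d_model)(tokens)
    emb = emb + positional_encoding(max_len, d_model)
    x = emb
    for _ in range(num_layers):
        x = EncoderLayer(d_model, num_heads, dff)(x)
    pooled = tf.keras.layers.GlobalAveragePooling1D()(x)
    shared_encoder = tf.keras.Model(tokens, pooled, name="shared_encoder")

    context_in = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="context")
    response_in = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="response")
    context_vec = tf.keras.layers.Dense(out_dim)(shared_encoder(context_in))
    response_vec = tf.keras.layers.Dense(out_dim)(shared_encoder(response_in))
    return tf.keras.Model([context_in, response_in], [context_vec, response_vec])
```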

Loss implementation used:

i. Loss based on ConveRT paper

ConveRT Loss

S(xi, yi) = similarity between a context and its corresponding response

S(xi, yj) = similarity between a context and the other responses in the batch

ConveRTloss
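A sketch of this loss in TensorFlow: within a batch, each context's own response is the positive and the other responses serve as negatives, so maximizing S(xi, yi) − log Σj exp(S(xi, yj)) is equivalent to a softmax cross-entropy whose correct class for row i is column i. Cosine similarity is assumed here.

```python
import tensorflow as tf

def convert_style_loss(context_vecs, response_vecs):
    """In-batch softmax loss over context-response similarities."""
    context_vecs = tf.math.l2_normalize(context_vecs, axis=1)
    response_vecs = tf.math.l2_normalize(response_vecs, axis=1)

    # S[i, j] = similarity between context i and response j.
    similarities = tf.matmul(context_vecs, response_vecs, transpose_b=True)

    # The diagonal entries are the true (context, response) pairs.
    labels = tf.range(tf.shape(similarities)[0])
    return tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(
            labels, similarities, from_logits=True))
```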

ii. Triplet loss

Triplet loss is a loss function for machine learning algorithms where a baseline (anchor) input is compared to a positive (truthy) input and a negative (falsy) input.

triplet loss = maximum(positive distance − negative distance + margin, 0)

Triplet loss
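A small TensorFlow sketch of the triplet loss with squared Euclidean distances (the margin value is an assumption):

```python
import tensorflow as tf

def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0), averaged."""
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=1)
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=1)
    return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))
```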

When training with the Reddit data, the issue was that the loss wasn't updating. We tried options like switching the loss, changing the batch size, changing the max input length, and checking whether the weight histograms were updating, but we couldn't find a way out. Given the time constraints, we paused our research on the ConveRT section, but I wish to resume it soon.

7. Model Comparison

8. Deployment using Streamlit

The final deployment of the model was created using the Streamlit library and hosted on Streamlit.

Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science.


Link to the web app:

https://share.streamlit.io/wins999/query_intent_detector/main/main_app.py
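The core of such a Streamlit app is only a few lines; this is a hedged sketch, with the model path, label file, and widget layout assumed rather than taken from the deployed app:

```python
import pickle
import streamlit as st
import tensorflow as tf

st.title("Banking Query Intent Detector")

# Hypothetical artifact paths; the real app loads the trained USE-based classifier.
model = tf.keras.models.load_model("intent_model")
with open("class_names.pkl", "rb") as f:
    class_names = pickle.load(f)

query = st.text_input("Enter your banking query:")
if st.button("Detect intent") and query:
    probs = model.predict([query])[0]
    st.write(f"Predicted intent: **{class_names[probs.argmax()]}**")
```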

9. Conclusion

We have tried different language models to solve our bank query intent detection problem. We found that dual-encoder models like the Universal Sentence Encoder perform better in this scenario than the state-of-the-art BERT model. The Universal Sentence Encoder model is also lighter in weight than the BERT model.

10. Future work

We have tried to solve our problem with models like GloVe, BERT, and USE. We have also built the ConveRT model architecture from its research paper, but during training the loss was not updating; due to time constraints I have paused the research on the ConveRT section, and I wish to resume it soon.

11. Profile

For the code, just fork my project from the GitHub link mentioned below. For any queries or improvements you want to suggest, comment, or connect with me on LinkedIn or by mail.

Mail: winston23fernandes.wf@gmail.com

12. References
