From GenAI to Insights from Your Customers (Part 1)

Tara Pourhabibi
Google Cloud - Community

Analyzing customer complaints is crucial for businesses as it enhances customer experience and fosters trust by providing insights into areas that need improvement.

Summarization adds value by condensing vast amounts of feedback into actionable insights, enabling businesses to quickly identify trends, prioritize issues, and implement targeted solutions. This lets businesses address customer concerns proactively, improve their products or services, and ultimately increase customer satisfaction and loyalty.

In this post, my main goal is to condense lengthy customer complaints (Consumer Financial Protection Bureau (CFPB) data) and efficiently extract the important information from them. I walk through how I used Vertex AI PaLM 2 with LangChain, and I compare its complaint summaries with those of an open source LLM (LaMini-Flan-T5-248M), also used with LangChain.

# Install the required libraries
!pip install huggingface-hub
!pip install langchain
!pip install transformers
!pip install google-cloud-aiplatform  # needed by LangChain's VertexAI wrapper

# Load the required libraries
import pandas as pd

# Import the Hugging Face pipeline wrapper, the summarize chain,
# the text splitter, and the prompt helpers from LangChain
from langchain.llms import HuggingFacePipeline
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain import PromptTemplate, LLMChain

# Load the Vertex AI wrapper
from langchain.llms import VertexAI

Define LLM Model

LLM Model: Vertex AI PaLM 2

PaLM 2 is Google’s LLM, built with a focus on responsible generative AI, and is fine-tuned for different NLP tasks such as classification, summarization, and entity extraction.

# Define the Vertex AI PaLM 2 LLM used to generate responses
llm = VertexAI(
    model_name='text-bison@001',
    batch_size=100,  # set this if you are using batch processing
    temperature=0,  # the VertexAI wrapper takes generation settings as direct arguments
    max_output_tokens=512,
)

LLM Model: LaMini-Flan-T5

LaMini-Flan-T5-248M is an open source LLM: a fine-tuned version of google/flan-t5-base, trained on the LaMini-instruction dataset, which contains 2.58M samples.

# Define the LaMini model checkpoint for LangChain
checkpoint = 'MBZUAI/LaMini-Flan-T5-248M'

# Hugging Face pipeline details
# Define the LLM used to generate responses
# (this reuses the name `llm`, so run it instead of the Vertex AI cell to use LaMini)
llm = HuggingFacePipeline.from_model_id(
    model_id=checkpoint,
    task='text2text-generation',
    batch_size=100,  # set this if you are using batch processing
    model_kwargs={"temperature": 0, "max_length": 512},
)

Define Text Splitter and Summarizer Chain

Some complaints have long descriptions that exceed the maximum token size the LLMs accept, so I use LangChain to split them into chunks and summarize them with a “map_reduce” chain. The “Map” step sends each chunk to the LLM separately, and the “Reduce” step then combines the chunk summaries into a single summary. This is one way to summarize large documents, but it requires several calls to the LLM, which may affect accuracy and performance.

See below how I defined a recursive text splitter and a prompt with PromptTemplate to guide the LLM to summarize the text.

# Define a recursive text splitter to chunk the complaints
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # maximum chunk size, measured by length_function (characters here)
    chunk_overlap=40,  # characters shared between neighbouring chunks
    length_function=len,
)
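
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. The function name sliding_chunks is hypothetical, and this is a simplification: RecursiveCharacterTextSplitter additionally tries to break at natural separators (paragraphs, lines, spaces), so its chunks are usually a little shorter and end on word boundaries, as in the split example later in this post.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 40) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# A 2,500-character dummy text made of repeating digits
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])        # [1000, 1000, 580]
print(chunks[0][-40:] == chunks[1][:40])  # True: the 40-character overlap
```

The overlap means a sentence cut at a chunk boundary still appears, at least in part, in the next chunk, which gives the model some context when it summarizes each chunk on its own.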


# Set the prompt template
prompt_template = """
summarize the given text by highlighting the most important information

{text}

Summary:
"""

# Define the prompt
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

# Define the chain with the map_reduce type
# (named llm_chain to match the calls in the summarization code below)
llm_chain = load_summarize_chain(
    llm,
    chain_type="map_reduce",
    map_prompt=prompt,
    combine_prompt=prompt,
    verbose=True,
)
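
Conceptually, a map_reduce summarization does roughly what the sketch below shows. This is an illustration under assumptions, not LangChain's actual implementation: map_reduce_summarize, PROMPT, and fake_llm are hypothetical names, and fake_llm is a stand-in that only counts calls instead of contacting a model. The real chain may do more, for example when the combined summaries are still too long.

```python
from typing import Callable, List

# Hypothetical prompt, mirroring the template used in this post
PROMPT = (
    "summarize the given text by highlighting the most important information\n\n"
    "{text}\n\nSummary:\n"
)


def map_reduce_summarize(chunks: List[str], call_llm: Callable[[str], str]) -> str:
    # Map: one LLM call per chunk, each producing a partial summary
    partial_summaries = [call_llm(PROMPT.format(text=chunk)) for chunk in chunks]
    # Reduce: one more LLM call over the joined partial summaries
    combined = "\n\n".join(partial_summaries)
    return call_llm(PROMPT.format(text=combined))


calls = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: records the prompt and returns a label."""
    calls.append(prompt)
    return f"summary {len(calls)}"


result = map_reduce_summarize(["chunk A", "chunk B", "chunk C"], fake_llm)
print(len(calls))  # 4: one call per chunk plus one call to combine
print(result)      # summary 4
```

The call count is the cost mentioned above: a complaint split into n chunks needs at least n + 1 LLM calls.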

Summarization can be done in “online” mode (one complaint at a time) or in “batch” mode (a batch of complaints at once).

Online Summarization:

#for an online mode, just pass one complaint text

texts = text_splitter.create_documents([complaint_text])
summary = llm_chain.run(texts)
print(summary)

Here is an example of a complaint description after splitting:

#example of the split output texts
[
Document(
page_content='I am writing to formally complain about inaccurate and illegal reporting of transactions on
my credit report, which I believe violates the Fair Credit Reporting Act ( FCRA ) specifically 15 U.S. Code 1681a.I
have carefully reviewed my credit report, and I have identified several inaccuracies in the reporting of late
payments and utilization of credit. As per 15 U.S. Code 1681a, The term consumer reporting agency means any person
which, for monetary fees, dues, or on a cooperative nonprofit basis, regularly engages in whole or in part in the
practice of assembling or evaluating consumer credit information or other information on consumers for the purpose
of furnishing consumer reports to third parties, and which uses any means or facility of interstate commerce for
the purpose of preparing or furnishing consumer reports. \\n The term consumer means an individual. \\n The term
consumer report means any written, oral, or other communication of \\nany information by a consumer reporting'
),
Document(
page_content="information by a consumer reporting agency bearing on a consumers credit worthiness, credit
standing, credit capacity, character, general reputation, personal characteristics, or mode of living. * ( 2 )
Exclusions \\n ( A ) ( i ) report containing information solely as to transactions or experiences between the
consumer and the person making the report ; \\n It is illegal to report inaccurate information that adversely
affects a consumer 's creditworthiness. Below are the specific discrepancies that I have identified Account number
Account type : Home Equity. Date opened Late payments recognized on of . Account number Account type : Credit card.
Date opened Late payments recognized on of Account number Account type : Auto Loan. Date opened Late payments
recognized on of Account number Account type : Home Equity. Date opened 104 % credit utilization . Account number
Account type : Credit card. Date opened 1 % utilization Account number Account type : Credit Card. Date opened 1 %"
),
Document(
page_content='type : Credit Card. Date opened 1 % Utilization.\\n I am formally requesting that you conduct
a thorough investigation into these matters, as required by the FCRA. I kindly request that you promptly correct
the inaccurate information on my credit report by removing the incorrect late payments and adjusting the reported
credit utilization. I understand that under 15 U.S. Code 1681i, you are required to conduct a reasonable
investigation within 30 days of receiving a dispute. I urge you to adhere to this statutory requirement and provide
me with written notification of the results of your investigation. If the investigation confirms the inaccuracies,
I request that you update my credit report accordingly and provide me with a revised copy. Additionally, I would
appreciate it if you could provide me with information on the steps taken to prevent such errors in the future. If
my concerns are not addressed within the stipulated time frame, I will have no choice but to escalate this matter'
),
Document(
page_content='no choice but to escalate this matter to the Consumer Financial Protection Bureau. Please
treat this matter with the urgency it deserves.'
)
]

And here is the overall summary, which does a pretty good job of highlighting the important information in the complaint:

#Example of output summary using LaMini-Flan-T5

The person is expressing a complaint about inaccurate and illegal reporting of transactions on their credit report,
which violates the Fair Credit Reporting Act (FCRA) specifically 15 U.S. Code 1681a. The person identified several
discrepancies in the reporting of late payments and utilization of credit, and is requesting a thorough
investigation into the credit card utilization and the FCRA's requirement to conduct a reasonable investigation
within 30 days of receiving a dispute. They request to update their credit report and provide information on steps
to prevent future errors.

#Example of output summary using Vertex AI PaLM 2 LLM

The writer is writing to formally complain about inaccurate and illegal
reporting of transactions on their credit report. The writer believes that
this violates the Fair Credit Reporting Act (FCRA). The writer has carefully reviewed their credit report and has identified several inaccuracies in the reporting of late payments and utilization of credit. The writer is requesting that the consumer reporting agency correct the inaccuracies in their credit report.\n\nIt is illegal to report inaccurate information that adversely affects a consumer's creditworthiness.
The specific discrepancies are:\n\n- Account number Account type: Home Equity. Date opened Late payments recognized on of .\n- Account number Account type: Credit card

Batch Summarization:

For batch processing, call LangChain’s “batch” execution mode:

def split_doc(text_splitter, doc):
    """
    Split an input document using a LangChain text splitter.
    Args:
        text_splitter: a LangChain text splitter
        doc: text of the document as a string
    Output:
        texts: a list of LangChain Document chunks
    """
    texts = text_splitter.create_documents([doc])

    return texts

def summarize_docs(docs, llm_chain):
    """
    Summarize chunked documents.
    Args:
        docs: chunked documents, one list of chunks per complaint
        llm_chain: a LangChain summarize chain
    Output:
        summaries: list of summarized documents
    """
    # Summarize all complaints in one go; batch() expects a plain list,
    # so convert the pandas Series of chunk lists first
    summary = llm_chain.batch(list(docs))

    # Extract the summary text from each result
    summaries = []
    for summarized_doc in summary:
        summaries.append(summarized_doc['output_text'])

    return summaries

#load complaints data
#read data from file storage
df_complaints=read_data()

#set the complaint description column for summarizing
desc_col='Consumer complaint narrative'

#chunk all complaints
docs=df_complaints[desc_col].apply(lambda doc: split_doc(text_splitter,doc) )

#extract and concatenate summaries to the original data
df_complaints['summarized_narrative']=summarize_docs(docs,llm_chain)
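
For a large complaints file, sending every chunked complaint in a single batch call can be heavy, and a failure partway through loses the whole run. One option, sketched here as an assumption rather than as part of the pipeline above, is to feed the summarizer fixed-size slices. The names in_batches, summarize_in_batches, and fake_summarize_batch are hypothetical, and fake_summarize_batch merely truncates text in place of a real model call.

```python
from typing import Callable, Iterator, List


def in_batches(items: List, batch_size: int) -> Iterator[List]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def summarize_in_batches(docs: List, summarize_batch: Callable[[List], List[str]],
                         batch_size: int = 100) -> List[str]:
    """Summarize docs slice by slice and return the summaries in input order."""
    summaries: List[str] = []
    for batch in in_batches(docs, batch_size):
        summaries.extend(summarize_batch(batch))
    return summaries


def fake_summarize_batch(batch: List[str]) -> List[str]:
    """Stand-in for a real summarizer: keeps the first 10 characters."""
    return [text[:10] for text in batch]


demo_docs = [f"complaint number {i} with a long narrative" for i in range(5)]
demo_summaries = summarize_in_batches(demo_docs, fake_summarize_batch, batch_size=2)
print(demo_summaries)
```

With the functions from this post, summarize_batch could be something like `lambda batch: summarize_docs(batch, llm_chain)`, applied to a plain list of chunked complaints.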

Evaluation:

Human assessment plays a crucial role in evaluating summarization tasks. It is essential to thoroughly review the generated output to ensure it is concise and also maintains the core objectives of the original text.

To check that summaries align closely with human perception, selected samples of summarized documents can be compared with human-written summaries, and their ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores can be calculated. ROUGE is a set of metrics for evaluating automatic text summarization and machine translation.
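
As an illustration of the idea behind ROUGE, the sketch below computes ROUGE-1 (unigram overlap) between a human-written reference and a model summary. It is a simplified version that assumes lowercased whitespace tokenization with no stemming; the function name rouge_1 and the two example sentences are made up for this demo. For real evaluations, an established package such as rouge-score is a better choice.

```python
from collections import Counter
from typing import Dict


def rouge_1(reference: str, candidate: str) -> Dict[str, float]:
    """Unigram-overlap precision, recall, and F1 between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


human = "the consumer disputes inaccurate late payments on the credit report"
model = "the consumer disputes late payments on their credit report"
scores = rouge_1(human, model)
print(scores)  # recall = 8/10, precision = 8/9, f1 = 16/19
```

Recall measures how much of the human summary the model summary covers, while precision measures how much of the model summary is supported by the human one.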

Final Note:

In this article, I’ve showcased the development of a scalable summarization solution. Used with LangChain, both LaMini-Flan-T5 and the Vertex AI PaLM 2 API (whose usage costs should be taken into account) performed well at extracting the important highlights from complaints.

In upcoming posts, I’ll use BERTopic and an LLM to identify the predominant trends in customer complaints and uncover the root causes behind these issues, with the aim of providing valuable insights for businesses.
