From GenAI to Insights from Your Customers (Part 1)

Published in

Google Cloud - Community

7 min read · Apr 2, 2024

Analyzing customer complaints is crucial for businesses as it enhances customer experience and fosters trust by providing insights into areas that need improvement.

Summarization adds value by condensing vast amounts of feedback into actionable insights, enabling businesses to quickly identify trends, prioritize issues, and implement targeted solutions. This lets businesses address customer concerns proactively, improve their products or services, and ultimately raise customer satisfaction and loyalty.

In this post, my main goal is to condense lengthy customer complaints (from the Consumer Financial Protection Bureau (CFPB) complaint data) and efficiently extract the important information from them. I walk through how I used Vertex AI PaLM 2 with LangChain, and compare its summaries with those of an open-source LLM (LaMini-Flan-T5-248M), also run through LangChain.

#install required libraries

!pip install huggingface-hub
!pip install langchain
!pip install transformers

#load required libraries
import pandas as pd

#import the Hugging Face pipeline wrapper, the summarize chain, and the text splitter from LangChain
from langchain.llms import HuggingFacePipeline
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain import PromptTemplate, LLMChain

# load Vertex AI
from langchain.llms import VertexAI

Define LLM Model

LLM Model: Vertex AI PaLM 2

PaLM 2 is Google’s LLM approach to responsible Generative AI and is fine-tuned for different NLP tasks such as classification, summarization, and entity extraction.

#Define the Vertex AI PaLM 2 LLM used to generate responses
llm = VertexAI(model_name='text-bison@001',
               batch_size=100, #set this if you are using batch processing
               temperature=0, #keep the output as deterministic as possible
               max_output_tokens=512 #cap on the length of each generated summary
               )

LLM Model: LaMini-Flan-T5

LaMini-Flan-T5-248M is an open-source LLM: a fine-tuned version of google/flan-t5-base trained on the LaMini-instruction dataset, which contains 2.58M samples.

#define the LaMini model checkpoint for LangChain
checkpoint = 'MBZUAI/LaMini-Flan-T5-248M'

#Hugging Face pipeline details
#define the LLM used to generate responses
llm = HuggingFacePipeline.from_model_id(model_id=checkpoint,
                                        batch_size=100, #set this if you are using batch processing
                                        task='text2text-generation',
                                        model_kwargs={"temperature":0, "max_length":512})

Define Text Splitter and Summarizer Chain

Since some complaints have long descriptions that exceed the maximum number of tokens an LLM accepts, I use LangChain to split them into separate chunks and summarize them with a “map_reduce” chain. In the “map” step, each chunk is sent to the LLM separately; in the “reduce” step, the chunk summaries are combined into one final summary. This is one way to summarize large documents, but it requires several calls to the LLM, and because each chunk is summarized without the context of the others, it may affect both accuracy and run time.
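To make the map and reduce steps concrete, here is a minimal sketch in plain Python. It is not how LangChain implements the chain internally; `map_reduce_summarize` and `first_sentence` are hypothetical names, and `first_sentence` is only a stand-in for a real LLM call so that the example runs without a model.

```python
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk (map), then summarize the joined partial summaries (reduce)."""
    # Map step: one model call per chunk
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: one more model call over the concatenated partial summaries
    combined = " ".join(partial_summaries)
    return summarize(combined)


calls = []  # records every "model call" so the cost of the approach is visible


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keeps only the first sentence."""
    calls.append(text)
    return text.split(". ")[0].rstrip(".") + "."


chunks = [
    "The report lists late payments. They are wrong.",
    "The utilization figure is inflated. Please fix it.",
]
final_summary = map_reduce_summarize(chunks, first_sentence)
print(final_summary)
print(f"model calls: {len(calls)}")
```

With two chunks the sketch makes three calls: one per chunk plus one for the reduce step. That is the extra cost mentioned above.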

See below how I defined a recursive text splitter and a prompt with PromptTemplate to guide the LLM to summarize the text.

#define a recursive text splitter to chunk the complaints
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000, #maximum chunk size, measured by length_function (characters here)
    chunk_overlap = 40, #characters shared between consecutive chunks
    length_function = len,
)
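To show what `chunk_size` and `chunk_overlap` control, here is a simplified sketch that cuts text into fixed-size windows. It is an illustration only: `sliding_chunks` is a hypothetical helper, and the real RecursiveCharacterTextSplitter first tries to split on natural separators (paragraphs, line breaks, spaces), so its chunks end on word boundaries and the overlap is not exact.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

The overlap means a sentence cut at a chunk border still appears, at least partly, in the next chunk, so less context is lost between map calls.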


#set prompt template
prompt_template ="""
summarize the given text by highlighting the most important information

{text}

Summary:
"""

#define prompt template
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

#define the summarize chain with the map_reduce type; the same prompt is used for the map and the combine steps
llm_chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=prompt, combine_prompt=prompt, verbose=True)
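To see what the model receives in each call, the sketch below fills the `{text}` placeholder by hand. It uses Python's built-in `str.format` as a stand-in for PromptTemplate, on the assumption that the default template format behaves the same way for a simple placeholder like this one; the chunk text is a shortened excerpt used only for illustration.

```python
# Same template as above, repeated here so the sketch is self-contained
prompt_template = """
summarize the given text by highlighting the most important information

{text}

Summary:
"""

# One chunk as it would be passed to the map step
chunk = "I am formally requesting that you conduct a thorough investigation."

# Fill the placeholder; the result is the full prompt sent to the model
rendered = prompt_template.format(text=chunk)
print(rendered)
```

Because the same prompt is passed as both the map prompt and the combine prompt, the reduce step sees the same instruction, with the joined chunk summaries in place of `{text}`.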

Summarization can run in “online” mode (one complaint at a time) or in “batch” mode (a batch of complaints at once).

Online Summarization:

#in online mode, pass a single complaint narrative (complaint_text is a string)

texts = text_splitter.create_documents([complaint_text])
summary = llm_chain.run(texts)
print(summary)

Here is an example of a complaint description after splitting:

#example of the split output
[
    Document(
        page_content='I am writing to formally complain about inaccurate and illegal reporting of transactions on 
my credit report, which I believe violates the Fair Credit Reporting Act ( FCRA ) specifically 15 U.S. Code 1681a.I
have carefully reviewed my credit report, and I have identified several inaccuracies in the reporting of late 
payments and utilization of credit. As per 15 U.S. Code 1681a, The term consumer reporting agency means any person 
which, for monetary fees, dues, or on a cooperative nonprofit basis, regularly engages in whole or in part in the 
practice of assembling or evaluating consumer credit information or other information on consumers for the purpose 
of furnishing consumer reports to third parties, and which uses any means or facility of interstate commerce for 
the purpose of preparing or furnishing consumer reports. \\n The term consumer means an individual. \\n The term 
consumer report means any written, oral, or other communication of \\nany information by a consumer reporting'
    ),
    Document(
        page_content="information by a consumer reporting agency bearing on a consumers credit worthiness, credit 
standing, credit capacity, character, general reputation, personal characteristics, or mode of living. * ( 2 ) 
Exclusions \\n ( A ) ( i ) report containing information solely as to transactions or experiences between the 
consumer and the person making the report ; \\n It is illegal to report inaccurate information that adversely 
affects a consumer 's creditworthiness. Below are the specific discrepancies that I have identified Account number 
Account type : Home Equity. Date opened Late payments recognized on of . Account number Account type : Credit card.
Date opened Late payments recognized on of Account number Account type : Auto Loan. Date opened Late payments 
recognized on of Account number Account type : Home Equity. Date opened 104 % credit utilization . Account number 
Account type : Credit card. Date opened 1 % utilization Account number Account type : Credit Card. Date opened 1 %"
    ),
    Document(
        page_content='type : Credit Card. Date opened 1 % Utilization.\\n I am formally requesting that you conduct
a thorough investigation into these matters, as required by the FCRA. I kindly request that you promptly correct 
the inaccurate information on my credit report by removing the incorrect late payments and adjusting the reported 
credit utilization. I understand that under 15 U.S. Code 1681i, you are required to conduct a reasonable 
investigation within 30 days of receiving a dispute. I urge you to adhere to this statutory requirement and provide
me with written notification of the results of your investigation. If the investigation confirms the inaccuracies, 
I request that you update my credit report accordingly and provide me with a revised copy. Additionally, I would 
appreciate it if you could provide me with information on the steps taken to prevent such errors in the future. If 
my concerns are not addressed within the stipulated time frame, I will have no choice but to escalate this matter'
    ),
    Document(
        page_content='no choice but to escalate this matter to the Consumer Financial Protection Bureau. Please 
treat this matter with the urgency it deserves.'
    )
]

The resulting summaries below capture the important information in the complaint fairly well:

#Example of output summary using LaMini-Flan-T5

The person is expressing a complaint about inaccurate and illegal reporting of transactions on their credit report,
which violates the Fair Credit Reporting Act (FCRA) specifically 15 U.S. Code 1681a. The person identified several 
discrepancies in the reporting of late payments and utilization of credit, and is requesting a thorough 
investigation into the credit card utilization and the FCRA's requirement to conduct a reasonable investigation 
within 30 days of receiving a dispute. They request to update their credit report and provide information on steps 
to prevent future errors.

#Example of output summary using Vertex AI PaLM 2 LLM

The writer is writing to formally complain about inaccurate and illegal 
reporting of transactions on their credit report. The writer believes that 
this violates the Fair Credit Reporting Act (FCRA). The writer has carefully reviewed their credit report and has identified several inaccuracies in the reporting of late payments and utilization of credit. The writer is requesting that the consumer reporting agency correct the inaccuracies in their credit report.\n\nIt is illegal to report inaccurate information that adversely affects a consumer's creditworthiness.
The specific discrepancies are:\n\n- Account number Account type: Home Equity. Date opened Late payments recognized on of .\n- Account number Account type: Credit card

Batch Summarization:

For batch processing, call LangChain’s “batch” execution mode:

def split_doc(text_splitter,doc):
    """
    Split an input document into chunks using LangChain
    Args:
        text_splitter: a LangChain text splitter
        doc: text of one document (string)
    Output:
        texts: a list of LangChain Document chunks
    """
    texts = text_splitter.create_documents([doc])

    return texts

def summarize_docs(docs,llm_chain):
    """
    Summarize chunked documents
    Args:
        docs: chunked documents, one list of chunks per complaint
        llm_chain: a LangChain summarize chain
    Output:
        summaries: list of summaries, one per complaint
    """
    #summarize all complaints in one batch call; list() turns a pandas Series into the plain list that batch expects
    results = llm_chain.batch(list(docs))

    #extract the summary text from each result
    summaries = [result['output_text'] for result in results]

    return summaries


#load complaints data
#read data from file storage (read_data is a user-defined loader that returns a pandas DataFrame)
df_complaints=read_data()

#set the complaint description column for summarizing
desc_col='Consumer complaint narrative'

#chunk all complaints
docs=df_complaints[desc_col].apply(lambda doc: split_doc(text_splitter,doc) )

#extract and concatenate summaries to the original data
df_complaints['summarized_narrative']=summarize_docs(docs,llm_chain)

Evaluation:

Human assessment plays a crucial role in evaluating summarization tasks. It is essential to thoroughly review the generated output to ensure it is concise and also maintains the core objectives of the original text.

To check that summaries align closely with human judgment, a sample of generated summaries can be compared with human-written reference summaries by computing their ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores. ROUGE is a set of metrics for evaluating automatic text summarization and machine translation.
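As an illustration of what a ROUGE score measures, here is a minimal sketch of ROUGE-1 (unigram overlap) in plain Python. It assumes simple lowercase whitespace tokenization and skips the stemming that standard ROUGE packages can apply, so treat it as a teaching example rather than a replacement for an established implementation. The function name and the two example sentences are made up for this sketch.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> dict:
    """Unigram-overlap recall, precision and F1 between a reference and a candidate summary."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    # Clipped overlap: each word counts at most as often as it appears in both texts
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "the customer disputes late payments on the credit report"  # human-written
candidate = "the customer disputes incorrect late payments"  # model-generated
scores = rouge_1(reference, candidate)
print(scores)
```

Recall shows how much of the human summary the model covered, and precision shows how much of the model's summary is supported by the human one.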

Final Note:

In this article, I’ve shown how to build a scalable summarization solution. Both LaMini-Flan-T5 and the Vertex AI PaLM 2 API (keeping in mind the costs associated with the API), used with LangChain, performed well at extracting the important highlights from complaints.

In upcoming posts, I’ll use BERTopic and an LLM to identify the predominant trends in customer complaints and uncover the root causes behind these issues, with the aim of giving businesses insights they can act on.

Written by Tara Pourhabibi