Pioneering a New Era of Automated Customer Service with Large Language Models (LLMs)

Jochen Wulf
8 min read · Jul 27, 2023


by Jochen Wulf and Jürg Meierhofer (ZHAW)

Figure 1: Dall-E Visualization of a Customer Support Robot

Motivation

In the rapidly evolving digital age, the adoption of large language models (LLMs) such as OpenAI's GPT-4 will become a game-changer for customer service. These powerful tools have the potential to revolutionize the way businesses interact with their customers, offering enhanced efficiency, accuracy, and a higher level of satisfaction [1]. LLMs have the unique ability to understand and respond to complex customer queries, adapt to diverse customer preferences, and deliver personalized and contextually relevant support at a standardized quality level that is independent of an individual customer service agent’s knowledge base. Moreover, they offer a scalable solution that can handle high volumes of customer interactions, reducing the need for extensive human resources. By leveraging LLMs, businesses can not only improve their customer support but also realize significant cost savings, making them an invaluable asset in today’s competitive business environment [2].

On the technical front, LLMs are a type of artificial intelligence model trained on vast amounts of text data. They learn to predict the next word in a sentence, enabling them to generate human-like text. GPT (Generative Pre-trained Transformer), a type of LLM, is particularly notable for its performance. GPT models use a transformer architecture, which allows them to understand the context of words in a sentence, leading to more accurate and coherent text generation.
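The next-word prediction described above can be illustrated with a toy sketch: a model assigns a score (logit) to every word in its vocabulary, a softmax turns the scores into probabilities, and the most likely word is chosen as the continuation. The vocabulary and scores below are made up purely for illustration; a real LLM computes logits over tens of thousands of tokens.

```python
import math

# Toy next-word prediction: the model scores every vocabulary word,
# softmax converts scores into a probability distribution, and the
# highest-probability word is picked as the continuation.
vocab = ["router", "banana", "password", "upgrade"]
logits = [2.0, -1.0, 0.5, 1.5]  # hypothetical model scores for the next word

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
next_word = vocab[probs.index(max(probs))]  # most likely continuation
```

In practice this sampling step is repeated token by token, which is how a single prompt unfolds into a full response.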

While there is initial evidence of the business impact of LLMs for customer support [3], their real potential to transform customer service remains largely unclear. In the following, we discuss five cognitive tasks in customer support that can be automated with LLMs and show real-world examples. Further, we discuss five core questions that firms need to answer when automating their technical customer support.

Five cognitive tasks in customer service supported by LLMs

From the academic literature on LLMs we extract five cognitive tasks supported by LLMs that are relevant for customer service [4]. These five tasks are depicted in Figure 2 ordered by their respective level of complexity.

Figure 2: Five cognitive tasks in customer service supported by LLMs

In the following, we demonstrate these five tasks based on a case example: a customer inquiry submitted to a peer-to-peer support community portal of a large telecommunications operator. This inquiry deals with the migration of Internet router configurations during an upgrade to a new router. It consists of a stream of 20 messages with several technical solution tips and the customer’s feedback. The original customer question is shown below (Figure 3). We use OpenAI’s ChatGPT with GPT-4 to demonstrate the automation scenarios. The text elements shown in the figures have been anonymized and disguised.

Figure 3: Customer support use case — router upgrade

Text correction and translation. LLMs can translate text between languages or language registers by learning the underlying patterns and structures of different languages from their training data. The translation process involves providing the model with a prompt in the source language and asking it to generate a response in the target language. The model’s ability to understand context and semantics allows it to produce high-quality translations.

The figure below (Figure 4) demonstrates how GPT-4 translates a rough email template in the prompt into a well-formulated customer email.

Figure 4: Translation example
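A task like this could be scripted rather than run interactively in ChatGPT. The sketch below is a hypothetical illustration, not the article's actual setup: `build_translation_messages` assembles the chat messages, and `call_gpt4` shows the call shape of the (pre-1.0) `openai` Python library, which requires an API key and is therefore not executed here.

```python
# Hypothetical sketch of sending the reformulation task to a
# chat-completion API. Only the message construction runs here;
# call_gpt4 illustrates the API call shape (openai library < 1.0).
def build_translation_messages(rough_template: str,
                               target_style: str = "a polite, well-formulated customer email"):
    return [
        {"role": "system", "content": f"Rewrite the user's draft as {target_style}."},
        {"role": "user", "content": rough_template},
    ]

def call_gpt4(messages):
    import openai  # pip install openai; requires OPENAI_API_KEY to be set
    response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return response["choices"][0]["message"]["content"]

messages = build_translation_messages("router broken. send new one. sorry.")
```

Separating prompt construction from the API call also makes the prompt template easy to unit-test without network access.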

Text summarization. Modern LLMs are based on the transformer architecture and make use of a so-called attention mechanism [5]. The attention mechanism allows the model to take into account the context in which each word appears. This means that the model can understand the relationships between different parts of the text and can generate a summary that accurately reflects the overall meaning of the original text.

Text summarization will become an important tool for first-level support personnel to efficiently extract important information from prior customer communication or from incident records. Below is an example of a summary of the lengthy customer communication regarding the router upgrade (Figure 5).

Figure 5: Summarization example
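Summarizing a 20-message thread like the router case requires packing the conversation into the prompt. The helper below is a minimal sketch under our own assumptions (the function name, the bullet-point instruction, and the crude character budget are all illustrative); it joins the thread and truncates the oldest messages if the text would exceed a rough context limit.

```python
# Illustrative prompt builder for thread summarization. The character
# budget is a crude stand-in for a real token limit; older messages are
# dropped first by keeping only the tail of the joined text.
def build_summary_prompt(messages_thread, max_chars=12000):
    joined = "\n\n".join(f"{author}: {text}" for author, text in messages_thread)
    joined = joined[-max_chars:]  # keep the most recent part of the thread
    return [
        {"role": "system",
         "content": "Summarize the support conversation in 3-5 bullet "
                    "points, keeping all technical details."},
        {"role": "user", "content": joined},
    ]

thread = [
    ("Customer", "My new router loses the old port-forwarding rules."),
    ("Agent", "Export the configuration backup before the upgrade."),
]
prompt = build_summary_prompt(thread)
```

A production version would count tokens with a proper tokenizer rather than characters.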

Content generation. LLMs can generate a wide range of content, from emails and social media posts to blog articles and stories. They can be given a prompt or a starting point, and they generate the rest of the content based on the patterns they’ve learned. This capability is useful in a variety of applications, including content marketing, creative writing, and more.

The example below (Figure 6) demonstrates how GPT-4 can generate a customer email based on a prior flow of messages regarding a customer inquiry.

Figure 6: Example content generation
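The email-drafting scenario from Figure 6 can be sketched the same way. The snippet below is an assumed illustration (the function, the tone parameter, and the sample message flow are ours, not the article's): the prior conversation is embedded in the system instruction, and the resulting messages would be sent to GPT-4 as in the translation example.

```python
# Hypothetical prompt builder for generating a customer reply email
# from a prior message flow. Only the prompt is constructed here; the
# API call itself is omitted.
def build_email_messages(message_flow, tone="friendly and professional"):
    history = "\n".join(message_flow)
    return [
        {"role": "system",
         "content": f"Draft a {tone} reply email to the customer based on "
                    f"the conversation below.\n\n{history}"},
        {"role": "user", "content": "Write the reply email."},
    ]

flow = [
    "Customer: My port forwarding settings disappeared after the upgrade.",
    "Agent note: Settings must be re-entered manually; backup import is not supported.",
]
messages = build_email_messages(flow)
```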

Question Answering. In question answering, the LLM draws either on the internal factual knowledge acquired from its pre-training corpus or on external contextual data provided in the prompt to generate commonsense answers to questions or instructions. In contrast to more complex reasoning tasks, question answering is limited to retrieving required, preexisting information from a large data set.

The following example demonstrates the ability of GPT-4 to retrieve the solution for a customer problem from a larger problem-solution dataset (Figure 7). In this example, we provided six different customer inquiry message flows, only one of which was relevant for the task at hand.

Figure 7: Example question answering
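Before handing multiple inquiry threads to the model, one can pre-select the most relevant one. The following is a deliberately naive sketch of that idea, assuming a simple word-overlap score and made-up threads; real systems would use embedding-based similarity instead.

```python
# Naive relevance scoring: count the words a question shares with each
# candidate thread and pick the thread with the largest overlap. This
# stands in for the retrieval step that precedes question answering.
def score(question: str, document: str) -> int:
    q = set(question.lower().split())
    d = set(document.lower().split())
    return len(q & d)

threads = {
    "A": "billing invoice amount wrong last month",
    "B": "router upgrade configuration migration port forwarding lost",
    "C": "mobile roaming charges abroad",
}
question = "how do i migrate my router configuration during an upgrade"
best = max(threads, key=lambda k: score(question, threads[k]))
```

Word overlap fails on paraphrases ("box" vs. "router"), which is why embedding retrievers dominate in practice; the structure of the pipeline stays the same.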

Reasoning. Complex reasoning, unlike commonsense question answering, necessitates the comprehension and application of evidence and logic to reach conclusions. Typically, it involves a sequential reasoning process grounded in factual knowledge, culminating in the answer to a posed question.

In the following, we directly input a customer email into the prompt and ask for possible solutions to the customer problem (Figure 8). Please note that we only include the flow of messages belonging to this specific customer problem in the prompt as contextual data. Reasoning here involves isolating the specific customer problem and producing solutions from the contextual data. GPT-4 generates correct and specific solution instructions in this case.

Figure 8: Example reasoning (with focused contextual data input)

In a second example (Figure 9) we used the same prompt but provided six different customer inquiry message flows, only one of which was relevant for the customer problem. This more closely resembles real-life scenarios, in which solutions must be retrieved from a large problem-solution database. This time, GPT-4 returned unspecific solution proposals that will not solve the customer problem. GPT-4 was unable to isolate the correct problem and retrieve the specific solution from the broader dataset.

Figure 9: Example reasoning (with broad contextual data input)

In the next section we come back to this challenge and discuss approaches to improve complex reasoning in customer support scenarios.

Challenges and Open Questions

Will ready-made software or cloud solutions for customer support be available soon? Enterprise software providers are currently investing heavily in LLMs. Microsoft, for example, integrates functionalities such as translation, summarization, and content generation into its Office products [6] and offers question answering with company-internal data [7]. Google follows a similar approach [8]. Service desk platform providers such as Atlassian [9], Zendesk [10], Salesforce [11], and BMC [12] implement comparable functionalities into their products.

Should firms invest in research and development regarding their technical customer support? If you target a more complex automation scenario that involves reasoning, such as the (semi-)automated generation of solution proposals to customer problems, standard software will likely not be enough. The reasoning example in Figure 9 shows that LLMs tend to hallucinate, i.e., they generate untruthful information. More advanced technological approaches such as chain-of-thought prompting, text retrievers, or fine-tuning will be required and need to be tailored to the specific application domain and data.
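A retrieve-then-reason pipeline combines two of these approaches. The sketch below is our own illustration of the idea, under assumed names and prompts: a retriever first narrows the six message flows down to the single most relevant thread, which is then passed to the model together with a step-by-step reasoning instruction, avoiding the broad-context failure mode seen in Figure 9.

```python
# Illustrative retrieve-then-reason pipeline: retrieval narrows the
# context to one relevant thread before the reasoning prompt is built.
def retrieve(query: str, threads: list) -> str:
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    return max(threads, key=lambda t: overlap(query, t))

def build_reasoning_messages(customer_email: str, context_thread: str):
    instruction = (
        "First restate the customer's problem in one sentence, then "
        "derive a step-by-step solution using only the context below.\n\n"
        f"Context:\n{context_thread}"
    )
    return [
        {"role": "system", "content": instruction},
        {"role": "user", "content": customer_email},
    ]

threads = [
    "invoice shows wrong amount for last month",
    "port forwarding rules lost after router upgrade; re-enter them under NAT settings",
]
email = "After upgrading my router, my port forwarding rules are gone."
context = retrieve(email, threads)
messages = build_reasoning_messages(email, context)
```

The "restate, then derive" instruction is a lightweight form of chain-of-thought prompting; swapping the word-overlap retriever for an embedding index and the prompt for a fine-tuned model are the obvious next steps a dedicated project would take.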

Is there a business case for advanced automation in technical customer support? Although the development and operating costs for dedicated LLM solutions are substantial, the application of LLMs will likely produce a positive ROI. LLMs are predicted to contribute very significantly to worker productivity. Gartner [13], for example, estimates productivity increases for customer support staff of between 14% and 100% and a positive return on variable cost of 74% in a pessimistic scenario. However, considering the substantial technology investments, business cases need to be evaluated carefully.

How to evaluate data security and legal risks? Despite the large media coverage, data security risks related to LLMs are very manageable. The presence of established enterprise cloud providers such as Microsoft or Google in the LLM market will substantially reduce security-related implementation efforts. When it comes to legal risks, LLMs’ compliance with copyright law is strongly debated [14]. Most likely, however, the providers of LLMs will bear full legal responsibility.

This article discusses the company-internal use of LLMs, but what about customer-facing chatbots? As previously discussed, LLMs carry the risk of producing hallucinations, the avoidance of which is still an open research topic. We therefore advise starting customer support automation with so-called “human-in-the-loop” systems, in which a customer support agent remains in full control of customer communication. When it comes to customer communication with low complexity and reliability requirements, such as product information search on websites, LLM-based chatbots are quickly becoming state of the art.

References

[1] Brynjolfsson, Erik, Danielle Li, and Lindsey R. Raymond. 2023. “Generative AI at Work.” National Bureau of Economic Research.

[2] https://www.gartner.com/en/documents/4527899

[3] Brynjolfsson, Erik, Danielle Li, and Lindsey R. Raymond. 2023. “Generative AI at Work.” National Bureau of Economic Research.

[4] Liu, Yiheng, Tianle Han, Siyuan Ma, Jiayue Zhang, Yuanyuan Yang, Jiaming Tian, Hao He, et al. 2023. “Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models.” arXiv. http://arxiv.org/abs/2304.01852.; Zhao, Wayne Xin, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, and Zican Dong. 2023. “A Survey of Large Language Models.” ArXiv Preprint ArXiv:2303.18223.

[5] Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.

[6] https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/

[7] https://learn.microsoft.com/en-us/azure/ai-services/openai/use-your-data-quickstart

[8] https://cloud.google.com/blog/transform/generative-ai-industry-applications-google-io-announcements

[9] https://community.atlassian.com/t5/Jira-Service-Management-articles/Atlassian-Percept-AI/ba-p/1952471

[10] https://developer.zendesk.com/documentation/apps/build-an-app/using-ai-to-summarize-conversations-in-a-support-app/

[11] https://www.salesforce.com/products/ai-for-customer-service/

[12] https://www.bmc.com/blogs/large-language-models-in-service-management/

[13] https://www.gartner.com/en/documents/4527899

[14] https://crfm.stanford.edu/2023/06/15/eu-ai-act.html


Jochen Wulf

Jochen Wulf is a senior lecturer for Data-Driven Service Engineering at Zurich University of Applied Sciences (ZHAW).