Understanding LLM Agents: A Practical Guide to Building AI Solutions for Contact Center Optimization

Shikhar Kanaskar
Shikhar’s Data Science Projects


A few months back, I realized that contact centers worldwide struggle to categorize calls accurately and to understand the specific reasons behind call volumes. As a data science consultant, I wondered: could we build a solution to tackle this issue?

This was more than a year ago, when LLMs were just starting to gain momentum. My initial thought was to train an NLP model on manually labeled data, a traditional machine learning approach, because I doubted whether LLMs could handle such a nuanced use case. My second option was topic modeling, which now seems very limited compared to the capabilities of LLM Agents today. Then another idea struck me: what if we fine-tuned a smaller open-source model, like the initial versions of Llama or Alpaca, to categorize contact center calls and identify themes? However, none of these methods offered a quick turnaround with modest effort. During my research I came across the concept of LLM Agents and learned how they can be built on top of a base LLM by adding state and tools, creating a fully functional solution without the need for fine-tuning. With this solution as a starting point, I have since built several AI Agents (some including RAG) to solve problems for my organization.

In this article, I delve into how I've been professionally building LLM Agents within my organization over the past year to solve this and many other problems. Working closely with our contact center team, we've harnessed these LLM agents to analyze call recordings, enabling us to make data-driven decisions like never before. I'll share my learnings and experience with LLM Agents and demonstrate how they can automate processes and unlock innovative solution capabilities in organizations.

While my experience primarily revolves around building agents and LLM pipelines in Azure AI Studio, the core principles of any agentic flow remain consistent across platforms. I have recently started learning more about AWS Bedrock through Andrew Ng's course on building LLM apps. Both platforms provide excellent low-code capabilities and help in building LLM Agents and evaluation flows. Below is my experimentation with AWS Bedrock, using Amazon's Titan LLM to construct this 'agentic flow'. As I embark on this new journey with AWS Bedrock, I'll highlight my experience with prompt engineering and grounding techniques to enhance accuracy and reduce hallucinations.

Understanding LLM Agents and Agentic Workflows

What Are LLM Agents?

LLM Agents are advanced AI systems powered by Large Language Models such as OpenAI's GPT-4 or Amazon's Titan. To understand this better, the table below draws an over-simplified, metaphorical comparison between an AI Agent and the human body. Just as the heart is the powerhouse of the body, core LLMs like GPT and Titan form the foundation, while the other components make that powerhouse more suitable for the job. Unlike basic LLMs that operate in a passive, single-turn interaction mode, LLM Agents are designed to interact dynamically with their environment, perform tasks autonomously, bring in knowledge that the base LLM does not have, such as retrieved company documents (using RAG), and engage in multi-turn conversations.

Disclaimer: This analogy is for illustrative purposes only and does not capture all LLM complexities.

Agentic Workflows Explained

Agentic workflows refer to the processes where AI agents perform tasks independently, making decisions based on the data they receive and the goals they are programmed to achieve. In the context of contact centers, agentic workflows enable AI agents to handle customer interactions, extract insights from conversations, and provide actionable outputs without constant human intervention.

The Evolution from Basic LLMs to Expert Agents

Differences Between Basic LLMs and LLM Agents

While basic LLMs can generate human-like text based on input prompts, they lack the ability to perform tasks autonomously or interact with external systems. LLM Agents, on the other hand, are designed to:

  • Understand Context: Maintain state over multi-turn conversations.
  • Interact with Systems: Execute actions, access databases, or call APIs.
  • Learn and Adapt: Improve performance over time through feedback loops.

How I understand LLM Agents (a minimal code sketch follows this list): LLM Agent =

A. LLM: The core model that understands and generates human-like text based on input prompts.
B. STATE: Mechanisms to maintain context over multiple interactions, enabling coherent and context-aware conversations.
C. TOOLS: Interfaces to interact with external systems like APIs and databases, allowing the agent to execute actions and access up-to-date information.
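
To make the LLM + STATE + TOOLS framing concrete, here is a minimal, illustrative Python sketch. The call_llm function, the tool registry, and the order-lookup tool are placeholders I made up for this example; in a real agent, call_llm would invoke an actual model endpoint (Titan on Bedrock, GPT-4 on Azure OpenAI, etc.).

def call_llm(prompt: str) -> str:
    # Placeholder: in a real agent this would call a model endpoint.
    return "lookup_order: 12345"

TOOLS = {
    # TOOLS: interfaces to external systems (APIs, databases). Hypothetical example.
    "lookup_order": lambda order_id: f"Order {order_id} is out for delivery.",
}

class SimpleAgent:
    def __init__(self):
        # STATE: conversation history kept across turns.
        self.history = []

    def run(self, user_message: str) -> str:
        self.history.append(f"User: {user_message}")
        # LLM: decide what to do next, given the accumulated context.
        decision = call_llm("\n".join(self.history))
        if ":" in decision:
            tool_name, arg = [part.strip() for part in decision.split(":", 1)]
            if tool_name in TOOLS:
                result = TOOLS[tool_name](arg)
                self.history.append(f"Tool({tool_name}): {result}")
                return result
        self.history.append(f"Agent: {decision}")
        return decision

agent = SimpleAgent()
print(agent.run("Where is my order 12345?"))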

The Shift Towards Expertise

LLM Agents represent a shift towards more expert systems that can handle complex tasks. They are not just passive responders but active participants in workflows, capable of understanding nuanced customer interactions and providing tailored solutions.

Building an Agentic Workflow with AWS Bedrock

In this section, we'll explore how to create an agentic workflow using AWS Bedrock to improve contact center capabilities. We'll use open-source audio recordings, transcribe them, and generate summarized insights using Amazon's Titan LLM. The agent will make autonomous decisions to create service tickets with topics and summaries. This is a very simple proof of concept, which can be further customized (for example, by adding grounding and few-shot examples to the template) to improve the results.

Step-by-Step Implementation (Proof of concept)

This is just illustrative code and does not include many other functions and classes. I have the GitHub repo pinned with this article… 😅

1. Storing Audio Recordings in Amazon S3

I downloaded some sample MP3 recordings of customer calls from a few public websites.

I begin by uploading audio recordings to an Amazon S3 bucket. These recordings serve as the raw data for our workflow.

import os
import boto3
from helpers.s3_helper import S3_Helper

s3_client = boto3.client('s3')

bucket_name = 'audiorecordingsllm'
data_folder = 'data'
s3_helper = S3_Helper(s3_client)

# Upload audio files to S3
for filename in os.listdir(data_folder):
    if filename.endswith('.mp3'):
        audio_path = os.path.join(data_folder, filename)
        s3_helper.upload_file(bucket_name, audio_path, filename)

Explanation: I iterate over audio files in the data folder and upload them to our S3 bucket using a helper class.
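
For reference, the S3_Helper class used above lives in the linked repo; a minimal version of it, assuming it simply wraps boto3's upload_file call, might look like this (the method body is my guess, not the repo's actual code):

import boto3

class S3_Helper:
    """Minimal sketch of an S3 helper; the real class lives in the linked repo."""

    def __init__(self, s3_client):
        self.s3_client = s3_client

    def upload_file(self, bucket_name, file_path, object_key):
        # boto3's upload_file also handles multipart uploads for larger files.
        self.s3_client.upload_file(file_path, bucket_name, object_key)

s3_helper = S3_Helper(boto3.client('s3'))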

2. Transcribing Audio Using Amazon Transcribe

Next, I transcribe the audio files using Amazon Transcribe, which converts speech to text and supports speaker identification.

import uuid
import boto3
from helpers.transcribe_helper import Transcribe_Helper

transcribe_client = boto3.client('transcribe')
transcribe_helper = Transcribe_Helper(transcribe_client)

# Start a uniquely named transcription job for each uploaded recording
transcript_job_name = 'transcription-job-' + str(uuid.uuid4())
transcript = transcribe_helper.transcribe_audio(
    transcript_job_name, bucket_name, filename
)

Explanation: I create a unique transcription job name and start the transcription process. The helper class handles interaction with the Transcribe service.
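
The helper hides the Transcribe API calls. For readers who want to see roughly what happens under the hood, here is an illustrative sketch, assuming a standard boto3 flow of starting a job with speaker labels enabled, polling for completion, and downloading the transcript JSON (the real helper in the repo may differ):

import time
import json
import urllib.request
import boto3

def transcribe_audio_sketch(job_name, bucket_name, filename, transcribe_client=None):
    """Illustrative version of the helper: start a job with speaker labels,
    poll until it finishes, then fetch the transcript JSON."""
    transcribe_client = transcribe_client or boto3.client('transcribe')
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': f's3://{bucket_name}/{filename}'},
        MediaFormat='mp3',
        LanguageCode='en-US',
        Settings={'ShowSpeakerLabels': True, 'MaxSpeakerLabels': 2},
    )
    while True:
        job = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        status = job['TranscriptionJob']['TranscriptionJobStatus']
        if status in ('COMPLETED', 'FAILED'):
            break
        time.sleep(5)
    if status == 'FAILED':
        raise RuntimeError(f"Transcription job {job_name} failed")
    # With no output bucket configured, Transcribe returns a presigned URL.
    uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
    with urllib.request.urlopen(uri) as response:
        return json.loads(response.read())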

3. Formatting Transcripts

Once I receive the transcripts, I format them to identify speaker labels and content, preparing the data for summarization.

def extract_transcript_from_text(transcript_json):
    output_text = ""
    current_speaker = None
    items = transcript_json['results']['items']
    for item in items:
        speaker_label = item.get('speaker_label')
        content = item['alternatives'][0]['content']
        # Start a new line whenever the speaker changes
        if speaker_label and speaker_label != current_speaker:
            current_speaker = speaker_label
            output_text += f"\n{current_speaker}: "
        output_text += f"{content} "
    return output_text

Explanation: This function processes the JSON output from Amazon Transcribe and structures the transcript with speaker labels.
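
To show the formatting in action, here is a tiny, hand-made example of the Transcribe JSON shape (heavily trimmed; real output contains many more fields):

sample_json = {
    "results": {
        "items": [
            {"speaker_label": "spk_0", "alternatives": [{"content": "Hello"}]},
            {"speaker_label": "spk_0", "alternatives": [{"content": "there"}]},
            {"speaker_label": "spk_1", "alternatives": [{"content": "Hi"}]},
        ]
    }
}

print(extract_transcript_from_text(sample_json))
# Expected output (roughly):
# spk_0: Hello there
# spk_1: Hi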

4. Crafting the Prompt with Jinja2 Templates

Prompt engineering is crucial for guiding the LLM to produce the desired output. I used Jinja2 templates to create dynamic prompts.

from jinja2 import Environment, FileSystemLoader

env = Environment(loader=FileSystemLoader('templates'))
template = env.get_template("prompt_template.txt")

data = {
    'transcript': formatted_transcript,
    'topics': ['charges', 'location', 'availability']
}
prompt = template.render(data)

Explanation: I load a prompt template and render it with the transcript and specific topics we want the LLM to focus on.

Prompt Template (prompt_template.txt):

I need to summarize a conversation. The transcript of the conversation is between the <data> XML-like tags.

<data>
{{transcript}}
</data>

The summary must contain a one-word sentiment analysis and a list of issues or causes of friction during the conversation. The output must be provided in JSON format as shown in the following example.

Example output:
{
    "version": 0.1,
    "sentiment": <sentiment>,
    "issues": [
        {
            "topic": <topic>,
            "summary": <issue_summary>
        }
    ]
}

The `topic` must only be one of:
{%- for topic in topics %}
- `{{topic}}`
{% endfor %}

Write the JSON output and nothing more.

Explanation: The prompt instructs the LLM to produce a JSON output summarizing the conversation, focusing on specified topics and sentiment analysis.

5. Summarizing with Amazon’s Titan LLM

I pass the crafted prompt to Amazon’s Titan LLM through the Bedrock runtime client to generate the summary.

import json
import boto3

bedrock_runtime_client = boto3.client('bedrock-runtime')

def bedrock_summarisation(prompt):
    kwargs = {
        "modelId": "amazon.titan-text-lite-v1",
        "contentType": "application/json",
        "accept": "*/*",
        "body": json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 512,
                "temperature": 0.8,
                "topP": 0.9
            }
        })
    }
    response = bedrock_runtime_client.invoke_model(**kwargs)
    response_body = json.loads(response.get('body').read())
    return response_body['results'][0]['outputText']

Explanation: I configure the model parameters and invoke the model. The response is parsed to extract the summary.

6. Output and Interpretation

The final output is a JSON-formatted summary containing the sentiment and issues discussed in the conversation.

{
    "version": 0.1,
    "sentiment": "Negative",
    "issues": [
        {
            "topic": "charges",
            "summary": "The customer is disputing unexpected charges."
        },
        {
            "topic": "availability",
            "summary": "The service the customer needs is currently unavailable."
        }
    ]
}

Explanation: This structured output can be used by the contact center to quickly understand customer sentiments and address issues efficiently.

Step 7: Parse the LLM Output and Extract Topics

import json

# Parse the LLM output
llm_output = bedrock_summarisation(prompt)
summary_data = json.loads(llm_output)
issues = summary_data.get('issues', [])

Explanation: Parse the JSON output from the LLM to extract the list of issues.
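
In practice, the model occasionally wraps the JSON in extra text, so a slightly more defensive parse helps; this fallback is my own addition rather than part of the original notebook:

import json
import re

def parse_llm_json(llm_output):
    """Try a direct parse first; fall back to extracting the first {...} block."""
    try:
        return json.loads(llm_output)
    except json.JSONDecodeError:
        match = re.search(r'\{.*\}', llm_output, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise

summary_data = parse_llm_json(llm_output)
issues = summary_data.get('issues', [])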

Step 8: Define Actionable Topics and Create Service Tickets

The code for this section is specific to each organization and can be integrated with ServiceNow databases to create tickets autonomously.

import uuid

actionable_topics = ['charges', 'availability', 'billing', 'technical_issue']

def create_service_ticket(topic, issue_summary):
    # Placeholder: in production this would call a ticketing system API
    ticket_id = str(uuid.uuid4())
    print(f"Created ticket ID {ticket_id} for topic '{topic}': {issue_summary}")

Explanation: Specify topics that require action and define a function to create tickets.
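
If you do wire this up to ServiceNow, the function above could call the ServiceNow Table API instead of printing; the instance name, credentials, and field mapping below are placeholders that only illustrate the shape of such a call:

import requests

def create_servicenow_incident(topic, issue_summary,
                               instance="your-instance",   # placeholder
                               user="api_user",            # placeholder
                               password="api_password"):   # placeholder
    """Sketch of creating an incident via the ServiceNow Table API."""
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    payload = {
        "short_description": f"Contact center issue: {topic}",
        "description": issue_summary,
    }
    response = requests.post(
        url,
        auth=(user, password),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        json=payload,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["result"]["sys_id"]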

Step 9: Process Issues and Take Action

# Process each issue and create tickets if necessary
for issue in issues:
    topic = issue.get('topic', '').lower()
    summary = issue.get('summary', '')
    if topic in actionable_topics:
        create_service_ticket(topic, summary)

Explanation: For each extracted issue, create a service ticket if the topic is actionable.

Since this current prototype is basic and easy to implement, there are massive opportunities to customize and improve it by:

  • Incorporating Fine-Tuning: Refine the LLMs by fine-tuning them on domain-specific data to enhance accuracy in categorizing and interpreting customer interactions.
  • Ensuring Data Privacy and Compliance: Implement robust data protection measures and adhere to compliance frameworks to safeguard sensitive customer information throughout the workflow.
  • Improving Scalability and Robustness: Enhance the system's scalability and robustness to efficiently handle high-volume call processing, including better error handling and fault tolerance.
  • Integrating Human Oversight: Establish mechanisms for human review and validation of AI-generated outputs to maintain accuracy and reliability in the system's decisions (see the sketch below).
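
As one way to implement the human-oversight point above, a simple gate could auto-create tickets only for well-understood topics and queue everything else for review; the topic whitelist and review queue below are illustrative assumptions, reusing the create_service_ticket function and issues list from earlier steps:

REVIEW_QUEUE = []  # stand-in for a real review tool or dashboard

def route_issue(issue, auto_approve_topics=('charges', 'availability')):
    """Auto-create tickets only for well-understood topics; queue the rest."""
    topic = issue.get('topic', '').lower()
    if topic in auto_approve_topics:
        create_service_ticket(topic, issue.get('summary', ''))
    else:
        REVIEW_QUEUE.append(issue)  # a human validates before any action is taken

for issue in issues:
    route_issue(issue)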

Impact: Enhancing Contact Center Efficiency

By automating the transcription, summarization and ticket creation of customer interactions, contact centers can:

  • Reduce Manual Ticketing Time: The agent can automatically identify the reason for a call, then open and assign tickets.
  • Reduce Number of Calls: Understand why customers are calling, quantify the call drivers, and take strategic decisions.
  • Enhance Quality Assurance: Monitor interactions for compliance and performance.
  • Gain Insights: Analyze trends in customer feedback and sentiment.

Pipelines versus Agents — which one is more suitable?

As we explore LLM Agents, it’s worth questioning if they’re truly essential. While they offer advanced capabilities, they also bring complexity and unpredictability. Some experts suggest using simpler LLM Pipelines for better transparency and reliability. It’s important to avoid jumping on new tech trends without assessing actual needs. Ultimately, consider if LLM Agents are necessary for your specific case or if simpler solutions might work just as well.

My Two Cents:

Over the past few years, as we began working with GPT-3 and smaller open-source models like Llama, the industry was still learning how to implement these technologies effectively. When I started implementing this solution at my organization, my initial thought was to fine-tune an LLM because I believed a model like ChatGPT might not perform as needed or provide sufficient context. However, with the emergence of Retrieval-Augmented Generation (RAG), few-shot learning, and no-code LLM offerings, the landscape began to change.

I collaborated with the Azure OpenAI team and had multiple sessions to understand what agentic flow means and why we might not need to fine-tune LLMs unless absolutely necessary. Concepts like prompt flow, chain of thought, in-context learning, grounding, and hallucinations, all the buzzwords of AI, started to make sense as I worked on these projects. I have faced several challenges on this journey, and I plan to write more articles about each of them.

In conclusion, I would say we're still at the beginning of the AI journey, and while these models are exciting, they sometimes fail or do not perform as we want them to. Nevertheless, they offer amazing workflows to solve significant challenges within organizations. They have provided valuable insights, increased net revenue, and helped make processes leaner and more optimized.

I am continually learning and may have overlooked certain aspects or oversimplified others, but this reflects my current understanding. I welcome any suggestions and questions to further enhance this work.
