CauSE: Causal Search Engine for Understanding Contact-Center Conversations

Published in The Observe.AI Tech Blog, Aug 18, 2023

Presenting an overview of our research paper, “CauSE: Causal Search Engine for Understanding Contact-Center Conversations”, which has been accepted at Interspeech 2023.

We’ve all been there — those seemingly endless and infuriating calls with customer care representatives that leave us feeling frustrated and dissatisfied. It’s a common sentiment shared by customers worldwide, and the resounding demand is clear: “We need a better experience.”

The significance of a positive customer experience cannot be overstated. A happy customer is not only likely to return but also to become a loyal advocate, making repeat purchases and spreading positive word-of-mouth.

Now, picture yourself as the head of contact center operations at a company. Undoubtedly, you’d be driven to enhance the experience you offer to your valuable customers. Yet, before you can make improvements, you need to identify the specific challenges your customers are facing.

The sheer scale of tens of thousands of daily calls poses a daunting question: how can you possibly unearth the underlying problems at such a magnitude? We’ve developed a system that uses data to provide insights at scale, uncovering hidden opportunities for enhancing customer care.

Peekaboo: Unmasking the Wizardry of Our System

Our system takes in an intent query and generates a response consisting of a set of rationale topics. For instance, to understand why customers escalate issues to managers, users can pass the following as the query:

intent_query = ['connect me to your manager', 'talk to your supervisor',
 'let me talk with your senior']

Our Training Pipeline has 4 core components:

Data Ingestion and Query Filtering
Discourse Segmentation
Topic Modeling
Topic Ranking and Topic Description

Let’s jump right into it!

Data Ingestion and Query Filtering

We ingest text data into the system from three channels: emails, chats, and transcripts of voice calls. Based on the intents configured in the query, we use an internal intent detection tool to split the data points into two sets: those where the intents are present are labeled Relevant Conversational transcripts, and those where the intents are absent are labeled Irrelevant Conversational transcripts. We ingest only the text in the neighbourhood of the conversation segment where the query intent is found.
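The filtering step can be sketched roughly as follows. This is a minimal illustration, not the internal intent detection tool: the substring match in `find_intent_turn`, the `window` parameter, and the choice to keep irrelevant conversations whole are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def find_intent_turn(turns: List[str], intent_query: List[str]) -> Optional[int]:
    """Hypothetical stand-in for the intent detector.

    Returns the index of the first turn that contains any query phrase
    (case-insensitive substring match), or None if no turn matches.
    """
    phrases = [p.lower() for p in intent_query]
    for i, turn in enumerate(turns):
        text = turn.lower()
        if any(p in text for p in phrases):
            return i
    return None


def filter_conversations(
    conversations: List[List[str]],
    intent_query: List[str],
    window: int = 2,
) -> Tuple[List[List[str]], List[List[str]]]:
    """Split conversations into relevant (intent found) and irrelevant ones.

    For relevant conversations, only the turns within `window` turns of the
    matched turn are kept. Irrelevant conversations are kept whole here,
    which is an assumption of this sketch.
    """
    relevant: List[List[str]] = []
    irrelevant: List[List[str]] = []
    for turns in conversations:
        hit = find_intent_turn(turns, intent_query)
        if hit is None:
            irrelevant.append(turns)
        else:
            start = max(0, hit - window)
            relevant.append(turns[start : hit + window + 1])
    return relevant, irrelevant
```

A production intent detector would match paraphrases rather than exact substrings, but the relevant/irrelevant split and the neighbourhood window work the same way.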

Discourse Segmentation

Each filtered conversation contains hundreds of tokens on average. These conversations, rich in information, usually encompass multiple themes within a single conversation segment. To make sense of this data and extract insights efficiently, we use a neural segmentation model. Our approach breaks down longer text into smaller, manageable units called elementary discourse units (EDUs). These units hold independent meaning and, importantly, tend to focus on just one theme or topic of discussion. This step lets us dissect the interactions and also improves the performance of the downstream clustering task. Here is a sample input and output for the Discourse Segmentation step:

Input: 'my name is Jane, registered phone number is ********,
 I am calling in to cancel the subscription'
Output: ['my name is Jane', 'registered phone number is ********',
'I am calling in to cancel the subscription']
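To make the input/output contract concrete, here is a toy stand-in for the segmenter. It is not the neural model described above: it simply splits on commas and sentence punctuation, which happens to reproduce the sample output. The function name `segment_edus` is hypothetical.

```python
import re
from typing import List


def segment_edus(text: str) -> List[str]:
    """Naive rule-based splitter used only to illustrate the interface.

    A real discourse segmenter predicts EDU boundaries with a model; this
    sketch just splits on commas, semicolons, periods, '?' and '!'.
    """
    pieces = re.split(r"[,;.?!]+", text)
    # Collapse internal whitespace (including newlines) and drop empty pieces.
    return [" ".join(p.split()) for p in pieces if p.strip()]
```

Punctuation is a weak signal in speech transcripts, which is one reason a learned segmenter is the better fit for real call data.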

Topic Modeling

To organize this conversational data, we turn to the BERTopic pipeline, which we use to cluster elementary discourse units (EDUs). We use an in-house, custom-trained Sentence Transformer model to obtain embeddings of the EDUs. Within the BERTopic pipeline, a dimensionality reduction algorithm shrinks the embeddings, and a clustering algorithm then groups them into topics. This combination streamlines the clustering process, making it both accurate and efficient.

What sets our approach apart is the way we represent each cluster. Rather than relying on n-grams, as traditional methods do, we extract N diverse discourse units from each cluster using maximal marginal relevance (MMR) and use these as representatives of the corresponding topic. Focusing on diverse discourse units gives a more holistic view of each topic, enabling richer insights and better-informed decisions.

An example topic representation:

Topic_23: ['can you help me with your SSN', 'last 4 digits of your SSN',
'verify your social security number', 'provide your SSN',
'repeat your social security number', 'can your confirm your SSN please',
'let me verify the SSN', 'last four of your SSN',....]
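The MMR selection of representative EDUs can be sketched in plain Python as below. This is an illustrative version under stated assumptions, not our production code: using the cluster centroid as the relevance target, the `diversity` weight of 0.5, and the function names are all choices made for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    centroid: Sequence[float],
    embeddings: List[Sequence[float]],
    n: int,
    diversity: float = 0.5,
) -> List[int]:
    """Return indices of up to n EDUs chosen by maximal marginal relevance.

    Each step picks the candidate that maximizes
        (1 - diversity) * sim(candidate, centroid)
        - diversity * max(sim(candidate, s) for s in selected)
    so later picks are penalized for resembling earlier ones.
    """
    relevance = [cosine(e, centroid) for e in embeddings]
    selected: List[int] = []
    candidates = list(range(len(embeddings)))
    while candidates and len(selected) < n:

        def score(i: int) -> float:
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return (1 - diversity) * relevance[i] - diversity * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `diversity=0.0` this reduces to picking the N EDUs closest to the centroid; raising it trades some closeness for variety, which is what keeps near-duplicate phrasings from filling the whole representation.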

Topic Ranking and Topic Description

We input over 20,000 contact center conversations for training the system, resulting in the generation of hundreds of thousands of EDUs. However, this abundance of data comes with a challenge — an explosion of topics, surpassing 100 in number. Unfortunately, the topics presented at the top by the BERTopic Pipeline are often generic and unrelated to the specific business queries at hand, such as agent greetings and identity verification. Bummer! This situation calls for a remedy aka Ranking Logic.

To achieve this, we introduce two sets of topics: the Base Topics (B) derived from relevant transcripts and the Reference Topics (R) obtained from the irrelevant transcripts, as shown in the figure above. Our hypothesis is that topics genuinely relevant to the business query will be exclusively present in the Base Topics, while generic topics will be found across both sets.

We derive topic embeddings for each cluster and compute a relevance score by feeding the representative EDUs to a proprietary ranking algorithm. We then rank the base topics by sorting them in descending order of this score. Our ranking logic sifts through the multitude of topics and prioritizes the ones that truly matter. Say goodbye to mundane topics that clutter the top positions, and welcome a concise and relevant list of topics that align with your business objectives.
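The ranking algorithm itself is proprietary, so the following is only a sketch of the stated hypothesis, not the actual method. It scores each base topic by how dissimilar it is from its closest reference topic, so that topics appearing in both sets sink to the bottom. The scoring rule, the function names, and the toy embeddings are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_base_topics(
    base: Dict[str, Sequence[float]],
    reference: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank base topics by an assumed relevance score.

    score = 1 - (similarity to the closest reference topic).
    Generic topics (also present in the reference set) score near 0;
    topics found only in the base set score near 1.
    """
    ranked = []
    for name, emb in base.items():
        closest = max((cosine(emb, r) for r in reference.values()), default=0.0)
        ranked.append((name, 1.0 - closest))
    # Highest score first, i.e. most query-specific topics at the top.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy two-dimensional embeddings, for illustration only.
base_topics = {"greeting": [1.0, 0.0], "double_charge": [0.0, 1.0]}
reference_topics = {"greeting_ref": [0.9, 0.1], "id_verification": [1.0, 0.2]}
ranking = rank_base_topics(base_topics, reference_topics)
```

In this toy case `double_charge` ranks above `greeting`, because only the greeting topic has a close counterpart among the reference topics.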

We prompt the Cerebras-GPT model with each cluster’s EDUs to generate a description for each topic.

Example of top topics:

query = ['connect me to your manager', 'talk to your supervisor',
 'let me talk with your senior']

CauSE output: 
1. It seems that the customers are expressing concerns and frustration about emails, such as not receiving them,
needing them urgently, and requesting assistance in sending or receiving them.

2. It appears that the customers are complaining about being charged multiple times for the same thing, or being
charged for things that they shouldn’t be charged for.

3. The customers seem to be discussing issues related to their card information, such as providing or confirming
the correct card number, billing, and usage. Some customers express concerns about the security of their card
information or previous issues they have experienced.

4. The customers appear to be expressing frustration and dissatisfaction with the customer service they have received.

5. The customers seem to be discussing issues related to owing money, such as disputing the amount owed or claiming
to have paid their balance in full. Some customers are also suggesting that the company owes them money or that
there has been an error in the billing.

Final Thoughts

In this work, we have proposed a tool that can empower contact center supervisors to delve deeply into any conversation or event of business interest. By using the system, supervisors can gain valuable insights and a comprehensive understanding of customer interactions, enabling them to make well-informed decisions.

The insights obtained from this system can serve as a valuable feedback mechanism to improve products and services. By analyzing customer interactions, supervisors can identify pain points, customer preferences, and areas that require improvement. This valuable feedback loop can help businesses enhance their offerings, optimize processes, and cater to customer needs effectively.

Moreover, the system can play a crucial role in agent coaching and training. By analyzing conversations, supervisors can pinpoint areas where agents excel and areas where they need improvement. This allows for targeted coaching sessions, where agents receive constructive feedback and guidance to enhance their performance. As a result, agents can continually improve their communication skills, problem-solving abilities, and overall customer service, leading to better customer experiences.

Overall, the proposed system has the potential to revolutionize how contact centers operate, unlocking the potential of vast conversational data and turning it into actionable insights. It empowers businesses to adapt, evolve, and provide exceptional customer experiences, ultimately driving success and growth in the competitive market.

This work has been accomplished through a collective effort involving Tanay Narshana, Aashraya Sachdeva, Cijo George, and Jithendra Vepa.

Learn more about how we’re changing conversation intelligence for contact centers around the world at Observe.AI


Written by Anup Pattnaik