Listening To Silences In Contact Center Conversations Using Textual Cues

Digvijay Ingle
The Observe.AI Tech Blog
5 min read · Aug 18, 2023

This blog refers to our technical paper accepted at Interspeech 2023, Dublin, Ireland


Introduction

In the fast-paced world of contact centers, where effective communication defines exceptional customer experiences, the significance of silence often goes unnoticed amidst lively conversations. However, these moments of hush can significantly impact contact center performance, leading to increased average handling time (AHT) and potentially lower customer satisfaction. Understanding the characteristics of these silences becomes paramount for contact centers to thrive. In our paper, “Listening To Silences In Contact Center Conversations Using Textual Cues,” we emphasize the value of surrounding text in understanding the characteristics of silences. By leveraging cutting-edge language models, we show how this textual context drives tangible improvements in reducing silences and enriching customer interactions. Our research redefines the approach to analyzing these conversational phenomena, paving the way for a more seamless and engaging customer experience.

Motivation

To enhance operational metrics such as AHT and customer experience in contact centers, the focus has traditionally been on optimizing long and frequent silences. However, a deeper examination reveals that understanding silences goes beyond their mere duration. The way the agent-customer interaction unfolds around these silences can be crucial in determining their impact on customer experience. Certain silences, either unexpected (as seen in Examples 3 and 4 in Table 1) or prolonged expected silences, often lead to poor customer satisfaction. Thus, we believe that the contextual cues around silences contain vital information about their characteristics, making it more than just an audio problem.

To address this challenge, we approach silence understanding as two text classification tasks:

  1. Silence-Type Classification: By analyzing the conversation surrounding a silence, we aim to determine whether the silence is expected or unexpected, a distinction that is crucial from a customer experience standpoint.
  2. Causer Identification: Since agents have limited control over silences caused by customers, identifying the causer of a silence, be it the agent or the customer, empowers contact centers to focus corrective actions on agent-caused silences.
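To make the two-task framing concrete, here is a minimal sketch of what one annotated silent segment might look like as a training sample. The field and label names are illustrative assumptions, not the paper's exact annotation schema.

```python
from dataclasses import dataclass

@dataclass
class SilenceSample:
    """One annotated silent segment for the two text classification tasks.

    Field names and label values are illustrative; the paper's actual
    annotation schema may differ.
    """
    left_context: str   # dialogue turns preceding the silence
    right_context: str  # dialogue turns following the silence
    duration_s: float   # length of the silent segment in seconds
    silence_type: str   # "expected" | "unexpected"  (task 1 label)
    causer: str         # "agent" | "customer"       (task 2 label)

sample = SilenceSample(
    left_context="Agent: Let me check that for you.",
    right_context="Agent: Thanks for waiting.",
    duration_s=42.0,
    silence_type="expected",
    causer="agent",
)
```

Both tasks share the same input contexts; only the label differs, which is what lets a single pre-trained encoder be fine-tuned separately for each.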

Methodology

Pre-Training

We pre-train our “silence-aware” language model, Silence-RoBERTa, utilizing time-aligned ASR transcripts from a vast dataset of approximately 2.1 million English dyadic conversations between agents and customers across diverse industries such as retail, finance, and healthcare. These conversations, collected from contact center calls, are carefully pre-processed at the turn level, with special tokens indicating the speaker and marking the end of each turn. Silent segments are encoded based on their duration into short, medium, and long bins. For pre-training, 15% of tokens are randomly replaced with [MASK] tokens (refer to Figure 1). We then train the Silence-RoBERTa model using the RoBERTa-base architecture, leveraging its existing language-model properties for better generalization.

Figure 1: Pre-Training Silence-RoBERTa Model
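The pre-processing above can be sketched as follows. The special token names ([AGT], [CUS], [EOT], [SIL_*]) and the duration thresholds for the three silence bins are illustrative assumptions; the paper's exact vocabulary and cut-offs may differ.

```python
import random

def silence_token(duration_s):
    """Map a silent segment's duration to a binned silence token.

    The 3 s / 10 s thresholds are hypothetical, for illustration only.
    """
    if duration_s < 3.0:
        return "[SIL_SHORT]"
    elif duration_s < 10.0:
        return "[SIL_MED]"
    return "[SIL_LONG]"

def encode_turns(turns):
    """Linearize time-aligned turns with speaker and end-of-turn markers.

    `turns` is a list of (speaker, text, trailing_silence_s) tuples.
    [AGT]/[CUS] mark the speaker, [EOT] marks the end of each turn, and a
    silence token is emitted wherever a silent segment follows the turn.
    """
    pieces = []
    for speaker, text, silence_s in turns:
        pieces.append("[AGT]" if speaker == "agent" else "[CUS]")
        pieces.append(text)
        pieces.append("[EOT]")
        if silence_s > 0:
            pieces.append(silence_token(silence_s))
    return " ".join(pieces)

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace ~15% of tokens with [MASK] for MLM pre-training."""
    rng = random.Random(seed)
    return [t if rng.random() > mask_prob else "[MASK]" for t in tokens]

encoded = encode_turns([
    ("agent", "How can I help you today ?", 0.0),
    ("customer", "I was overcharged on my bill .", 12.5),
    ("agent", "Let me pull up your account .", 4.0),
])
```

Because the silence tokens sit in the token stream like ordinary words, the masked-language-model objective learns to predict them from the surrounding dialogue, which is what makes the resulting representations "silence-aware."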

Fine-Tuning

For the fine-tuning process, we sample silent segments from contact center conversations, annotated by a group of annotators for silence type (Expected or Unexpected) and causer (Agent or Customer) labels. We fine-tune the pre-trained Silence-RoBERTa model on the two classification tasks and study two feature extraction setups: 1) Left Only and 2) Left + Right (refer to our paper for more details). We encode the contexts as in the pre-training stage and use the <s> token representations of the left and right contexts to fine-tune Silence-RoBERTa on each of the two tasks.
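A minimal sketch of how the two feature extraction setups might slice context around a silence token. The `window` size is an illustrative assumption, not the context length used in the paper; in the actual model, each context would be tokenized and the final-layer representation of the prepended <s> token would feed the classification head.

```python
def extract_contexts(tokens, silence_idx, window=64):
    """Build Left Only and Left + Right inputs around one silence token.

    `tokens` is the linearized conversation and `silence_idx` points at the
    silence token under study. `window` caps the number of context tokens
    kept on each side (a hypothetical value, for illustration).
    """
    left = tokens[max(0, silence_idx - window):silence_idx]
    right = tokens[silence_idx + 1:silence_idx + 1 + window]
    # RoBERTa wraps each sequence in <s> ... </s>; the <s> representation
    # serves as the sequence embedding during fine-tuning.
    left_only = ["<s>"] + left + ["</s>"]
    left_plus_right = {
        "left": ["<s>"] + left + ["</s>"],
        "right": ["<s>"] + right + ["</s>"],
    }
    return left_only, left_plus_right
```

Note that the Left Only setup never looks past the silence, which is exactly why it transfers to real-time use: all of its input is available the moment the silence occurs.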

Results and Comparison

The Silence-RoBERTa model fine-tuned using the above methodology is benchmarked against a mix of non-text-based and text-based baselines. The following summarizes our observations.

  1. Impact of textual cues: For both the Silence-Type classification and Causer Identification tasks, we observe that text-based models outperform non-text-based models by a margin of ~18% macro F1, validating the importance of the surrounding dialogue turns in understanding conversational silences.
  2. Impact of in-domain pre-training: We pre-train ConvRoBERTa, an in-house language model trained with the same approach as Silence-RoBERTa but without silence-tokens, and fine-tune it on the two tasks. It outperforms out-of-the-box fine-tuning of RoBERTa-base by over ~1.5% on both tasks, signifying the importance of in-domain pre-training.
  3. Impact of silence-tokens: We observe up to a 3% improvement in macro F1 using Silence-RoBERTa over ConvRoBERTa across settings, emphasizing the importance of silence-tokens in the pre-training stage for capturing the nuances of spoken conversations in contact centers.
  4. Real-Time Implications: The Left Only setup outperforms the Left + Right setup on Silence-Type classification, so it extends naturally to real-time settings. Interestingly, adding right context marginally improves performance on the Causer Identification task, making that setup best suited to post-call analysis; with a slight trade-off, however, it can also be applied in real time.
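For readers less familiar with the metric used above: macro F1 averages the per-class F1 scores, so both classes of each binary task (e.g. Expected vs. Unexpected) count equally regardless of how imbalanced they are. A minimal implementation:

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

This matches scikit-learn's `f1_score(..., average="macro")` and is the fair choice here, since unexpected silences are typically much rarer than expected ones.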

Business Impact Assessment

We conduct a pilot study to assess the impact of the proposed framework in a real-world contact center setting. We integrate the Silence-Type classification and Causer Identification models with a CCaaS platform to trigger real-time alerts to contact center agents based on the detected silence characteristics. An illustrative example of the end-to-end pipeline is shown in Figure 2.

Figure 2: End-to-end pipeline for real-time monitoring of conversational silences
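A hypothetical alerting policy consistent with the pipeline described above, combining the two model outputs with the silence duration. The decision rules and the `expected_limit_s` threshold are illustrative assumptions, not the policy deployed in the pilot.

```python
def alert_for_silence(silence_type, causer, duration_s, expected_limit_s=30.0):
    """Decide whether to surface a real-time alert to the agent.

    Illustrative policy: unexpected agent-caused silences alert immediately,
    while expected silences alert only once they exceed a configurable limit
    (`expected_limit_s` is a hypothetical threshold). Returns the alert
    message, or None if no alert should fire.
    """
    if silence_type == "unexpected" and causer == "agent":
        return "ALERT: unexpected silence; re-engage the customer"
    if silence_type == "expected" and duration_s > expected_limit_s:
        return "ALERT: expected silence exceeded limit; check in with the customer"
    return None
```

Keying the immediate alert to agent-caused silences follows the motivation above: agents have limited control over customer-caused silences, so those are not actionable in the moment.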

Observations:

  1. A decreasing trend in % Unexpected Silences, implying that exposure to the alerts drives better adherence to protocol by agents when handling conversational silences.
  2. A decreasing trend in % Expected Silence Violations, exemplifying the effectiveness of the alerts in reducing the proportion of longer silences that could lead to poor customer experience.

Figure 3: Month-over-month comparison of KPIs associated with conversational silences

Conclusions and Takeaways

  • Understanding the characteristics of conversational silences is not merely an audio phenomenon; it is often complemented by the surrounding dialogue turns.
  • Encoding silences in language models leads to silence-aware representations that not only yield performance gains on silence tasks but also hold potential for enhancing other downstream tasks.
  • Our methodology is versatile, applicable in both real-time and post-call analysis, benefiting customer experience and agent performance alike.

Learn more about how we’re changing conversation intelligence for contact centers around the world at Observe.AI.
