Cross-lingual/Cross-channel Intent Detection in Contact-Center Conversations

Suraj Agrawal
The Observe.AI Tech Blog
5 min read · Aug 7, 2023

This blog post covers our technical paper, “Cross-lingual/Cross-channel Intent Detection in Contact-Center Conversations,” which has been accepted at Interspeech 2023.

In the world of customer service, contact centers play an important role as an interface for customers to raise concerns, clarify doubts, or get any other information. Businesses analyze transcripts from contact centers to understand customer issues, agent performance, and market trends, which often requires identifying pertinent intents or events of business interest. To efficiently identify such intents/events at scale, contact centers typically employ automated keyword spotting, where intents are characterized by a collection of key phrases and tracked throughout conversations.

However, manually determining the most suitable key phrases for accurate intent identification in a specific business context can be laborious and challenging due to the noisy nature of calls and ASR errors. Adding to this pain is the lack of visibility into the relevant phrases used by agents and customers across interactions that help identify these intents with good accuracy and coverage. The process has to be repeated for different languages, requiring linguistic expertise since a direct translation of key phrases often does not yield optimal results. Moreover, the effectiveness of the same phrases may vary across different communication channels, such as calls, chat, or email, due to inherent differences in phrase usage for expressing identical intents. For instance, “Thank you for calling” is a common phrase in call interactions, but the term “calling” does not appear in chat or email exchanges. This implies that an intent definition created for “proper call closing by agent” may not work effectively for the same intent in chat and email. Consequently, contact centers must invest resources in developing intent definitions tailored for each supported language and communication channel.

We introduce a system aimed at supporting human efforts in defining intent key phrases by suggesting phrases that are assured to be present in conversations in the target language/channel. Specifically, we present a Semantic Phrase Search Engine that enables cross-lingual/cross-channel phrase discovery to support the following applications:

  • Intent Phrase Expansion, to improve the accuracy and coverage of intent detection by automatically expanding the phrase list with semantically similar phrases.
  • Cross-lingual/cross-channel translation of intent definition, to enable the in-context translation of a set of intent key phrases to the target language and/or channel while maintaining semantic integrity.

The system consists of three key components: (1) Key Phrase Mining, (2) Intent Phrase Expansion, and (3) Cross-lingual/cross-channel Translation. We discuss each of these components in detail in the subsequent sections.

1. Key Phrase Mining

This step identifies and stores candidate key phrases for recommendations. Identifying phrases that are relevant for defining intents and filtering out non-relevant phrases is critical. For instance, many phrases from interactions, like “aren’t we” or “yes sir, sure”, can’t be recommended to users, as they don’t help identify any intent. Also, there can be minor variations of the same phrase, like “can help you”, “we help you”, “me help you”, etc., which add redundancy rather than diversity to the recommendations.

The Key Phrase Mining step ensures that only the phrases most relevant to identifying intents in interactions between contact center agents and customers are extracted.

[Figure: Key Phrase Mining]

The above figure shows a bird’s eye view of “Key Phrase Mining”, which involves the following components:

Pre-processor: The pre-processor accepts input text (transcripts in the case of voice calls) and performs a series of language- and channel-specific transformations that include steps like normalization, tokenization, and POS tagging.
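As a rough illustration of the normalization and tokenization steps, here is a minimal sketch. The function name `preprocess` and the specific rules (Unicode NFKC normalization, lowercasing, a regex tokenizer) are assumptions made for this example and not the pipeline described in the paper; POS tagging is left out because it depends on a language-specific tagger.

```python
import re
import unicodedata


def preprocess(text: str) -> list[str]:
    """Normalize and tokenize one utterance (hypothetical helper; POS tagging omitted)."""
    # Normalization: unify Unicode forms and case so "CALLING" and "calling" match.
    text = unicodedata.normalize("NFKC", text).lower()
    # Tokenization: runs of letters/digits, keeping inner apostrophes ("aren't").
    return re.findall(r"[^\W_]+(?:['’][^\W_]+)*", text)


tokens = preprocess("Thank you for CALLING, how can I help you?")
print(tokens)
```

A real pre-processor would likely swap these rules per language and channel, for example handling ASR disfluencies for calls and emojis or URLs for chat.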

Phrase Extractor: This module extracts relevant key-phrases from pre-processed tokens by processing them through a multi-step pipeline that includes collocation extraction, noise filtering, and collocations filtering based on a measure of statistical significance.
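The post does not name the statistical significance measure, so the sketch below uses pointwise mutual information (PMI) over bigrams purely as an assumed stand-in. The function name `score_bigrams` and the `min_count` noise threshold are likewise hypothetical.

```python
import math
from collections import Counter


def score_bigrams(utterances, min_count=2):
    """Score bigram collocations by PMI (an assumed measure, not the paper's)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in utterances:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # noise filter: ignore rare pairs
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # PMI > 0 means the pair co-occurs more often than chance would predict.
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


utterances = [
    ["thank", "you", "for", "calling"],
    ["thank", "you", "so", "much"],
    ["can", "i", "help", "you"],
]
scores = score_bigrams(utterances)
```

On this toy input only ("thank", "you") passes the frequency filter. In practice, candidates scoring below a chosen cutoff would be dropped, and longer phrases could be built by chaining significant pairs.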

Diversity Inducer: Many phrases extracted in the previous step overlap significantly. For example, “speak to manager” and “i speak to manager” have three words in common. Although these are separate key-phrases, such phrases often convey the same meaning. To prevent redundant phrase recommendations, the Diversity Inducer iteratively merges such key-phrases based on a set of heuristics.
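The merging heuristics are not spelled out in this post. One plausible sketch, assuming word-level Jaccard overlap and a greedy single pass that keeps the most frequent variant, is shown below; `merge_overlapping` and the `threshold` value are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap between two non-empty phrases."""
    a_words, b_words = set(a.split()), set(b.split())
    return len(a_words & b_words) / len(a_words | b_words)


def merge_overlapping(phrase_counts, threshold=0.6):
    """Greedily fold near-duplicate phrases into their most frequent variant."""
    merged = {}
    # Visit frequent phrases first so they become the representatives.
    ordered = sorted(phrase_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for phrase, count in ordered:
        for kept in merged:
            if jaccard(phrase, kept) >= threshold:
                merged[kept] += count  # absorb the variant's frequency
                break
        else:
            merged[phrase] = count  # no close match: keep as a new representative
    return merged


counts = {"speak to manager": 10, "i speak to manager": 4, "cancel my subscription": 7}
merged = merge_overlapping(counts)
```

Here “i speak to manager” shares three of four distinct words with “speak to manager” (overlap 0.75), so it is folded into the shorter, more frequent phrase.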

Embedding Network: Each extracted key phrase is then encoded into a vector. We use Google’s Multilingual Universal Sentence Encoder as our embedding network, primarily due to its support for out-of-vocabulary words and multiple languages.

Phrase-meaning Database: Finally, pairs of key-phrases and their vectors are stored in a database.

2. Intent Phrase Expansion

To improve the coverage of intent definitions, we propose an Intent Phrase Expansion algorithm that employs nearest-vector-search-based retrieval from the key-phrase database. Given a seed phrase, this component recommends relevant, semantically similar phrases that help improve the accuracy and coverage of intent detection.

[Figure: Intent Phrase Expansion]

The above figure shows the components in the Intent Phrase Expansion pipeline.
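To make the retrieval step concrete, here is a minimal sketch of nearest-vector search over an in-memory phrase database using cosine similarity. The three-dimensional vectors are toy values invented for illustration (a real system would use the encoder's embeddings and likely an approximate nearest-neighbor index), and `expand`, `k`, and `min_score` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand(seed_vector, phrase_db, k=3, min_score=0.0):
    """Return the k phrases whose vectors are closest to the seed vector."""
    scored = [(phrase, cosine(seed_vector, vec)) for phrase, vec in phrase_db.items()]
    scored = [(p, s) for p, s in scored if s >= min_score]
    return sorted(scored, key=lambda ps: ps[1], reverse=True)[:k]


# Toy phrase-meaning database: phrase -> embedding (made-up 3-d vectors).
phrase_db = {
    "thank you for calling": [0.9, 0.1, 0.0],
    "thanks for reaching out": [0.8, 0.3, 0.1],
    "cancel my subscription": [0.0, 0.2, 0.9],
}
suggestions = expand([1.0, 0.0, 0.0], phrase_db, k=2)
```

Because the database only contains phrases mined from real conversations, every suggestion returned this way is guaranteed to occur in the target data.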

3. Cross-lingual/cross-channel Translation

For enabling translation of an intent defined in one language/channel to another language/channel, we leverage the database of key-phrase vectors generated by the multilingual encoder for different languages/channels through the following steps:

  • First, we run the key phrase mining algorithm to extract the relevant key phrases from the interactions present in the target language.
  • We generate a list of candidate phrases in the target language/channel by retrieving the top-k semantically similar phrases for each key-phrase in the source intent definition, using the Intent Phrase Expansion component.
  • Finally, we merge and rank the candidate phrase list using a weighted scoring algorithm, based on similarity scores between phrases from the source intent definition and the retrieved candidate phrases. The top-n candidate phrases from this ranked list are recommended as the intent definition in the target language/channel.
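The merge-and-rank step above can be sketched as follows. The exact weighting scheme is not given in this post, so the example assumes a simple one: each candidate's score is the weighted sum of its similarities to the source phrases that retrieved it. The function `rank_candidates`, the optional per-phrase `weights`, and the sample phrases and similarity values are all hypothetical.

```python
from collections import defaultdict


def rank_candidates(retrieved, weights=None, n=2):
    """Merge per-source top-k hits and return the top-n target phrases.

    retrieved: source phrase -> list of (candidate phrase, similarity).
    weights:   optional source phrase -> importance (defaults to 1.0).
    """
    weights = weights or {}
    totals = defaultdict(float)
    for source_phrase, hits in retrieved.items():
        w = weights.get(source_phrase, 1.0)
        for candidate, similarity in hits:
            # Candidates retrieved by several source phrases accumulate score.
            totals[candidate] += w * similarity
    ranked = sorted(totals.items(), key=lambda cs: (-cs[1], cs[0]))
    return [candidate for candidate, _ in ranked[:n]]


# Illustrative English -> Spanish retrieval results (similarities are made up).
retrieved = {
    "thank you for calling": [("gracias por llamar", 0.92), ("que tenga buen día", 0.61)],
    "have a great day": [("que tenga buen día", 0.88), ("hasta luego", 0.55)],
}
top = rank_candidates(retrieved, n=2)
```

Summing across source phrases favors candidates that are close to the intent as a whole rather than to a single seed phrase; other aggregations, such as taking the maximum similarity, are equally plausible.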

Data Liveliness

The key-phrase mining pipeline is run at regular intervals to ensure that the recommendations adapt to changes/trends in the nature of conversations due to internal or external factors. This was found to be effective, for instance, when the phrases relevant to detecting “proper call closing by agent” started including new key phrases like “take care” and “stay safe” with the onset of the COVID-19 pandemic.

Key Takeaways

  • We propose a Semantic Phrase Search Engine that supports cross-lingual/cross-channel phrase discovery.
  • The proposed system enables Intent Phrase Expansion, to improve coverage of intent detection by expanding the phrase list, and cross-lingual/cross-channel translation of intent definitions while maintaining semantic integrity.
  • The system helps improve accuracy and coverage of intent detection by enabling more effective intent definitions, while significantly reducing manual effort.

This work was accomplished through a collaborative effort involving Aashraya Sachdeva, Soumya Jain, Cijo George, and Jithendra Vepa.

Learn more about how we’re changing conversation intelligence for contact centers around the world at Observe.AI.
