Building AI chatbots with active listening skills

Ziang Xiao
May 22, 2020 · 5 min read
Illustrative image: a person talking and gesturing (photo by Headway on Unsplash)

During the past couple of years, AI chatbots have been used as interviewers to engage users in text-based conversations and draw out their views and opinions. Such chatbots can be used in various interview tasks, such as eliciting consumer opinions, interviewing job candidates, and collecting patient information for doctors.

Just like a human interviewer, an interview chatbot should ideally be able to ask open-ended questions to elicit rich responses from interviewees and follow up on those responses properly. However, it is quite challenging to build an effective chatbot that can elicit high-quality responses and deliver an engaging user experience.

There are three major obstacles. First, an interview chatbot needs to effectively grasp and respond to users’ highly diverse and often complex natural language input to open-ended interview questions.

For example, when a chatbot asked, “What’s the top challenge you are facing?”, one user answered:

“The biggest challenge I’ve faced is finding a sense of purpose. Being around like-minded individuals who are constantly wanting more out of life through countless jobs I’ve never found something I was proud of…”

Another user answered with a completely different input:

“With a new baby I have a lot of additional expenses. So I have to try to obtain additional income. I try to earn extra income by working on mturk, but the pay is low and I don’t like the additional time taken away from my…”

Second, an interview chatbot needs to effectively handle complex conversation situations to ensure the completion of an interview. As shown below, users might ask clarification questions or simply not know how to answer a question. Users may also be uncooperative and provide gibberish or meaningless responses.

Fig 1. An effective chatbot needs to manage various user requests and uncooperative users
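To make this concrete, the kinds of conversation situations above could be detected with a few simple heuristics. The cue phrases, thresholds, and canned replies below are illustrative assumptions for this sketch, not Juji's actual conversation-management logic:

```python
import re

# Hypothetical cue phrases for two of the situations described above.
CLARIFICATION_CUES = ("what do you mean", "can you explain", "clarify")
DONT_KNOW_CUES = ("i don't know", "not sure", "no idea")

def classify_situation(user_input: str) -> str:
    """Label a user turn as a clarification request, a 'don't know'
    reply, likely gibberish, or a genuine answer."""
    text = user_input.lower().strip()
    if any(cue in text for cue in CLARIFICATION_CUES):
        return "clarification"
    if any(cue in text for cue in DONT_KNOW_CUES):
        return "dont_know"
    # Treat input with fewer than two real words as likely gibberish.
    if len(re.findall(r"[a-z]{2,}", text)) < 2:
        return "gibberish"
    return "answer"

def fallback_response(situation: str) -> str:
    """Return a canned recovery reply so the interview can continue."""
    return {
        "clarification": "Sure -- I'm asking about the biggest challenge in your own life right now.",
        "dont_know": "No worries, take your time. Even a rough example would help.",
        "gibberish": "Hmm, I didn't quite catch that. Could you say a bit more?",
        "answer": "",
    }[situation]
```

In practice, a production system would use learned classifiers rather than keyword matching, but the control flow (detect the situation, then recover gracefully) is the same.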

Third, it is difficult for chatbot designers or developers to take advantage of AI advances due to a lack of AI expertise or resources. It requires AI expertise to create interview chatbots that can handle the two challenges mentioned above. However, not every chatbot designer or developer has such expertise. Additionally, it is difficult to obtain massive amounts of training data to build machine learning models and train an interview chatbot.

For example, Facebook trained its latest AI chatbot, Blender, with over 1.5 billion conversation examples. Similarly, Google’s Meena was trained on a dataset containing 341GB of text (40B words) to achieve good results.

Can we build effective interview chatbots with imperfect AI? To answer this question, we have investigated the use of practical AI technologies to build effective interview chatbots. Specifically, we wish to power interview chatbots with active listening skills.

Active listening skills, such as paraphrasing, verbalizing emotions, summarizing, and encouraging, are often used by experienced human interviewers to make an interview more engaging and effective. We believe such interview skills can also help a chatbot to deliver an engaging and effective interview. The screenshot below shows an interview chatbot with active listening skills.

A sample chat between a user and an interview chatbot
Fig 2. A screenshot of an example interview conducted by a chatbot (AI Minion) and a user (Sara).

To teach chatbots active listening skills, we have extended Juji’s base system, a rule-based chatbot system, with a data-driven extension. Since the Juji base system already handles complex conversation flows automatically, our extension focuses on learning how to interpret diverse user input and generate proper responses using one or more active listening techniques.

As shown in the image below, our data-driven extension includes three components, with the overall goal of interpreting the semantic gist of user input. The first component prepares training data. Specifically, we use several methods to reduce the manual data annotation effort while creating a high-quality dataset.

This is an architecture diagram of our system, including several main sections.
Fig 3. Overview of our prototype system for building an effective interview chatbot.

We first use machine learning to automatically identify the intents conveyed in the training data. We chose topic modeling to extract hidden intents; it is an unsupervised machine learning technique that requires no labeled data. Since an extracted intent is typically summarized by a set of keywords, it is difficult to label the intent from those keywords alone due to the missing context. We thus rank user responses by their semantic proximity to an intent. The training dataset is then created by taking high-ranked sentences as positive examples and low-ranked sentences as negative examples.
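A minimal sketch of this dataset-creation step, using scikit-learn's LDA as the topic model. The tiny corpus, topic count, and ranking cutoffs are illustrative assumptions, not the settings from our actual system:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# A toy corpus of interview responses (real training data would be much larger).
responses = [
    "My biggest challenge is finding a sense of purpose in my work.",
    "With a new baby I have a lot of extra expenses to cover.",
    "I never found a job that I was truly proud of.",
    "I try to earn additional income online but the pay is low.",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(responses)

# Extract hidden intents as topics -- unsupervised, so no labels are needed.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # rows: responses, cols: topic weights

# For each intent, rank responses by topic weight; high-ranked responses
# become positive examples and low-ranked ones negative examples.
datasets = {}
for intent_id in range(lda.n_components):
    ranked = sorted(range(len(responses)),
                    key=lambda i: doc_topics[i, intent_id], reverse=True)
    datasets[intent_id] = {
        "positive": [responses[i] for i in ranked[:1]],
        "negative": [responses[i] for i in ranked[-1:]],
    }
```

The key design point is that the expensive human step (writing labels) is replaced by ranking against automatically discovered topics.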

The training data is then encoded to train text classification models that can identify the semantic gist of new user input.
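This classification step could look something like the following sketch, which encodes text with TF-IDF and fits one binary classifier per intent. The example sentences, the "financial stress" intent, and the model choice are all hypothetical stand-ins, not our system's exact setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Auto-labeled positive/negative examples for one hypothetical intent,
# "financial stress" (1 = expresses the intent, 0 = does not).
train_texts = [
    "I have a lot of extra expenses with the new baby",
    "the pay is low and money is always tight",
    "my biggest challenge is finding a sense of purpose",
    "I never found work that I was truly proud of",
]
train_labels = [1, 1, 0, 0]

# Encode the training sentences and fit the intent classifier in one pipeline.
intent_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
intent_clf.fit(train_texts, train_labels)

# At run time, new user input is scored against the learned intent.
proba = intent_clf.predict_proba(["money is tight and expenses are high"])[0]
```

One such classifier per extracted intent lets the system map arbitrary new user input onto the intents it knows how to respond to.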

We incorporate the trained classification models back into the Juji base system. At run time, the prediction results trigger rules that guide the generation of proper system responses, enabling active listening.
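In spirit, the rule-triggering step works like the dispatch below. The intent labels, techniques, and response templates are hypothetical illustrations, not Juji's actual rule syntax:

```python
# Map each predicted intent to an active listening technique and a
# response template (both hypothetical examples).
ACTIVE_LISTENING_RULES = {
    "financial_stress": (
        "verbalize_emotion",
        "It sounds stressful to juggle so many expenses. Tell me more?",
    ),
    "career_purpose": (
        "paraphrase",
        "So you're still searching for work that feels meaningful, right?",
    ),
}

# Encouragement serves as the fallback when no intent rule fires.
DEFAULT_RESPONSE = ("encourage", "I see. Please go on -- I'm listening.")

def respond(predicted_intent: str) -> str:
    """Pick the rule triggered by the classifier's prediction and
    instantiate its active-listening response."""
    technique, template = ACTIVE_LISTENING_RULES.get(
        predicted_intent, DEFAULT_RESPONSE)
    return template

reply = respond("financial_stress")
```

Real templates would also weave in content from the user's own words (e.g., for paraphrasing), but the classifier-prediction-to-rule mapping is the core of the integration.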

To validate the usefulness of our method, we have conducted several experiments. For example, one experiment compared a baseline chatbot without active listening skills against a chatbot with such skills.

Each chatbot interviewed about 100 users recruited on Amazon Mechanical Turk. We evaluated each chatbot on two main dimensions: the quality of interview responses and user engagement.

The results showed that the chatbot with active listening skills outperformed the baseline chatbot on both dimensions. Participants found the chatbot with active listening skills more engaging and showed a higher interest in future interactions with it. This chatbot also elicited higher-quality interviewee responses in terms of information volume and richness.

In summary, our work demonstrates the following:

  1. Practical approaches to effective interview chatbots. Our work presents practical implementations to power chatbots with a specific set of active listening skills.
  2. A hybrid framework for developing progressive chatbot platforms. Our work demonstrates a hybrid chatbot design framework, where rules can be used to bootstrap a chatbot and data-driven models can then be used to improve the chatbot’s conversation capabilities.
  3. Design implications for building empathetic chatbots beyond interview tasks. Since active listening aids effective communication beyond interviews, our work presents design considerations for building empathetic chatbots for a wide variety of applications beyond interview tasks, including counseling and training.

For more details on powering chatbots with active listening skills, please check out our full paper published at the 2020 ACM CHI Conference on Human Factors in Computing Systems.