The Dawn of the Conversational Interface

Michael McTear
Published in Social Robots
Jul 29, 2016 · 4 min read

[Image: Travel chatbot helps you book a hotel using a conversational interface]

Conversational interfaces are all the rage. Chris Messina of Uber has called 2016 “the year of conversational commerce” and the rush to develop the ultimate conversational interface has been embraced by major tech companies, including Apple, Amazon, Google, Facebook, Microsoft, and IBM. Moreover, according to analyst Will Knight, conversational interfaces are already the main mode of interaction with smartphones in China, given the difficulties of interacting with Chinese characters on a tiny touch screen.

Conversational interfaces also play a crucial role in the success of social robots. Unless the social robot is meant to mimic an animal or other creature that communicates without the use of language, people will expect to be able to converse in a free and natural manner with their robot companions.

But what exactly is a conversational interface?

There are two aspects to the use of the term conversational. On the one hand, conversational can refer to the style of language that we use to interact with a smartphone, smart device, or social robot. We should be able to speak or type in a natural and intuitive style, in contrast to the style of language required by earlier interfaces involving set commands or queries with a restricted vocabulary and syntax. Thus a conversational interface should allow the user to express the same message in a variety of different ways without having to worry whether they will be understood.

Conversational can also refer to the style of the interaction. Voice user interfaces (also known as IVRs or Interactive Voice Response systems) — used widely in contact centres as a cost-effective substitute for human agents — require a rigid mode of interaction in which the system controls the dialog and asks a series of questions to which the user has to provide short and highly constrained responses. This mode of interaction is known as system-initiative or system-directed dialog.

Another type of interaction involves one-shot queries, where the user asks a question and the system provides an answer. Until recently it was not possible to ask a follow-up question, and the system could not initiate a clarification sub-dialog if the meaning of the user’s question was unclear. Conversational systems aim to overcome these deficiencies by allowing the user to also take control of the dialog (known as mixed-initiative dialog) and to ask follow-up questions, for which the system has to keep track of what has been asked about so far. This is now being supported in systems such as Google Now. The following example, recorded on July 28, 2016, shows how Google Now can keep track of entities mentioned in the user’s queries so that they can be referred to in follow-up queries using pronouns:

User: Who is the President of the United States?

Google Now: Barack Obama is the President of the United States of America.

User: What is his wife’s name?

Google Now: His spouse is Michelle Obama since 1992.

User: How old is she?

Google Now: Fifty-two years old.

User: What does she do?

Google Now: Michelle Obama’s occupation is a lawyer and writer.
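The entity tracking at work in this exchange can be sketched in a few lines of code. The sketch below is illustrative only — the class and method names are invented for this example and do not reflect how Google Now is actually implemented. It keeps a single entity "in focus" and rewrites pronouns in follow-up queries to refer to it:

```python
# Minimal sketch of dialog-state tracking for follow-up queries.
# All names here are hypothetical, not from any real assistant's API.

PRONOUN_SUBS = ("he", "she", "his", "her")

class DialogContext:
    """Remembers the most recently mentioned entity so that
    pronouns in follow-up queries can be resolved against it."""

    def __init__(self):
        self.focus = None  # the entity currently "in focus"

    def update(self, entity):
        # Each answer can shift the focus to a newly mentioned entity.
        self.focus = entity

    def resolve(self, query):
        """Naively substitute subject/possessive pronouns with the
        entity in focus, preserving trailing punctuation."""
        if self.focus is None:
            return query
        subs = {"he": self.focus, "she": self.focus,
                "his": self.focus + "'s", "her": self.focus + "'s"}
        out = []
        for word in query.split():
            tail = ""
            while word and word[-1] in "?,.!":
                tail = word[-1] + tail
                word = word[:-1]
            out.append(subs.get(word.lower(), word) + tail)
        return " ".join(out)

ctx = DialogContext()
ctx.update("Barack Obama")   # after "Who is the President...?"
print(ctx.resolve("What is his wife's name?"))
ctx.update("Michelle Obama")  # the answer shifts the focus
print(ctx.resolve("How old is she?"))
```

Real systems use far more sophisticated coreference resolution, but the core idea — carrying dialog state forward between turns — is the same.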

Is a conversational interface the same as an intelligent virtual assistant?

When reading about conversational interfaces, you will find a wide range of different terms in use: conversational agents, conversational bots, conversational interfaces, intelligent virtual assistants, digital personal assistants, and so on. Often these terms are used interchangeably. However, it is useful to distinguish between an intelligent virtual assistant (IVA) that interacts with a user, often to assist with some task, such as booking a flight, and the technologies that enable interaction with the IVA, i.e. its conversational interface.

What are the technologies that make up a conversational interface?

When we speak to a conversational interface, the following technologies are involved:

  • Automatic speech recognition (ASR) — recognition of the words spoken.
  • Spoken language understanding (SLU) — interpretation of the words.
  • Dialog management (DM) — formulation of a response, or if the message was unclear or incomplete, planning further interaction with the user to seek clarification and elicit the required information.
  • Response generation (RG) — construction of the response, either in the form of words or a visual display, or both.
  • Text-to-speech synthesis (TTS) — spoken output of the response.

The first and final steps are omitted in text-only interfaces.
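For a text-only interface, these middle three components can be sketched as a simple function pipeline. The stubs below are hypothetical placeholders standing in for real SLU, dialog management, and response generation models; only the overall flow is the point:

```python
# Skeletal text-only conversational pipeline: SLU -> DM -> RG.
# ASR and TTS are omitted, as they are for text-only interfaces.
# The component logic is a deliberately trivial stand-in.

def understand(text):
    """SLU: map the user's words to an intent and entities (stub)."""
    if "hotel" in text.lower():
        return {"intent": "book_hotel", "entities": {}}
    return {"intent": "unknown", "entities": {}}

def manage(interpretation):
    """DM: decide the next action; seek clarification if the
    user's message could not be interpreted."""
    if interpretation["intent"] == "unknown":
        return {"action": "clarify"}
    return {"action": "proceed", "intent": interpretation["intent"]}

def generate(action):
    """RG: turn the chosen action into words for the user."""
    if action["action"] == "clarify":
        return "Sorry, I didn't catch that. What would you like to do?"
    return "Sure - let's get started with " + action["intent"] + "."

def turn(user_text):
    """One full turn of the pipeline."""
    return generate(manage(understand(user_text)))

print(turn("I'd like to book a hotel"))
print(turn("blah blah"))
```

In a spoken interface, an ASR step would precede `understand` and a TTS step would follow `generate`.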

Social robots will need to be adept at all of these steps. But, in addition to this, they will also need to recognise and produce the social cues that accompany spoken and written language. When people engage in natural conversational interaction they convey much more than just the literal meanings of their words. Their speech (as well as their text messages) can convey their emotional state and aspects of their personality. Additionally, in face-to-face interaction nonverbal behaviours, such as facial expressions, gestures and body posture, also convey meaning. Social robots and other smart devices will also be able to use built-in sensors and actuators to gather data about the user and the environment, including location, motion, orientation, and biosignals such as heart rate. These additional inputs will be particularly important in conversational interfaces for social robots.

Conversational Interfaces — There’s Lots to Talk About

As can be seen, a conversational interface involves a complex set of technologies. If you are interested in finding out more, a new book that I co-authored — The Conversational Interface: Talking to Smart Devices — provides a comprehensive introduction along with practical examples and exercises using open-source tools.
