The many terms of conversational computing
2016 brought in a whole new era of conversational technologies:
- At Microsoft’s BUILD 2016 conference, CEO Satya Nadella said, “as an industry, we are on the cusp of a new frontier that pairs the power of natural human language with advanced machine intelligence.”
- At Facebook’s 2016 developer conference, Mark Zuckerberg presented their new bot platform and API for developing messenger chatbot applications.
- At Google I/O 2016, Google presented the next generation of its conversational UI, Google Assistant, as well as Google Home, its competitor to Amazon Alexa.
- Over five months, from January to June 2016, Amazon Alexa went from 135 available Skills to well over 1,000.
While the promise of conversational computing is exciting, it also presents designers and developers with a whole host of new technology and terminology to learn. Here I’ve compiled a short list of terms commonly found in articles on conversational-UI development, and I’ve listed a few of my favorite resources to help you get started.
Conversational-UI Terms
Natural Language Understanding
Natural Language Understanding (or NLU) is the area of software development that covers a computer’s ability to understand human language. NLU covers everything from a computer’s ability to “hear” and identify human language, to understanding that language, to finally producing a coherent response. NLU technology is still young and requires significant amounts of data to be effective.
User Utterance
A user utterance is a single statement issued by a user. Some examples of user utterances are: “Alexa, what is the weather?”, “Hey Siri, set an alarm for 5am.”, and “Chef, how do I make macaroni and cheese?”. In each of these examples the term “user utterance” refers to the actual text of the statement.
Intent
An intent is the desired action a user wants to take, expressed in a given user utterance. For the user utterance “Hey Siri, set an alarm for 5am” the intent is to set an alarm. Intents are a core building block in the software development of voice-based apps, and they often have technical names such as “setAlarm”, “searchPublicationByPublisher”, and “signIn”.
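To make the idea concrete, here is a toy sketch (not any vendor’s API) that maps an utterance to an intent name by keyword matching. The intent names and keywords are my own illustrations, and real platforms use trained machine-learning models rather than keyword lists:

```python
# Toy intent matcher: map an utterance to an intent name by keywords.
# Intent names (e.g. "setAlarm") and keywords are illustrative only;
# real services use trained machine-learning models, not keyword lists.
INTENT_KEYWORDS = {
    "setAlarm": ["alarm", "wake me"],
    "getWeather": ["weather", "forecast"],
}

def recognize_intent(utterance):
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None
```

With this sketch, “Hey Siri, set an alarm for 5am” resolves to the intent “setAlarm”.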
Slots (aka. Entities)
A slot (also referred to as an entity) is a specific piece of information required to complete an intent. In the user utterance “Hey Siri, set an alarm for 5am” the slot is “5am”. Slots are like variables in programming, and they are a core building block in the software development of voice-based apps. Like variables, they generally have generic names such as “date”, “publicationName”, or “airportCode”.
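As a rough illustration of filling a slot (a hand-written sketch, not how production services do it), a regular expression can pull a “time” slot value out of an alarm utterance:

```python
import re

# Toy slot extraction: pull a "time" slot value (e.g. "5am") out of an
# utterance with a regular expression. Real services learn slot positions
# from training data rather than relying on hand-written patterns.
TIME_PATTERN = re.compile(r"\b(\d{1,2}(?::\d{2})?\s*(?:am|pm))\b", re.IGNORECASE)

def extract_time_slot(utterance):
    match = TIME_PATTERN.search(utterance)
    return match.group(1) if match else None
```

Here, extract_time_slot("Hey Siri, set an alarm for 5am") returns "5am".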
Interaction Model
An interaction model is the combination of user utterances, intents, and slots, as well as their relationships to one another. For example, you may have an intent called setAlarm that is triggered by utterances like “Set an alarm for 5am” and requires a value for the slot “time”. All of those pieces (intent, utterance, and slot) make up your interaction model.
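One way to picture an interaction model is as plain data. The sketch below is loosely inspired by the JSON format Alexa skills use, but the field names are illustrative rather than any platform’s exact schema:

```python
# A simplified interaction model as plain data: one intent, its slot,
# and the sample utterances that trigger it. Field names are loosely
# based on Alexa's JSON schema but are illustrative, not exact.
interaction_model = {
    "intents": [
        {
            "name": "setAlarm",
            "slots": [{"name": "time", "type": "TIME"}],
            "sampleUtterances": [
                "set an alarm for {time}",
                "wake me up at {time}",
            ],
        }
    ]
}
```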
Intent Recognition Service
An intent recognition service is a machine-learning tool that takes interaction models as input, and creates (or “trains”) a natural language model. Microsoft LUIS, Wit.ai, and Alexa Voice Service are all examples of intent recognition services.
Natural Language Model
A natural language model is the output or product of an intent recognition service. You need a natural language model to power any kind of conversational-UI, from chatbots to Alexa skills.
Resources for Getting Started
- Tool: TinCan.ai — a prototyping tool for Voice-UIs. You can use TinCan.ai to build a prototype, test it out, and collect utterance data. (Here is a demo of TinCan.ai to give you a better idea of what it can do.)
- Tutorial: 6 Steps to Build Your First Alexa Fact Skill — an easy tutorial for developers trying to build their first skill.
- Book: Don’t make me tap! — an easy-to-read book on Voice-UI best practices. (I also wrote a review of this book on the UX Bookclub blog.)