Intelligent Chatbots Vocabulary—101
If I have to make one tech prediction for 2019, I'd say that we are going to hear a lot about chatbots, especially intelligent conversationalists. Here is why. In late 2016, I was one of the lucky GDEs to get an early Google Home device (kindly offered by Google). I decided to extend the platform with my own "actions" and was amazed by how easy and fun it was! I kept hacking and eventually started building VUI apps for businesses over the last year. I could clearly see the interest this new medium has drawn from both small and large companies. So, I expect this keen interest to grow dramatically this year and beyond.
Also, for us developers, let’s admit it: This is a really interesting and fun platform to build (hack) for. Right?
Now, let me guess: you are one of those lucky developers who got a Google Home device for Christmas, or maybe you already had one. If not, go get one and get ready to build your first app. Before you do, though, I'd like to share the vocabulary you'll need to understand the chatbot ecosystem.
I’ll focus especially on the Google Assistant platform, but as you’ll see, most “bots” providers use the same terminology.
Google Home & Google Assistant
When you hear about Google Home and Google Assistant, just remember that:
- Google Assistant: is the intelligent personal assistant. You can ask it questions and tell it to do things.
- Google Home: is the voice-activated speaker powered by the Google Assistant.
The first thing to note here is that Google Home relies on the Google Assistant to work. The Google Assistant is actually the underlying runtime that powers Google Home devices, Google Allo, your Android TV, Android Wear 2.0+, and much more. In fact, the Google Assistant lets you interact with any of your Android devices, and even your own custom hardware, in a more natural way by:
- Talking (i.e. speaking) to your devices through Google Home or your Android TV.
- Talking (i.e. speaking or texting) to your assistant via the Google Assistant app.
In turn, the Google Assistant relies on Actions (aka Actions on Google) in order to engage with users. You might have an Action that checks the weather, another one for your news feeds, and probably an Action that switches your lights on and off, etc.
I have an Action that switches my Christmas tree on and off…
You can extend the Google Assistant with additional Actions either with the Actions SDK or with online tools such as Dialogflow, or even a combination of both.
Dialogflow
Dialogflow is an online app that allows you to build and deploy conversational applications for Actions on Google and other platforms such as Facebook Messenger, Microsoft Cortana, Slack, Twitter, and many more. All of this with ease, and with almost NO programming skills required. With Dialogflow you create and manage the Intents, Entities, and Contexts that make up your conversational application.
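Conceptually, what you configure in the Dialogflow console boils down to data: intents with training phrases, entities with synonyms, and contexts with a lifespan. Here is a simplified sketch of those shapes (illustrative only, not the actual Dialogflow export format):

```javascript
// Simplified sketch of what a Dialogflow agent manages.
// These shapes are illustrative only, NOT the real Dialogflow export format.
const agent = {
  intents: [
    {
      name: "check.weather",
      // Sample training phrases; Dialogflow generalizes from these with ML.
      trainingPhrases: [
        "what's the weather like in Paris",
        "will it rain tomorrow in London"
      ],
      // Slots (Entities) the NLU engine should extract from the user's phrase.
      parameters: [
        { name: "location", entityType: "@sys.geo-city" },
        { name: "date", entityType: "@sys.date" }
      ]
    }
  ],
  entities: [
    // A custom (developer) entity with synonyms for each canonical value.
    { name: "vehicle", values: { car: ["car", "automobile"], bike: ["bike", "bicycle"] } }
  ],
  contexts: [
    // Contexts scope which intents can match and carry state across turns.
    { name: "weather-followup", lifespanTurns: 2 }
  ]
};

console.log(agent.intents[0].parameters.map(p => p.name)); // [ 'location', 'date' ]
```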
The Actions SDK
Alternatively, you can use the Actions SDK directly. It certainly requires a bit more setup, but it is more flexible.
The official Google Actions SDK basically allows you to:
- provide an action package with some metadata.
- code “invocation” and “dialogs” components.
This SDK gives you a fairly low-level API to help you build your Actions. The API provides everything from creating Intents and registering them, to sending SSML instructions to the Agent so it can speak them back.
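To give you a feel for SSML: it is just XML-like markup wrapped in a `<speak>` tag that tells the TTS engine how to render speech. A tiny hand-rolled helper might look like this (a sketch for illustration; the real SDK builds these responses for you):

```javascript
// Build a tiny SSML snippet by hand (illustrative sketch only).
// SSML is the standard markup the Assistant's TTS engine renders as speech.
function toSsml(text, { pauseMs = 0 } = {}) {
  // <break> is a standard SSML tag that inserts a pause after the text.
  const breakTag = pauseMs > 0 ? `<break time="${pauseMs}ms"/>` : "";
  return `<speak>${text}${breakTag}</speak>`;
}

const ssml = toSsml("Hello, Developers!", { pauseMs: 500 });
console.log(ssml); // <speak>Hello, Developers!<break time="500ms"/></speak>
```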
The Actions Simulator
The Actions Simulator is the official online tool that allows you to try out your Actions. While this is a good way to test and debug them, I'd recommend also using a real device: a Google Home, or the Google Assistant on your Android or iOS phone.
When building any conversational application, not only for the Google Assistant, you will deal with specific jargon and concepts. Thankfully, most "chatbot" providers (Google, Amazon, Microsoft) agree on the same concepts and use the same SSML standard for speech synthesis. Let's take a look at these concepts.
ASR, NLP, NLU, NLG & TTS
These are probably the most commonly confused concepts. Let's demystify them in simple words:
- ASR, or Automatic Speech Recognition, is the process of taking a speech (voice) signal as input and finding out which words were actually spoken.
- NLP, or Natural Language Processing, is an umbrella term that describes a machine's ability to manipulate (syntactic parsing, text categorization, etc.) human language (English, French, etc.).
- NLU, or Natural Language Understanding, is a subset of NLP that is responsible for semantic parsing and analysis, entity extraction, etc. NLU tries to structure the input data so it can easily be understood by the machine.
- NLG, or Natural Language Generation, is the step in which the machine transforms the structured data (from NLU) into human-readable language.
- TTS, or Text-to-Speech synthesis, put simply, takes the text generated by NLG and converts it to speech.
When someone talks about NLU, they probably mean the whole pipeline: ASR→NLP→NLU→NLG→TTS.
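To make that pipeline concrete, here is a toy end-to-end sketch where every stage is faked with a trivial implementation (real systems use ML models at each step; the function names and shapes here are purely illustrative):

```javascript
// Toy pipeline: each stage is a trivial stand-in for a real ML model.
const asr = (audio) => audio.transcript;              // ASR: speech -> text
const nlu = (text) => ({                              // NLU: text -> structured data
  intent: text.includes("weather") ? "check.weather" : "unknown",
  entities: { city: (text.match(/in (\w+)/) || [])[1] }
});
const nlg = (result) =>                               // NLG: structured data -> text
  result.intent === "check.weather"
    ? `Here is the weather for ${result.entities.city}.`
    : "Sorry, I didn't get that.";
const tts = (text) => ({ audio: `<synthesized>${text}</synthesized>` }); // TTS: text -> speech

// ASR -> NLU -> NLG -> TTS
const reply = tts(nlg(nlu(asr({ transcript: "what's the weather in Paris" }))));
console.log(reply.audio); // <synthesized>Here is the weather for Paris.</synthesized>
```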
Agent (aka Action)
This is the piece of program you build (or create with Dialogflow). Its role is to handle the user's intents (requests) and process the fulfillment responses. This is what we usually call a "bot".
Context
A context is basically a discussion thread: an exchange of ideas between two (or more) participants. When building the logic of your bot, the context is your "conversation state": this is where you store important information (usually the Entities) for a specific task or request.
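A minimal way to picture that "conversation state" is a per-session store where extracted Entities accumulate across turns. Here is an illustrative sketch (in practice the platform manages contexts for you; these helper names are made up):

```javascript
// Per-session conversation state (illustrative; real platforms manage this for you).
const contexts = new Map();

// Merge newly extracted entities into the session's state.
function rememberEntities(sessionId, entities) {
  const state = contexts.get(sessionId) || {};
  contexts.set(sessionId, { ...state, ...entities });
}

function getContext(sessionId) {
  return contexts.get(sessionId) || {};
}

// Turn 1: "Book a table in Paris"  -> city extracted
rememberEntities("session-42", { city: "Paris" });
// Turn 2: "for 4 people"           -> partySize extracted; city is still remembered
rememberEntities("session-42", { partySize: 4 });

console.log(getContext("session-42")); // { city: 'Paris', partySize: 4 }
```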
Intent (aka Utterance)
Intents represent the user's "intentions". In a real conversation, your intents are defined by what you say, and it's the same in a conversational application. You provide a set of base sentences for a given Intent, for instance eating something, asking for something, or calling someone. Your agent should be intelligent enough to learn from these sentences in order to understand what you want to achieve, even if you don't say or write the exact base phrases.
Note: Dialogflow is really good at this, since it allows you to train your agent to understand you better using Machine Learning.
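To illustrate the idea of matching a paraphrase to an Intent, here is a deliberately naive matcher that scores word overlap against each intent's sample phrases (real engines like Dialogflow use Machine Learning instead; this is just a toy):

```javascript
// Naive intent matcher: score word overlap against each intent's sample phrases.
// Real NLU engines (e.g. Dialogflow) use machine learning instead of this.
const intentSamples = {
  "order.food": ["i want to eat something", "i am hungry", "order a pizza"],
  "call.someone": ["call my mom", "phone john", "make a call"]
};

function matchIntent(utterance) {
  const words = new Set(utterance.toLowerCase().split(/\s+/));
  let best = { intent: "fallback", score: 0 };
  for (const [intent, phrases] of Object.entries(intentSamples)) {
    for (const phrase of phrases) {
      // Count how many words of the sample phrase appear in the utterance.
      const overlap = phrase.split(/\s+/).filter(w => words.has(w)).length;
      if (overlap > best.score) best = { intent, score: overlap };
    }
  }
  return best.intent;
}

console.log(matchIntent("please order me a pizza")); // order.food
```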
Entity (aka Slot)
Most modern chatbots rely on NLP and NLU to process and understand the human language. Thanks to NLU, chatbots are able to extract some important information from the user’s intents. This extracted information is what is called an Entity (or a Slot).
Entities can be defined for your specific use case, e.g. a business product name or a list of vehicle models. Entities can also represent popular common concepts such as "Celebrities", "Date and Time", "Colors", "GPS addresses", "Amounts with Units", "Geography", etc. These popular concepts are automatically extracted by most NLU engines, thanks to the power of Knowledge Graphs and Ontologies.
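As a toy illustration of entity extraction, here is a sketch that pulls two "system-like" entity types (a color and an ISO date) out of an utterance. Real NLU engines use knowledge graphs and ML, not hard-coded lists and regexes; this is only to show what "extracted entities" look like:

```javascript
// Toy entity extractor for two common entity types (illustrative only;
// real NLU engines use knowledge graphs and ML, not regexes).
const COLORS = ["red", "green", "blue", "yellow"];

function extractEntities(text) {
  const lower = text.toLowerCase();
  const entities = {};
  // "Color" entity: first known color word found in the utterance.
  const color = COLORS.find(c => lower.includes(c));
  if (color) entities.color = color;
  // "Date" entity: an ISO date such as 2019-01-15.
  const date = lower.match(/\b(\d{4}-\d{2}-\d{2})\b/);
  if (date) entities.date = date[1];
  return entities;
}

console.log(extractEntities("Turn the lights blue on 2019-01-15"));
// { color: 'blue', date: '2019-01-15' }
```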
VUI
This is just a modern way to describe user interfaces that don't require a graphical UI to interact with. These are also called "Voice UIs".
Fulfillment (aka Webhook)
This usually refers to the logic that processes the user's intents. In other words, this is the code where your chatbot's logic lives. You host this code somewhere and expose it to the Internet so the chatbot platform (Google Assistant, Alexa…) can reach it over HTTPS.
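At its core, a fulfillment endpoint is just an HTTPS handler that receives the detected intent plus its parameters and returns a response for the Assistant to speak. Here is a framework-free sketch of that handler (the request/response shapes below are simplified stand-ins, not the exact Dialogflow webhook JSON):

```javascript
// Framework-free fulfillment handler sketch. The request/response shapes
// below are simplified stand-ins, NOT the exact Dialogflow webhook JSON.
function handleFulfillment(request) {
  const { intent, parameters } = request;
  switch (intent) {
    case "lights.switch":
      // Your real logic would call a smart-home API here.
      return { fulfillmentText: `Okay, turning the lights ${parameters.state}.` };
    default:
      return { fulfillmentText: "Sorry, I can't help with that yet." };
  }
}

// Simulated incoming webhook call from the chatbot platform:
const response = handleFulfillment({
  intent: "lights.switch",
  parameters: { state: "on" }
});
console.log(response.fulfillmentText); // Okay, turning the lights on.
```

In production you would wrap this handler in an HTTPS server (or a cloud function) so the platform can reach it.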
That’s It 🤖
That was the minimum vocabulary you need to know as a developer in order to start building your next Actions for the Google Assistant—and beyond.
Some fun Actions I built for the Google Assistant
Follow me on Twitter @manekinekko to learn more about chatbots and the web platform.