Understanding the Differences Between Alexa, API.ai, Wit.ai, and LUIS/Cortana
I have been teaching myself AI application development in order to understand the security vulnerabilities present in these types of applications. It has been an interesting journey through Alexa Skills development, API.ai, Wit.ai, and Microsoft LUIS (Language Understanding Intelligent Service)/Cortana. This first article is a general overview of developing applications with these platforms. My next post on AI assistants will go through the security issues I discovered on each platform. You cannot really understand the weaknesses in application code until you start developing with it.
Overall Architecture of AI Assistant Based Applications
If you look at all of these AI platforms, you will see that they are very similar. A user speaks commands or questions to a device. The device records the audio and streams it to an intermediary service, which recognizes it as an initial request and sends the audio to a speech-to-text service. The speech-to-text service converts the audio to text and returns the text to the intermediary, which then sends it to the text-to-intent/action component. This component is responsible for figuring out what the user wants to do.
AI assistants usually have phrases that trigger named intents. For example, an application can look for the phrase “What’s the weather in {Boston} {Massachusetts}” to trigger a get_weather intent. The parts in braces are called slots; think of them as variables for voice commands. Once the intent/action name has been identified, what happens next depends on the platform: either the platform carries out more interactions to gather the needed information, or it invokes your webhook. Some platforms require a webhook call for every intent. Others allow you to gather all of the data within the platform before invoking your webhook (business logic).
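To make slots concrete, here is a minimal Python sketch of exact-phrase matching, roughly the style of sample utterances Alexa uses. It assumes a hypothetical template “what’s the weather in {city} {state}”, where city and state are slot names, and compiles each template into a regular expression with named groups. Real platforms use trained language models rather than regular expressions, so treat this only as an illustration of how an utterance maps to an intent name plus slot values.

```python
import re
from typing import Dict, Optional

# Hypothetical sample utterances mapped to intent names.
# Words in braces are slot names (variables in the voice command).
SAMPLE_UTTERANCES: Dict[str, str] = {
    "get_weather": "what's the weather in {city} {state}",
}


def template_to_regex(template: str):
    """Compile a template, turning each {slot} into a named one-word group."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)       # literal text between slots
        else:
            pattern += rf"(?P<{part}>\w+)"   # slot value: a single word
    return re.compile(f"^{pattern}$", re.IGNORECASE)


def match_intent(utterance: str) -> Optional[dict]:
    """Return the intent name and slot values for the first matching template."""
    for intent, template in SAMPLE_UTTERANCES.items():
        match = template_to_regex(template).match(utterance.strip())
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    return None


print(match_intent("What's the weather in Boston Massachusetts"))
# {'intent': 'get_weather', 'slots': {'city': 'Boston', 'state': 'Massachusetts'}}
```

Because each slot here matches a single word, “What’s the weather in New York New York” would not match at all; that brittleness is exactly what the machine-learning platforms try to avoid.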
When your webhook is invoked, the intent name, slot names, and slot values are passed to your business logic. This business logic could be hosted in an AWS Lambda function or on a Heroku server. Your business logic identifies which internal function to call based on the intent name and then reads the required values from the slot values using the slot names. It can then call REST APIs on the internet to gather information, which is returned to the device and spoken to the end user.
Although the AI assistant platforms are architecturally similar, there are important differences you need to be aware of when developing for each one. Each platform has its own advantages and disadvantages.
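The dispatch step described above can be sketched as follows. This assumes a simplified, hypothetical request body holding just an intent name and a dictionary of slot values; each real platform defines its own JSON format, and in AWS Lambda you would call something like handle_webhook from your handler function. The get_weather function is a stub standing in for a call to a weather REST API.

```python
from typing import Callable, Dict

# Hypothetical, simplified request body; every real platform defines its own:
# {"intent": "get_weather", "slots": {"city": "Boston", "state": "Massachusetts"}}


def get_weather(slots: Dict[str, str]) -> str:
    """Stub business logic; a real skill would call a weather REST API here."""
    return f"Here is the weather for {slots.get('city')}, {slots.get('state')}."


# Map each intent name to the internal function that handles it.
INTENT_HANDLERS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "get_weather": get_weather,
}


def handle_webhook(request: dict) -> dict:
    """Pick the handler by intent name and pass it the slot values."""
    handler = INTENT_HANDLERS.get(request.get("intent", ""))
    if handler is None:
        return {"speech": "Sorry, I can't help with that yet."}
    # The returned text would be sent back to the device and spoken to the user.
    return {"speech": handler(request.get("slots", {}))}


print(handle_webhook({"intent": "get_weather",
                      "slots": {"city": "Boston", "state": "Massachusetts"}}))
```

A dictionary of handlers keeps adding a new intent to a one-line change, and an unknown intent falls through to a polite default instead of raising an error.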
Alexa
Alexa, developed by Amazon, is currently the most popular AI assistant on the market. Amazon publishes open-source SDKs for building skills, but the Alexa service itself is proprietary. Alexa is good for simple apps where the command phrases are limited and distinct.
Advantages
Alexa is pretty straightforward to develop AI assistant applications for. The business logic behind a skill can run on any platform. You have full control, because you have to manage all of the interactions in code (context, session, required parameters, etc.).
Disadvantages
Every triggered intent requires a webhook call. There is no development UI, so you have to enter your intent schema, utterances, and slots and manage all of their relationships manually. Relatively weak in terms of machine learning, and harder to develop with than API.ai and Wit.ai. Limited diagnostic tools. You have to enter the exact phrases that trigger intents. Few predefined, reusable domains of global skills. You have to manage the retrieval of missing required parameters in your business logic, whereas both API.ai and LUIS automatically prompt for missing required parameters once they are configured in their development UI.
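On Alexa, asking the user for missing required parameters is your job. Here is a minimal sketch of that bookkeeping, assuming hypothetical REQUIRED_SLOTS and PROMPTS tables and a plain dictionary standing in for the session attributes that carry collected values from one turn to the next.

```python
from typing import Dict, List, Optional

# Hypothetical: the slots each intent needs before the business logic can run.
REQUIRED_SLOTS: Dict[str, List[str]] = {"get_weather": ["city", "state"]}

# Hypothetical follow-up questions for each missing slot.
PROMPTS: Dict[str, str] = {
    "city": "Which city?",
    "state": "Which state is that in?",
}


def next_step(intent: str, slots: Dict[str, Optional[str]],
              session: Dict[str, str]) -> dict:
    """Merge this turn's slot values into the session and prompt for any gap.

    The session dict is updated in place; it stands in for the session
    attributes the skill sends back and receives again on the next turn.
    """
    session.update({name: value for name, value in slots.items() if value})
    missing = [name for name in REQUIRED_SLOTS.get(intent, [])
               if not session.get(name)]
    if missing:
        # Ask for the first missing slot and keep the conversation open.
        return {"done": False, "prompt": PROMPTS[missing[0]], "session": session}
    return {"done": True, "slots": dict(session)}


session_attributes: Dict[str, str] = {}
print(next_step("get_weather", {"city": "Boston", "state": None}, session_attributes))
print(next_step("get_weather", {"state": "Massachusetts"}, session_attributes))
```

This is the loop that API.ai and LUIS take off your hands when you mark a parameter as required in their UI.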
API.ai
Recently acquired by Google, API.ai is one of the more feature-rich solutions with machine learning. It can learn new commands that are semantically similar to the phrases the developer enters as intent triggers. It has the concept of entities to learn about the things you are talking about, and it provides a UI to help developers create intents, entities, and agents.
Advantages
Smarter and can learn alternative phrases that trigger intents and understand alternative entities. Allows you to manage context parameters through the UI. Easily add entities by highlighting them in the sample utterances through the UI. Allows you to mark entities/parameters as required, so the platform handles retrieving their values from the user without going to your business logic (Alexa requires you to manage this yourself in your business logic code). Rich set of domains for building chat bots with out-of-the-box functionality (small talk, weather, flights, news, etc.).
Disadvantages
Lack of diagnostic tools to measure the true positives, true negatives, false positives and false negatives related to intent and entity matching. Microsoft LUIS provides detailed metrics on how your intents and entities are being resolved. Lack of visualization of conversation flows. Wit.ai has a nice “Stories” feature which allows you to visually represent the conversation flows, business logic invocations, context variables, and branching logic.
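If a platform does not report these numbers, you can compute them yourself from a labeled test set. The sketch below assumes you already have (expected_intent, predicted_intent) pairs, for example by sending your own test utterances to the platform and logging the intent it returns; the intent names and the “none” label are hypothetical. It counts true/false positives and negatives per intent and derives precision and recall.

```python
from typing import Dict, List, Tuple


def intent_metrics(results: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Per-intent confusion counts from (expected_intent, predicted_intent) pairs."""
    intents = sorted({name for pair in results for name in pair})
    metrics: Dict[str, dict] = {}
    for intent in intents:
        tp = sum(1 for exp, pred in results if exp == intent and pred == intent)
        fp = sum(1 for exp, pred in results if exp != intent and pred == intent)
        fn = sum(1 for exp, pred in results if exp == intent and pred != intent)
        tn = len(results) - tp - fp - fn  # pairs neither expected nor predicted as it
        metrics[intent] = {
            "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return metrics


# Hypothetical labeled test results; "none" means no intent was matched.
results = [
    ("get_weather", "get_weather"),
    ("get_weather", "none"),
    ("book_flight", "get_weather"),
    ("book_flight", "book_flight"),
]
for intent, m in intent_metrics(results).items():
    print(intent, m)
```

Low precision for an intent means it is stealing utterances that belong elsewhere; low recall means its trigger phrases are not general enough.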
Wit.ai
Recently acquired by Facebook, Wit.ai is one of the more feature-complete solutions with machine learning. As with API.ai, Wit.ai can learn new commands that are semantically similar to the phrases the developer enters as intent triggers. It has the concept of entities to learn about the “things” you are talking about, and it provides a UI to help developers create intents, entities, and agents. The major difference is that Wit.ai’s developer GUI also includes a visual representation of the conversation flows, business logic invocations, context variables, jumps, and branching logic.
Advantages
Easier to develop applications using the developer UI. Smarter and can learn alternative phrases which can trigger intents and alternative entities. Allows you to manage context parameters through the UI. Easily add entities by highlighting them in the sample utterances through the UI. Visually see and edit the conversation flow tree. Wit.ai also supports the idea of roles for entities. When you say “from LA to New York” both LA and New York are location entities but you can further distinguish between a fromLocation (LA) and toLocation (New York). The only other framework to support this feature is LUIS (with hierarchical entities).
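The difference between an entity’s type and its role can be shown with a small data structure. The sketch below assigns roles with a naive rule, the word immediately before the location; this heuristic and the hard-coded KNOWN_LOCATIONS list are assumptions for illustration, not how Wit.ai or LUIS actually resolve roles.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical list of known location values; a real platform learns these.
KNOWN_LOCATIONS = ["LA", "New York"]

# Naive heuristic (an assumption): the word before a location decides its role.
ROLE_BY_PREPOSITION = {"from": "fromLocation", "to": "toLocation"}


@dataclass
class Entity:
    entity_type: str      # what the thing is, e.g. "location"
    value: str            # the text that was recognized
    role: Optional[str]   # the part it plays in this utterance, if known


def tag_locations(utterance: str) -> List[Entity]:
    """Find the first occurrence of each known location and assign a role."""
    found = []
    for location in KNOWN_LOCATIONS:
        index = utterance.find(location)
        if index == -1:
            continue
        words_before = utterance[:index].split()
        previous = words_before[-1].lower() if words_before else ""
        role = ROLE_BY_PREPOSITION.get(previous)
        found.append((index, Entity("location", location, role)))
    # Return entities in the order they appear in the utterance.
    return [entity for _, entity in sorted(found, key=lambda pair: pair[0])]


for entity in tag_locations("from LA to New York"):
    print(entity)
```

Both results share the type “location”, so business logic that only looks at types cannot tell origin from destination; the role field is what carries that distinction.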
Disadvantages
Does NOT have the required slot/parameter feature, so to collect any slot/parameter values the user did not speak, you have to invoke business logic after every interaction. Few predefined, reusable domains of global skills. Some diagnostics, but very limited in comparison to MS LUIS. Webhook integration is unclear: all of the documented examples either invoke logic in interactive mode locally or only work with Facebook Messenger apps. Although you can specify context variables for missing required parameters, you have to manage their retrieval using “Bot Executes” logic, context variables, and jumps. Both API.ai and LUIS handle prompting for missing required parameters once you configure them as “required” in their development UI.
MS LUIS (Language Understanding Intelligent Service)/Cortana
LUIS is very similar to API.ai. It is one of the more feature-complete solutions with machine learning and can learn new commands that are semantically similar to the phrases the developer enters as intent triggers. It has the concept of entities to learn about the “things” you are talking about. It also provides a UI to help developers create intents, entities, and agents, but it does not provide a visual representation of the conversation flows as Wit.ai does. To its credit, Microsoft also offers other language, intelligence, and assistant-based services that appear to be better than the competing solutions. You will have to assess whether being tied to the Windows platform is workable.
Advantages
Smarter and can learn alternative phrases that trigger intents and alternative entities. Easily add entities by highlighting them in the sample utterances through the UI. Allows you to mark entities/parameters as required, so the platform handles retrieving their values from the user without going to your business logic. LUIS provides richer metrics to help you understand how well your AI assistant app is working. LUIS also supports composite entities, which you can think of as a grouping of entities into a single predefined entity. For example, in “2 adult first class tickets from LA to NY”, 2 is the number of tickets, adult is the ticket type, and first is the ticket class, but together these entities constitute a ticket order composite entity. None of the other AI frameworks covered here support composite entities.
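A composite entity matters most when one utterance contains more than one order. The sketch below assumes the platform has already returned child entities (number, ticketType, ticketClass) and composite spans as character offsets; these field names, the offsets, and the TicketOrder type are hypothetical. It groups each child into the composite span that contains it, which is the kind of association a composite entity gives you without extra work.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TicketOrder:
    """Hypothetical composite entity made of three child entities."""
    number: int
    ticket_type: str
    ticket_class: str


def group_ticket_orders(children: List[Dict],
                        composite_spans: List[Tuple[int, int]]) -> List[TicketOrder]:
    """Assign each child entity to the composite span that contains it."""
    orders = []
    for start, end in composite_spans:
        inside = {c["type"]: c["value"] for c in children
                  if start <= c["start"] and c["end"] <= end}
        orders.append(TicketOrder(number=int(inside["number"]),
                                  ticket_type=inside["ticketType"],
                                  ticket_class=inside["ticketClass"]))
    return orders


# Hypothetical extraction result for "2 adult first class and 1 child economy"
# (character offsets, end exclusive).
children = [
    {"type": "number", "value": "2", "start": 0, "end": 1},
    {"type": "ticketType", "value": "adult", "start": 2, "end": 7},
    {"type": "ticketClass", "value": "first", "start": 8, "end": 13},
    {"type": "number", "value": "1", "start": 24, "end": 25},
    {"type": "ticketType", "value": "child", "start": 26, "end": 31},
    {"type": "ticketClass", "value": "economy", "start": 32, "end": 39},
]
for order in group_ticket_orders(children, [(0, 19), (24, 39)]):
    print(order)
```

With only flat entities, the business logic would have to guess which number belongs with which ticket class; the composite span removes that ambiguity.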
Disadvantages
Does not allow you to manage context parameters through the UI, or to filter intents based on context parameters set by previous intents as API.ai does; you have to manage these yourself, as with Alexa. Does not include a rich set of domains for building chat bots like API.ai’s. Building apps with Cortana and LUIS requires you to build UWP (Universal Windows Platform) apps, which ultimately ties you to the Windows platform.
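Without UI-managed contexts, your business logic has to track which follow-up intents are currently valid. Here is a minimal sketch, loosely modeled on API.ai’s idea of input and output contexts with a lifespan counted in conversation turns; the intent names, the context name, and the lifespan value are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical: contexts an intent requires before it can be matched.
INTENT_INPUT_CONTEXTS: Dict[str, Set[str]] = {
    "get_weather": set(),
    "get_weather_followup": {"weather_asked"},  # e.g. "What about tomorrow?"
}

# Hypothetical: contexts an intent sets, with a lifespan in conversation turns.
INTENT_OUTPUT_CONTEXTS: Dict[str, Dict[str, int]] = {
    "get_weather": {"weather_asked": 2},
}


def allowed_intents(active: Dict[str, int]) -> List[str]:
    """Intents whose required contexts are all currently active."""
    return sorted(name for name, needed in INTENT_INPUT_CONTEXTS.items()
                  if needed.issubset(active))


def advance_contexts(active: Dict[str, int], fired_intent: str) -> Dict[str, int]:
    """Age existing contexts by one turn, then add the fired intent's outputs."""
    updated = {name: turns - 1 for name, turns in active.items() if turns > 1}
    updated.update(INTENT_OUTPUT_CONTEXTS.get(fired_intent, {}))
    return updated


contexts: Dict[str, int] = {}
print(allowed_intents(contexts))                     # ['get_weather']
contexts = advance_contexts(contexts, "get_weather")
print(allowed_intents(contexts))                     # ['get_weather', 'get_weather_followup']
```

Filtering candidate intents this way stops a follow-up such as “What about tomorrow?” from matching when no weather question has been asked yet.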
Conclusion
The information in this article is in flux. Wit.ai has just introduced visual conversation flows through its “Stories” feature, and Alexa looks like it is trying to integrate more machine learning into its solution. Expect the competition to be fierce and the players to leapfrog one another in the coming year. It is going to be exciting to see what new features and services these companies provide. Now that you have a foundational understanding of AI assistant-based applications, the next article will focus on how to assess risk in AI assistant applications and the security issues related to them.