Understanding the Differences Between Alexa, API.ai, Wit.ai, and LUIS/Cortana

Abraham Kang
Apr 24, 2017 · 7 min read

I have been teaching myself AI application development to understand the security vulnerabilities present in these types of applications. It has been an interesting journey through Alexa Skills development, API.ai, Wit.ai, and Microsoft LUIS (Language Understanding Intelligent Service)/Cortana. This first article is a general overview of developing applications on these platforms. My next blog post on AI assistants will go through the security issues that I discovered with each platform. You cannot understand the weaknesses in application code until you start developing on the platform yourself.

Overall Architecture of AI Assistant Based Applications

If you look at all of the AI platforms, you will see that they are very similar. A user speaks commands or questions to a device. The device records the audio and streams it to an intermediary service. The intermediary recognizes this as an initial request and sends the audio to the speech-to-text service, which converts the audio to text and returns the text to the intermediary. The intermediary then sends the text to the text-to-intent/action component, which is responsible for figuring out what the user wants to do.

AI assistants usually have phrases that trigger named intents. For example, an application can look for the phrase “What’s the weather in {Boston} {Massachusetts}” to trigger a get_weather intent. The parts in {} braces are called slots. Think of them as variables for voice commands. Once the intent/action name is determined, what happens next depends on the platform: either the platform carries out more interactions to gather the information it needs, or a webhook is invoked. Some platforms require you to make a web call for every intent. Others allow you to gather all of the data in their platform before invoking your webhook (business logic).
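To make the idea of intents and slots concrete, here is a minimal sketch of how a phrase with slots could be matched to a named intent. It is only an illustration: real platforms use trained language models rather than literal pattern matching, and the slot names (city, state) and the function names are assumptions made up for this example, not part of any platform's API.

```python
import re

# Hypothetical sample phrases, keyed by intent name.
# The slot names (city, state) are illustrative only.
SAMPLE_PHRASES = {
    "get_weather": "what's the weather in {city} {state}",
}


def compile_phrase(phrase):
    """Turn a phrase with {slot} placeholders into a regex with named groups."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", phrase)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)            # literal text
        else:
            pattern += r"(?P<%s>\w+)" % part      # one-word slot value
    return re.compile("^" + pattern + "$", re.IGNORECASE)


COMPILED = {name: compile_phrase(p) for name, p in SAMPLE_PHRASES.items()}


def resolve_intent(text):
    """Return (intent_name, slots) for the first matching phrase, else (None, {})."""
    for name, regex in COMPILED.items():
        match = regex.match(text.strip())
        if match:
            return name, match.groupdict()
    return None, {}


intent, slots = resolve_intent("What's the weather in Boston Massachusetts")
print(intent, slots)
```

Literal matching like this breaks as soon as the user rephrases the question or names a multi-word city, which is exactly the problem the platforms' text-to-intent components exist to solve.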

When your webhook is invoked, the intent name, slot names, and slot values are passed to your business logic. This business logic could be housed in an AWS Lambda function or on a Heroku server. It identifies which internal function to call based on the intent name and then reads the required values from the slots by name. Your business logic can then invoke REST APIs on the internet to gather information, which is returned to the device and spoken to the end user.
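The dispatch step described above can be sketched roughly as follows. The request and response shapes here are assumptions invented for illustration; each platform defines its own JSON format, and the handler and field names (handle_webhook, "intent", "slots", "speech") are hypothetical. The REST call to a weather service is stubbed out so the example runs on its own.

```python
def get_weather(slots):
    """Hypothetical handler for a get_weather intent."""
    city = slots.get("city")
    state = slots.get("state")
    if not city:
        # A required slot is missing, so ask the user for it.
        return "Which city would you like the weather for?"
    location = city if not state else f"{city}, {state}"
    # A real handler would call a weather REST API here; this is a stub.
    return f"Looking up the weather for {location}."


# Map intent names to the internal functions that serve them.
INTENT_HANDLERS = {
    "get_weather": get_weather,
}


def handle_webhook(request):
    """Route an incoming request (assumed shape) to the matching handler."""
    intent = request.get("intent")
    slots = request.get("slots") or {}
    handler = INTENT_HANDLERS.get(intent)
    if handler is None:
        return {"speech": "Sorry, I don't know how to help with that."}
    return {"speech": handler(slots)}


response = handle_webhook(
    {"intent": "get_weather", "slots": {"city": "Boston", "state": "Massachusetts"}}
)
print(response["speech"])
```

Keeping the intent-to-function mapping in one table makes it easy to see every entry point into the business logic, which is also a useful starting point when reviewing such code for security issues.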

Although the AI assistant platforms are architecturally the same, there are important differences you need to be aware of when developing for each one. Each platform has its own advantages and disadvantages.

Alexa

Advantages

Disadvantages

API.ai

Advantages

Disadvantages

Wit.ai

Advantages

Disadvantages

MS LUIS (Language Understanding Intelligent Service)/Cortana

Advantages

Disadvantages

Conclusion
