- Natural Language APIs provide a good way to get started with single interactions (ex. question-answer, command-action)
- When you move to conversations, most of the logic, including managing the state and changing the actions based on it, needs to be implemented by the developer, often in a rigid way
- Available APIs are good for building MVPs but have several drawbacks that may hinder user experience: the algorithms do not use context and history, failure management is poor, there is no dialogue optimization, there is no way to factor in expert knowledge, and accuracy is lower
Bots are the new black: everyone wants to build one.
If you want to do it too, unless you have a Natural Language Processing expert on your team, public APIs are your safest bet. For building an app that has to understand a single command (ex. Siri), current APIs may solve your problem. If you want to build a conversational agent, things get more complicated.
Conversational agents need to understand what the user wants to do (the intent, ex. buy something) and collect a set of details about it (the entities, ex. what he/she wants to buy) in order to perform an action. At every interaction, based on the history (the state) and the current user input, agents should either request information, ask for confirmation or perform some kind of action.
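The request / confirm / act loop described above can be sketched as a small policy function. This is a minimal sketch, assuming a hypothetical `buy` intent with hypothetical `item` and `quantity` entities, and an NLU result already reduced to a plain dictionary; it does not reproduce the output format of any of the APIs discussed below.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema: which entities each intent needs before the bot can act.
REQUIRED_ENTITIES = {"buy": ["item", "quantity"]}


@dataclass
class DialogueState:
    """The conversation history, reduced to what the policy needs (the state)."""
    intent: Optional[str] = None
    entities: dict = field(default_factory=dict)
    confirmed: bool = False


def next_action(state: DialogueState, nlu_result: dict) -> str:
    """Update the state with the latest user input and choose the next move.

    `nlu_result` is a hypothetical dict such as
    {"intent": "buy", "entities": {"item": "shoes"}, "confirm": False}.
    """
    # Keep the intent from earlier turns unless a new one is recognized.
    if nlu_result.get("intent"):
        state.intent = nlu_result["intent"]
    # Merge newly extracted entities into those collected so far.
    state.entities.update(nlu_result.get("entities", {}))

    if state.intent is None:
        return "request:intent"
    # Request information: ask for the first required entity still missing.
    for name in REQUIRED_ENTITIES.get(state.intent, []):
        if name not in state.entities:
            return f"request:{name}"
    # Ask for confirmation once everything has been collected.
    if not state.confirmed:
        if nlu_result.get("confirm"):
            state.confirmed = True
        else:
            return "confirm"
    # Perform the action.
    return f"perform:{state.intent}"


if __name__ == "__main__":
    state = DialogueState()
    print(next_action(state, {"intent": "buy", "entities": {"item": "shoes"}}))  # request:quantity
    print(next_action(state, {"entities": {"quantity": 2}}))  # confirm
    print(next_action(state, {"confirm": True}))  # perform:buy
```

Every branch here is hand-written if-then logic, which is exactly the part that, as discussed below, the public APIs mostly leave to the developer.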
The main focus of public APIs is understanding the intent and extracting the entities, with some support for managing the conversation. All APIs need to be trained by the developer before they can determine intent and entities. Developers provide examples of user input, specify the intent behind each one, and highlight and tag the words that represent entities. After a few dozen examples, the algorithms should be able to provide useful results on inputs that are (very) similar to the examples. Most of the training is done through web-based interfaces, and sometimes there are pre-trained models for frequent use cases (ex. weather, calendar, email, etc.).
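To make the training step concrete, here is one way to represent an annotated example: the user input, the intent label, and the tagged entities as character-offset spans. The field names and the `buy` / `item` / `quantity` labels are hypothetical and do not match any specific vendor's import format; the `validate` check simply catches the misaligned spans that hand-tagging tends to produce.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EntitySpan:
    label: str   # entity type, e.g. "item" (hypothetical label)
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity's last character
    value: str   # the tagged text, kept so misaligned offsets can be detected


@dataclass(frozen=True)
class TrainingExample:
    text: str                            # example of user input
    intent: str                          # what the user wanted to do
    entities: Tuple[EntitySpan, ...] = ()

    def validate(self) -> None:
        """Raise ValueError if a tagged span does not match the text."""
        for span in self.entities:
            found = self.text[span.start:span.end]
            if found != span.value:
                raise ValueError(
                    f"span {span.label!r} [{span.start}:{span.end}] is "
                    f"{found!r}, expected {span.value!r}"
                )


EXAMPLES = [
    TrainingExample(
        "I want to buy two pairs of shoes",
        "buy",
        (EntitySpan("quantity", 14, 17, "two"), EntitySpan("item", 27, 32, "shoes")),
    ),
]

for example in EXAMPLES:
    example.validate()
```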
IBM provides a set of text understanding APIs under IBM Watson Developer Cloud and AlchemyAPI. No single API does intent and entity recognition in one call. This is inconvenient: you have to build your own pipeline of multiple API calls to extract all the information you need, and it’s harder to link entities to the intent.
There is no service to manage the conversation using these APIs.
For simple scripted conversations, IBM provides the Dialog API. The Dialog API is not integrated with the text understanding APIs, and you may need to write thousands of lines of XML to build a simple app (see the example XML behind the demo movie app): not exactly developer-friendly. If you want to go the full scripting way, you may be better off with ChatScript.
IBM also offers Watson Engagement Advisor, which seems more advanced, but its API is not public.
- AlchemyAPI from $0.007 to $0.0002 per call, depending on quantity
- IBM Dialog $0.02 per call
Microsoft provides the Language Understanding Intelligent Service (LUIS), whose API can do both intent and entity recognition in a single call. You can bind an action (ex. calling another API) to a set of intents and entities. LUIS also has a list of pre-built intents and entities.
To build a full conversational agent, Microsoft provides the Bot Framework. With Bot Framework, it’s possible to script a loose dialogue using Node.js or C#. In Node.js, business logic and conversation flow are written inside callbacks that listen to events: when LUIS recognizes intents and entities, it emits the related events and the callbacks get executed. Much still needs to be managed on the developer side, but it’s the best compromise between flexibility and not having to build your own framework.
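The callback style described above can be illustrated with a plain-Python dispatcher: handlers register for an intent and run when the recognizer reports that intent. This is not the Bot Framework SDK (which targets Node.js or C#); `IntentRouter`, the `weather.forecast` intent and the `forecast` handler are hypothetical names used only to show the pattern, in which the business logic and the session state stay entirely in developer code.

```python
from typing import Callable, Dict

# A handler receives the extracted entities and a mutable session dict.
Handler = Callable[[dict, dict], str]


class IntentRouter:
    """Maps recognized intents to developer-written callbacks."""

    def __init__(self, fallback: Handler) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._fallback = fallback

    def on(self, intent: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for an intent."""
        def register(handler: Handler) -> Handler:
            self._handlers[intent] = handler
            return handler
        return register

    def dispatch(self, nlu_result: dict, session: dict) -> str:
        """Run the handler for the recognized intent, or the fallback."""
        handler = self._handlers.get(nlu_result.get("intent"), self._fallback)
        return handler(nlu_result.get("entities", {}), session)


router = IntentRouter(fallback=lambda entities, session: "Sorry, I didn't get that.")


@router.on("weather.forecast")
def forecast(entities: dict, session: dict) -> str:
    # Reuse the city from an earlier turn if this one omits it (developer-managed state).
    city = entities.get("city") or session.get("city")
    if city is None:
        return "Which city?"
    session["city"] = city
    return f"Looking up the forecast for {city}."
```

Each new intent means another registered callback, and remembering the city across turns is the handler's job, not the recognizer's.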
- $0.75 per 1000 transactions
- 10 transactions per second
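For a rough budget, the two figures above combine as simple arithmetic. The sketch assumes every user message costs exactly one transaction and treats 10 transactions per second as a hard ceiling; both are simplifying assumptions, and the prices are the ones quoted here, which may have changed since.

```python
PRICE_PER_1000 = 0.75           # USD per 1000 transactions, as quoted above
MAX_TPS = 10                    # transactions per second, as quoted above
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_cost(messages_per_day: int, days: int = 30) -> float:
    """Cost in USD, assuming one transaction per user message."""
    return messages_per_day * days * PRICE_PER_1000 / 1000


def max_messages_per_day() -> int:
    """Upper bound on daily traffic if the rate limit is saturated all day."""
    return MAX_TPS * SECONDS_PER_DAY
```

Under these assumptions, 10,000 messages a day come to $225 per month, while a bot saturating the rate limit around the clock (864,000 transactions a day) would reach $19,440 per month.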
Wit.ai, acquired by Facebook in 2015, released the first version of Bot Engine on April 12th 2016.
API training is done around Stories (domain-specific use cases), where the engine learns the conversation flow from examples of user input plus bot response. The SaaS engine does not provide action support: calling external services must be implemented outside the platform. Likewise, context is not explicitly managed by Bot Engine, but the developer can pass it around as a JSON object.
Bot Engine is available in 11 languages, with 39 more currently in beta.
Since all Stories built by other developers are publicly available, you can copy another developer’s Story (including training) to jumpstart your project.
- All your Stories and training data are public and available to anyone
Recast.ai seems to be the enterprise version of wit.ai. Compared to wit.ai, you don’t get automatically generated responses and flow: much like Microsoft LUIS, you get just intent and entity extraction and need to manage all logic and flow on your side. Like wit.ai, you can reuse intents made available by the community for similar tasks. Recast.ai provides an enterprise option in which your work remains private.
- Free for developers, as long as the code is public on Github
- Enterprise pricing on request
Api.ai is similar to wit.ai and Recast.ai, with a more mature platform.
Applications are organized around Agents (similar to Stories in wit.ai). To get started, there are already several pre-trained intents (called Domains), spanning common tasks like authentication, booking, shopping, etc. Like wit.ai, you need to pass the context around and cannot execute actions from their platform. Dialogs can be defined in the platform either to collect pre-defined information (slot filling) or as a tree structure. Api.ai provides integrations with the main platforms (Slack, Facebook, Alexa, etc.).
It is available in 14 languages, even though pre-trained Domains are mostly in English.
- From $89 to $899 monthly
Kueri is different from all the other APIs and may be worth considering for specific cases.
If you need to convert natural language queries into (simple) SQL, in the style of Microsoft Power BI Natural Language Q&A, Kueri may be the API you are looking for.
It won’t manage intents and conversations, but if your bot is just an interface to query a database it may be worth considering.
- Currently in private beta, pricing on request
Conclusions — where public APIs shine and where they fail
Most of the APIs are good for getting started quickly and building MVPs (Minimum Viable Products). Once you get beyond that, you may start running into their limitations.
In general, algorithms behind the APIs are tailored for a single interaction, either a question-answer or a command-action (ex. Siri).
If you are building a conversational bot, there are several issues that may hinder user experience and that you should keep in mind:
- Lack of context. While you may pass a context object to avoid asking the same thing twice, the Natural Language Understanding APIs do not use the history of previous user interactions to improve their understanding of the last user input.
- Failure management. Failure management is extremely important: an algorithm with 80% accuracy fails, on average, one interaction in five, and every failed interaction, if not handled properly, might break the user experience. Current APIs do not provide significant support for actively managing failure scenarios during the conversation.
- Dialogue optimization. Most dialogue flows resemble a series of if-then statements. In research, some progress is being made with reinforcement learning algorithms that find good interaction flows, although reinforcement learning doesn’t yet handle long conversations. Making the dialogue structure looser, while still moving toward a goal, is definitely one of the needed improvements.
- Expert knowledge. The APIs learn only from examples and do not provide ways to take advantage of additional domain knowledge.
- Accuracy. General-purpose APIs may be a quick way to obtain results, but their algorithms cannot reach the accuracy of tailored models.
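The failure-management point compounds over a conversation. Assuming, for simplicity, that each turn fails independently with the same probability, a 5-turn conversation at 80% per-turn accuracy goes through without a single failure only about a third of the time (0.8^5 ≈ 0.33). A common developer-side mitigation is to act only on confident predictions and otherwise ask the user to rephrase; the sketch below shows both ideas, assuming the API returns a confidence score with each prediction and using a hypothetical threshold of 0.6.

```python
def clean_conversation_probability(accuracy: float, turns: int) -> float:
    """Chance that every turn is understood, assuming independent turns."""
    return accuracy ** turns


def handle_turn(intent: str, confidence: float, threshold: float = 0.6) -> str:
    """Act on confident predictions; otherwise recover instead of guessing.

    The 0.6 threshold is an arbitrary illustrative value, to be tuned on real data.
    """
    if confidence >= threshold:
        return f"perform:{intent}"
    return "Sorry, I'm not sure I understood. Could you rephrase that?"
```

A rephrase prompt still costs the user a turn, but it is usually less damaging than confidently performing the wrong action.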
If you want to address these issues, at the moment you need to build your own technology. With the pace of progress and interest in chatbots, some of the existing APIs or newcomers might soon close the gaps.
Conversate is a new Artificial Intelligence startup. We built an innovative Natural Language Understanding engine, with state-of-the-art algorithms and enterprise services. We are currently in private beta; if you are interested, you can sign up at http://www.conversate.eu