Conversational UI of Varying Forms
By Anand Verma at Brilliant Basics “bb”
Recent improvements in voice processing and AI have made computers and phones smarter at understanding what users really mean. This has driven the dramatic growth of Conversational User Interfaces (CUIs). It is no longer about clicking icons on your device; now you tell your device what to do. This will increasingly help us automate the kinds of tasks we have usually done ourselves.
At BB Labs in London we have been carrying out a Proof of Concept (PoC) to demonstrate the employment of CUIs of varying forms, with every UI interfacing with a common dialogue service hosted in the cloud. Cognitive services have become far more accessible this year thanks to the cloud offerings from Amazon, Facebook, Microsoft and IBM, and this is one of the technologies we forecast becoming part of our lives in 2017.
After initially dabbling with Facebook’s Wit.ai for v0.1 of our PoC, we evaluated Microsoft’s LUIS and IBM’s Watson as alternatives. Although LUIS offers a number of straightforward integration options, we quickly settled on Watson for v0.2 as the fundamental service for natural language processing (NLP), because intents and dialogue can helpfully be configured and validated using a web-based wizard tool.
Watson Conversation services are offered in IBM’s Bluemix cloud, and many open-source starter middleware projects use NodeJS, so we aligned on hosting NodeJS middleware apps in AWS where needed. Another element that helped establish the building blocks of the setup was Node-RED, a tool that allows the visual mapping and connection of API endpoints, parsing blocks and debugging.
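To make the common dialogue service idea concrete, here is a minimal sketch of the request each front end sends to a shared Watson Conversation workspace. The workspace ID is a placeholder, and the helper name is ours, not part of any SDK; with the Watson Node SDK the resulting object would be passed to the service’s `message` call.

```javascript
// Build the payload each front end sends to the shared Watson Conversation
// workspace. The workspace ID and field values are illustrative placeholders.
function buildMessageRequest(workspaceId, userText, context) {
  return {
    workspace_id: workspaceId,   // identifies the shared dialogue definition
    input: { text: userText },   // the user's utterance
    context: context || {}       // conversation state carried between turns
  };
}
```

Because every UI builds the same payload shape, the dialogue logic lives in one place and the front ends stay thin.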
Initially we set out just to hook up an iOS app implementing some of the visual and shortcut-button options designed by our user experience (UX) team. We quickly realised a simple web-based chat window was also fairly achievable, and with the two of these modelling 80% of the designed dialogue, we moved to add a Slack bot to the mix, an incremental step which led on to the main challenge: Alexa.
As it happens, Alexa can be directed to interface directly with a suitable API endpoint without having to use Amazon’s Lambda offering, which gave us hope that it could be plugged into Watson Conversation.
However, the message schemas for Alexa and Watson Conversation are different, so we quickly leaned on the Node-RED application for rapid prototyping. Although the heavy lifting for intents, full NLP and dialogue can be done by the Watson Conversation service, with a little additional logic in Node-RED, it was unfortunately necessary to duplicate some intent definitions in the Alexa skill. Essentially this is because some of Alexa’s performance comes from having Alexa itself process the speech.
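The kind of translation our Node-RED flow performed can be sketched as two small mapping functions. Both message shapes below are simplified, illustrative versions of the real Alexa and Watson schemas, not the full formats.

```javascript
// Translate a (simplified) Alexa skill request into a (simplified)
// Watson Conversation message. Field names are illustrative.
function alexaToWatson(alexaRequest, workspaceId) {
  const intent = alexaRequest.request.intent;
  const slots = intent.slots || {};
  // Take the raw phrase from the first slot if present, else the intent name.
  const firstSlot = Object.keys(slots)[0];
  const text = firstSlot ? slots[firstSlot].value : intent.name;
  return { workspace_id: workspaceId, input: { text: text } };
}

// Wrap a Watson reply back into the Alexa speech-response envelope.
function watsonToAlexa(watsonResponse) {
  return {
    version: '1.0',
    response: {
      outputSpeech: {
        type: 'PlainText',
        text: (watsonResponse.output.text || []).join(' ')
      },
      shouldEndSession: false
    }
  };
}
```

In Node-RED these mappings sit in function nodes between the HTTP-in endpoint that Alexa calls and the Watson node.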
However, in order to get the best out of Watson Conversation, we used the (now deprecated) ‘LITERAL’ slot type in the Alexa skill, so that we could quickly extract arbitrary words and phrases from the user’s utterance and pass them over to Watson for action.
So with iOS, web chatbot, Slack and Alexa in the mix, the high-level flows between the elements are illustrated below.
The iOS app was implemented to integrate directly with the Watson Conversation API. However, in order to achieve the button interface for binary choices and the image illustrations in the responses, the JSON response schema in the Watson dialogue was adapted to include custom key-value pairs pointing to images on Amazon S3. The front end then takes action when it receives this style of response.
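For illustration, a front end can branch on such a response with a small helper like the one below. The `image_url` key is a hypothetical example of the kind of custom key-value pair we added, not the exact schema we used.

```javascript
// Decide how to render a Watson Conversation response that may carry
// custom key-value pairs (here a hypothetical `image_url` pointing at S3).
function renderAction(watsonResponse) {
  const output = watsonResponse.output || {};
  if (output.image_url) {
    return { type: 'image', url: output.image_url };
  }
  return { type: 'text', text: (output.text || []).join(' ') };
}
```

Keeping the branching in one helper means iOS and the web chat client can share the same response contract.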
This was implemented similarly on the web chat client (running on NodeJS). Next, having got up to speed with the simplicity of NodeJS, we moved quickly to evaluating and standing up a Slack chatbot using the botkit middleware from https://github.com/watson-developer-cloud/botkit-middleware
After a few clicks in the administrator settings of our Slack account to create an API secret, and a moment spent filling our IBM API keys into the botkit template, we were away. Before we knew it, we had the fundamentals of the conversational UI running through Slack, an instant messaging and group chat platform that is gaining a huge share of many people’s connected workspaces.
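The real wiring is handled by the botkit middleware linked above, but the glue between the two message shapes can be as small as a formatter like this sketch (the function name is ours):

```javascript
// Turn a Watson Conversation reply into the text of a Slack message.
// Watson returns output.text as an array of sentences; Slack wants one string.
function watsonToSlackText(watsonResponse) {
  const lines = (watsonResponse.output && watsonResponse.output.text) || [];
  return lines.filter(Boolean).join('\n');
}
```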
Finally, Alexa. We had dabbled with Alexa back in May 2016 with a simple fact-regurgitation type of skill, using intents with a set of scripted phrases (and slight deviations) to bring back a response, but there had been negligible integration.
This time we wanted a little more integration and ultimately we wanted to reuse the backend that was powering the progress so far on iOS, web chat and Slack.
After a little research we stumbled upon a very helpful blog (Ref.1) which gave us the founding principles for Alexa / Node-RED integration, and another for Node-RED / Watson integration (Ref.2).
We then leveraged a cheeky shortcut (for now) for our demo. By using American English (as our Echo had been set to back in May when we imported it from the US), we were able to make use of the ‘LITERAL’ slot type, which allows some free rein in the endings of a dialogue phrase like ‘tell me… x’, where x could hopefully be interpreted by Watson. LITERAL is being deprecated in favour of custom slot types, but the point is that Alexa pulls out the phrase and, in the outgoing request, both declares the recognised general ‘conversation’ intent and passes on the extracted literal words from the latter part of the user’s utterance, which can be parsed and handed to Watson. This meant that, unlike some of the example Node-RED projects, we didn’t need to redefine all the intents in Node-RED as well as in Watson. A little setup was still required to guide Alexa, but with further elaboration this should be possible to minimise.
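To make that concrete, an intent schema and sample utterance using the deprecated LITERAL slot might look like the following (the intent and slot names here are illustrative, not our exact skill definition):

```json
{
  "intents": [
    {
      "intent": "ConversationIntent",
      "slots": [
        { "name": "Phrase", "type": "AMAZON.LITERAL" }
      ]
    }
  ]
}
```

with a sample utterance such as `ConversationIntent tell me {what is my balance|Phrase}`, so that whatever follows ‘tell me’ arrives as the `Phrase` slot value and can be forwarded to Watson verbatim.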
It goes without saying that additional integrations (e.g. looking up Foreign Exchange rates) need to be introduced in the middleware. In terms of sequence, that could happen after ‘consulting’ Watson Conversation (or other IBM services) for a trigger or custom JSON response that instructs a fetch/lookup, or by building the logic in earlier on. Either way, this should then be extended to the other front-end UIs, at which point we would ‘port’ them away from direct integrations with Watson Conversation to endpoints on Node-RED (or something more barebones) that relay the message but allow for further integrations with other data services. That will perhaps be the next version!
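As a sketch of that middleware pattern, here is a dispatcher that inspects the Watson response for a custom trigger and enriches the reply from another service. The `action` key, its `fx_lookup` value, and `fetchFxRate` are all hypothetical names standing in for whatever the real dialogue and data service would define.

```javascript
// Inspect a Watson Conversation response for a (hypothetical) custom
// `action` key and, if present, enrich the reply with data from another
// service. `fetchFxRate` is a stand-in for a real FX rate lookup.
function handleWatsonResponse(watsonResponse, fetchFxRate) {
  const output = watsonResponse.output || {};
  if (output.action === 'fx_lookup') {
    const rate = fetchFxRate(output.base, output.quote);
    return { text: '1 ' + output.base + ' = ' + rate + ' ' + output.quote };
  }
  // No trigger: pass Watson's own text straight through.
  return { text: (output.text || []).join(' ') };
}
```

Because the dispatch happens in the middleware, every front end gets the enriched answer without knowing the FX service exists.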
We are excited about making this real for our partners and clients, and hopefully this article helps bring that closer to reality. For us at Brilliant Basics, it’s all about getting to real quickly so that we can make people’s lives better.
We’d love to hear your feedback and thoughts. Thanks from Anand.