Developing Conversational IVR Using Rasa Part 2: The Rivr Bridge · Nu Echo

Karine Dery
Jul 9 · 5 min read

Hi again! If you are here, reading the second episode of our series on the development of interactive voice response (IVR) applications using Rasa, I assume you have read its first episode by my colleague David. If not, I recommend you read it, as it introduces this one better than I could in one paragraph.

(Time elapses while you read the first post…)

Perfect! Now that you have read it, you know we wanted to use Rasa with a VoiceXML platform, and that Rasa does not offer a VoiceXML channel. The obvious solution would have been to create our own IVR channel to generate the VoiceXML pages and interpret the results coming back from the platform. That approach might be a good idea in the long term, but making it reliable requires a substantial amount of effort, mostly because generating VoiceXML is far from a simple task. So that’s where we took a shortcut and created the Rivr Bridge.

I know, there’s only one question on your mind right now: “What in the world is that name?” First off, “Rivr” because it uses Rivr, subtly mentioned in the first post, a Nu Echo-created, open-source framework for writing VoiceXML applications entirely in Java. Then “Bridge” because it links a VoiceXML platform with the chosen dialogue engine. And yes, the pun was intended (but not by me).

But the real question is: “What does it do?” As I said, Rivr is a framework for developing full-fledged applications, but the Rivr Bridge’s only goal is to translate what goes in and out of the VoiceXML platform and pass it to the Rasa side of the world in a digestible format. For instance, a classic Rivr application programmatically processes each user input and defines the next dialogue steps itself, whereas the Rivr Bridge queries the chosen dialogue engine to decide the next dialogue steps. Adapting the model was simple, maybe even simpler than we thought. It roughly looks like this:

The great advantage of using the Rivr Bridge is that it interprets the VoiceXML platform’s input and generates bulletproof VoiceXML. For reusability purposes, we decided to make the Bridge platform-agnostic and application-agnostic, and let an IVR channel on the Rasa side manage the Rasa-specific aspects, which would allow us to eventually plug in other dialogue engines.
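The real Bridge is written in Java on top of Rivr; the sketch below only illustrates the engine-agnostic idea, in Python (the language Rasa channels are written in). Every name in it (DialogueEngine, next_step, EchoEngine, bridge_turn) and the message shape are hypothetical, chosen for illustration rather than taken from the Bridge’s code.

```python
from typing import Any, Dict, Protocol

# Hypothetical message shape: JSON-like dicts, whatever engine sits behind the bridge.
Message = Dict[str, Any]


class DialogueEngine(Protocol):
    """Hypothetical interface: anything that can decide the next dialogue step."""

    def next_step(self, session_id: str, user_input: Message) -> Message:
        ...


class EchoEngine:
    """Toy engine, only here to show the bridge never depends on Rasa itself."""

    def next_step(self, session_id: str, user_input: Message) -> Message:
        if user_input.get("type") == "event" and user_input.get("name") == "hangup":
            return {"type": "exit"}
        text = user_input.get("text", "")
        return {"type": "interaction", "prompts": [f"You said: {text}"]}


def bridge_turn(engine: DialogueEngine, session_id: str, platform_input: Message) -> Message:
    """One turn: forward the translated platform input, then validate the engine's answer.

    In the real Bridge, the returned step would then be rendered as VoiceXML by Rivr.
    """
    step = engine.next_step(session_id, platform_input)
    if step.get("type") not in ("interaction", "exit"):
        raise ValueError(f"unsupported output type: {step.get('type')!r}")
    return step
```

Swapping EchoEngine for a client that calls Rasa (or any other engine) would leave bridge_turn unchanged, which is the point of keeping the Rasa-specific logic in the IVR channel.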

Here is an artistic representation of our input pipeline:

To better define the content of the requests and responses exchanged by the Rivr Bridge and the IVR channel, we designed a generic JSON protocol that can represent all the information a conversational IVR application using VoiceXML needs. The protocol describes five types of input:
- data (initialization data, for example the caller’s phone number, or any other information the platform is set to return)
- user input recognition/interpretation result (vocal or using the keypad)
- recording (of the user’s voice)
- transfer details (status, duration, etc.)
- event (hangup, noinput, nomatch…)

Concerning outputs, we only designed support for interaction (the dialogue asks for a user input) and exit/hangup, which covered our use cases.
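The post lists the protocol’s input and output types but not its exact schema, so the Python sketch below models only those types; the field names (payload, prompts, grammars) and the helper parse_input are assumptions made for illustration, not the actual protocol.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# The five input types and two output types named in the post.
# Field names below are hypothetical; the real schema is not given.
INPUT_TYPES = {"data", "recognition", "recording", "transfer", "event"}
OUTPUT_TYPES = {"interaction", "exit"}


@dataclass
class BridgeInput:
    type: str  # one of INPUT_TYPES
    payload: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.type not in INPUT_TYPES:
            raise ValueError(f"unknown input type: {self.type!r}")


@dataclass
class BridgeOutput:
    type: str  # one of OUTPUT_TYPES
    prompts: List[str] = field(default_factory=list)
    grammars: List[str] = field(default_factory=list)  # only meaningful for "interaction"

    def __post_init__(self) -> None:
        if self.type not in OUTPUT_TYPES:
            raise ValueError(f"unknown output type: {self.type!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def parse_input(raw: str) -> BridgeInput:
    """Decode one JSON message sent by the Bridge."""
    obj = json.loads(raw)
    return BridgeInput(type=obj["type"], payload=obj.get("payload", {}))
```

For instance, BridgeOutput(type="interaction", prompts=["What is your account number?"], grammars=["builtin:dtmf/digits"]).to_json() serializes a question together with the grammar used to recognize the answer. Validating the type in __post_init__ makes a malformed message fail at the channel boundary rather than deep inside the dialogue.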

As an example, to ask a question and wait for the answer, the dialogue could send this payload:

And the result sent by the Bridge could be:

Not a lot was then left for the IVR channel to do. Concerning inputs, each one needed some processing to be made accessible to the dialogue management. Specifically, inputs have to fit into Rasa’s NLU result format, namely a string following the template /intent{"entity_name": "entity_value"}. With well-written grammars, this step was rather simple to implement for recognition results, but it could have been tricky for input types with neither intent nor entities (data, events), for which we still wanted to trigger a dialogue turn. To solve that problem, we could either create synthetic intents and entities representing the information we wanted to pass on, or insert that information directly into the tracker and send a semantically empty input. We went for the first option and created four synthetic intents to date:
- start_conversation (with a data entity containing initialization data as a JSON object)
- noinput
- nomatch
- hangup
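Rasa treats a message of the form /intent_name{"entity": "value"} as an already-interpreted intent, bypassing the NLU model. The Python sketch below shows how inputs and the synthetic intents listed above could be mapped onto that syntax. The input shape (a dict with a "type" key and a "payload" dict) and the helper names rasa_message and to_rasa_text are hypothetical; recordings and transfers are deliberately left out because the post does not say how they were mapped.

```python
import json
from typing import Any, Dict, Optional


def rasa_message(intent: str, entities: Optional[Dict[str, Any]] = None) -> str:
    """Build a '/intent{...}' string, which Rasa treats as an already-parsed intent."""
    if not entities:
        return f"/{intent}"
    return f"/{intent}{json.dumps(entities)}"


def to_rasa_text(bridge_input: Dict[str, Any]) -> str:
    """Map one Bridge input (hypothetical shape) onto a Rasa message string."""
    kind = bridge_input["type"]
    payload = bridge_input.get("payload", {})
    if kind == "data":
        # Initialization data travels as a single JSON-valued "data" entity.
        return rasa_message("start_conversation", {"data": payload})
    if kind == "event":
        name = payload.get("name")
        if name in ("noinput", "nomatch", "hangup"):
            # Each platform event becomes one of the synthetic intents.
            return rasa_message(name)
        raise ValueError(f"unsupported event: {name!r}")
    if kind == "recognition":
        # Assumes the grammar's semantic interpretation already carries an
        # intent name and an optional dict of entities.
        interpretation = payload["interpretation"]
        return rasa_message(interpretation["intent"], interpretation.get("entities"))
    # Recordings and transfers are not covered by this sketch.
    raise ValueError(f"no mapping for input type {kind!r}")
```

With this mapping, a noinput event becomes "/noinput", and initialization data such as {"caller": "5145550000"} becomes '/start_conversation{"data": {"caller": "5145550000"}}', so every input type can trigger a dialogue turn through the same code path.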

For the outputs, yet again some formatting was necessary, but since Rasa gives us full liberty on the output content through , this was pretty straightforward. The (tiny bit more) delicate work was to concatenate and validate outputs from different parts of the dialogue. Rivr supports playing messages alone (without a recognition or hangup step), and it could be a nice feature for our Rasa dialogues, but would have required a bit more gymnastics in both the channel and the Bridge, so we chose not to implement it for now.
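The concatenation and validation step can be sketched in Python as follows, assuming (hypothetically) that each Rasa bot message carries a custom payload whose "kind" is "prompt", "interaction" or "exit". The helper merge_outputs and the payload keys are illustrative only. Since playing messages alone was not implemented, a turn made only of prompts is rejected.

```python
from typing import Any, Dict, List, Optional

# Hypothetical custom payloads produced by the Rasa dialogue:
#   {"kind": "prompt", "text": ...}                      -> something to say
#   {"kind": "interaction", "prompts": [...], "grammars": [...]}
#   {"kind": "exit", "prompts": [...]}                   -> end the call
Message = Dict[str, Any]


def merge_outputs(messages: List[Message]) -> Message:
    """Concatenate one turn's bot messages into a single Bridge output."""
    prompts: List[str] = []
    terminal: Optional[Message] = None
    for msg in messages:
        if terminal is not None:
            # An interaction or exit must be the very last step of the turn.
            raise ValueError("nothing may follow an interaction or exit step")
        kind = msg.get("kind")
        if kind == "prompt":
            prompts.append(msg["text"])
        elif kind in ("interaction", "exit"):
            terminal = msg
        else:
            raise ValueError(f"unknown message kind: {kind!r}")
    if terminal is None:
        # Playing messages alone (no recognition or hangup step) is not supported.
        raise ValueError("a turn must end with an interaction or an exit")
    merged: Message = {"type": terminal["kind"], "prompts": prompts + terminal.get("prompts", [])}
    if terminal["kind"] == "interaction":
        merged["grammars"] = terminal.get("grammars", [])
    return merged
```

Requiring exactly one terminal step at the end keeps each turn mappable to a single VoiceXML interaction or hangup, which is what the Bridge expects.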

OK, presented like that, maybe the IVR channel had a lot to do even with the Rivr Bridge. But it was still less work than generating VoiceXML content would have been. Thanks, Rivr! To discover the journey of those user inputs once they enter the Rasa ocean, read the upcoming posts in this series!


Originally published at on July 9, 2019.

CXinnovations

A blog about delivering the best customer experience in omnichannel contact center projects and solutions, speech and conversational technologies, A.I., chatbots, etc.