Developing Conversational IVR Using Rasa

David Morand
Jun 26 · 3 min read

Welcome to the first installment of our series on the development of interactive voice response (IVR) applications using Rasa!

There has been a lot of buzz around Rasa recently (version 1.0 has been released a few weeks ago, along with Rasa X). At Nu Echo, we have been quite busy since the beginning of the year comparing various alternatives to develop conversational IVR applications (including building our own in-house engine). We finally decided to go with Rasa since it offers a good balance between functionality, flexibility, extensibility and control over the solution, as it can easily be managed both on-premises and in the cloud. Rasa is also open source and has a vibrant community. And let’s not forget about the possibility of creating dialogues using examples and machine learning (more on that later in the series)!

The lasting relevance of VXML

Since Nu Echo’s foundation in 2002, VoiceXML has been the bread and butter of our IVR application development. We’ve been using it to create many solutions ranging from turnkey address change and identity validation modules to large scale IVR system using a custom JavaScript framework. We even open-sourced a Java VoiceXML development framework named Rivr, check it out!

While it is definitely possible to develop IVR applications in 2019 without using VoiceXML (think Dialogflow or Amazon Lex through Amazon Connect), it is still very prevalent in the large contact centers, which are an important part of our customer base. That’s why we decided to give Rasa a serious spin to find out if it could be a viable solution for developing multilingual conversational IVR applications using VoiceXML.

The use cases

For our proof of concept, we selected some banking use cases (account balance and pay bill) that offer us interesting dialogue patterns like digressions, confirmations, corrections and global commands (cancelling a task for example).

Here is an example that mixes some of those patterns:

Introducing Rasa IVR

Developing VoiceXML IVR applications using Rasa offers interesting challenges. For one, the real-time aspect of a voice conversation that must progress if the user says nothing is quite different from the classic chatbot approach. The application can’t keep the user waiting if he says nothing, it must propose alternate and more detailed messages and eventually terminate the conversation if the user decides to stay silent.

While Rasa offers a lot of prebuilt channels, nothing exists to express the richness of VoiceXML and interpret its different outputs. As you can see from the complexity of its specification, a lot must be done to cover all the functionalities (although some of them are less used than others). Some of the most important ones that must be taken into consideration for developing basic use cases are related to constructing the output (audio files / speech synthesis), activating bargein, specifying grammars / input mode (speech and/or DTMF), and configuring confidence levels / timeouts.

The user experience (UX) must also be tailored for the voice channel since the user cannot scroll up to access the whole conversation (unless he has a supernatural memory). Some patterns like confirmation or choosing in a list are much trickier to properly implement using voice than text and widgets.

Along with those challenges come some interesting opportunities. For example, using automatic speech recognition (ASR) alongside contextual grammars allows us to greatly improve the recognition accuracy by giving a greater weight to the most probable responses. VoiceXML also offers many functionalities for a better integration to the contact center, which must be exposed (agent transfer, attached data, recordings). The synchronous aspect of the conversation also simplifies the implementation since the user can’t frantically send multiple (sometimes contradictory) inputs.

As I said earlier, this post is the first of a series that will cover different aspects of the making of our conversational banking application proof of concept. Stay tuned for more articles from my colleagues on our approach toward generating VoiceXML, dialogue management, cloud deployment and more!

Many thanks to my colleagues Linda Thibault and Guillaume Voisine for their precious advice!


Originally published at https://www.nuecho.com on June 26, 2019.

CXinnovations

A blog about delivering the best customer experience in omnichannel contact center projects and solutions, speech and conversational technologies, A.I., chatbots, etc.

David Morand

Written by

CXinnovations

A blog about delivering the best customer experience in omnichannel contact center projects and solutions, speech and conversational technologies, A.I., chatbots, etc.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade