How to Build Your Own Intelligent Assistant Using Rasa

A step-by-step guide on how to use Rasa to create a chatbot assistant

Encora
Encora Technology Practices
10 min read · Mar 19, 2021

Chatbots are everywhere, and so are the tools that promise easy development and deployment of these applications. Frameworks like Google DialogFlow, Microsoft LUIS, and Amazon Lex are fighting (badly) with each other to control this growing market.

In this blog post, we describe an experience we had at Daitan building a conversational agent for handling calendar appointments. We explore an open-source alternative — Rasa — a machine learning-based framework to build contextual AI assistants. We describe Rasa’s benefits and drawbacks and how we managed to build and deploy a calendar scheduler agent from scratch.

The Rasa Framework

As stated on the Rasa website:

The Rasa Stack is a framework to build contextual AI assistants and chatbots.

The Rasa framework relies on a series of machine learning systems that, when combined, decide what a chatbot can do at each step of a conversation.

In essence, Rasa contains two main building blocks.

  • NLU (Natural Language Understanding)
  • Core

Rasa NLU has the job of extracting the meaning of sentences. These are the intents and entities. More on this in a bit.

Rasa Core, in turn, takes the intents and entities as input and decides what to do next. In other words, the Core is responsible for orchestrating the series of actions that a bot can take based on the input it receives.

Let’s go over each one of them.

NLU — Natural Language Understanding

Rasa NLU is an open-source natural language processing tool for intent classification and entity extraction.

Strictly speaking, NLU uses machine learning to parse free-form text like:
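(an illustrative utterance; the exact wording is ours, but it matches the restaurant example discussed below)

    I am looking for a Mexican restaurant in the center of town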

into structured data with the form:
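(a rough approximation of Rasa NLU’s output; the exact fields vary by version and some are omitted here for readability)

    {
      "intent": {"name": "restaurant_search", "confidence": 0.97},
      "entities": [
        {"entity": "cuisine", "value": "Mexican"},
        {"entity": "location", "value": "center of town"}
      ]
    }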

In this simple example, we can see that the NLU was able to identify the general intent of the user (restaurant search) and a couple of entities associated with that intent: the type of cuisine and the restaurant location the user is looking for.

For a more general example, suppose we want to map any sentence that greets someone to a single intent named “greet”. As you might guess, there are many ways to greet a person. Below you can see some of the possibilities.
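A few illustrative training examples, written in Rasa’s markdown NLU format (the exact sentences are ours and could be replaced with anything that sounds like a greeting):

    ## intent:greet
    - hey
    - hello there
    - hi
    - good morning
    - good evening
    - hey, how are you?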

We can think of this as a supervised dataset. Here, the inputs are the hello-like sentences, and the target, or label, is the intent greet. In this situation, Rasa trains a supervised classifier to map greeting sentences, like the ones above, to a predefined category such as greet. In other words, the NLU trains a classifier that maps natural language sentences to intents and also performs entity extraction. This way, when we type something like “Hello there,” the NLU will return the expected intent “greet”.

Note that this process may work even though the sentence “Hello there” was not specified in the training data. That is the power of generalization that ML algorithms bring to the table compared to predefined heuristics. However, keep in mind that higher levels of generalization require large amounts of high-quality training data.

Also, it is important to note that this process is not free: you cannot simply throw data at the NLU and expect it to magically identify the intents and extract the entities you need. Instead, you need to provide annotated examples that explicitly flag the components you want to be identified. In the restaurant search example mentioned above, the training data for the NLU might look like:
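In Rasa’s markdown format, entities are annotated inline with the [value](entity) syntax. A small illustrative sample (the sentences are ours, not actual training data) would be:

    ## intent:restaurant_search
    - I am looking for a [Mexican](cuisine) restaurant in the [center](location) of town
    - show me some [Italian](cuisine) places near [downtown](location)
    - any good [Chinese](cuisine) restaurants in [Berlin](location)?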

Here, both the intent and the entities we wish to identify must be directly specified.

The Core

Rasa Core is responsible for managing the current state of the conversation and deciding which action should be taken next, given the conversational context. It is important to note that Rasa Core and NLU are independent libraries. Thus, one might choose to use the NLU without the Core library, or vice-versa.

Rasa Core acts as a dialog engine responsible for managing the many aspects and contexts that might arise in a human-to-bot conversation. Let’s go over its main components.

Stories and Domain

The most important component of Rasa Core is the stories. A story can be viewed as a general representation of a valid conversation that a user may have with a conversational agent. Behind the curtains, stories, just like the NLU examples, are training data used to train another machine learning model: Rasa’s dialogue management system.

Rasa stories describe the possible paths of conversation that a bot will be able to handle. Basically, a Rasa story defines a series of actions that should be taken in response to intents, but it also defines the path of the conversation and its context. One can write the stories in a very simple markdown file and name it “stories.md”. Consider the following simplified example, modeled on our calendar bot.
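A minimal sketch of such a story (the intent and action names are assumptions, chosen to be consistent with the ones mentioned elsewhere in this post):

    ## schedule a meeting
    * greet
      - utter_greet
    * schedule_meeting
      - action_schedule
    * thanks
      - utter_goodbye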

Lines starting with “*” represent intents. These are the outputs of the NLU processing (more on this in a bit). The lines beginning with “-” are the actions that will be taken in response to a given intent.

Besides “stories.md”, Rasa Core also needs a “domain.yml” file. The domain contains some of the most important definitions the bot requires to function properly. In this file, we specify the intents, entities, slots, and actions our bot can handle. Optionally, we can also define direct text messages that are delivered to users as responses to specific actions. These simple responses have names starting with “utter_” and can also use buttons and images to enhance user interaction.

We can think of the domain as a rigorous definition of the world in which our bot lives.

In a short summary:

  • intents: General categories that users’ input can fall into, e.g. “Hello” → “greet”
  • actions: Things your bot can do or say in response to user input/intent, e.g. “action_schedule”
  • entities: Important pieces of information you want to extract from user input messages.
  • slots: Information to keep track of during a conversation (e.g. a user’s age).

Have a look at a much simplified version of the kind of domain file we used to create our calendar scheduler bot.
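A stripped-down sketch of what that file can look like (the intent, entity, slot, and action names are assumptions consistent with the rest of this post, and the exact keys vary slightly across Rasa versions):

    intents:
      - greet
      - schedule_meeting
      - check_meetings
      - thanks

    entities:
      - meeting_time

    slots:
      meeting_time:
        type: text

    actions:
      - utter_greet
      - utter_goodbye
      - action_schedule
      - action_check_meetings

    responses:   # called "templates" in older Rasa 1.x versions
      utter_greet:
        - text: "Hello! How can I help you with your calendar?"
      utter_goodbye:
        - text: "Goodbye!"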

Training a dialog model

Given the stories and the domain, the next step is to train a dialog management model. Here, Rasa takes the stories we created and trains a neural network that maps user intents (previously identified by the NLU) into actions. To train a Rasa dialog management model using the stories, we can run the command below.
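Assuming the standard Rasa CLI and the default project layout (very old releases used a separate rasa_core training script instead), the command looks roughly like this:

    # train only the dialogue (Core) model; a plain `rasa train` would train Core and NLU together
    rasa train core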

Adding NLU

Note that up to this point, our model can only map intents like ‘schedule_meetings’ into actions like ‘action_schedule’. To add natural language understanding (NLU), we need to go one step further.

As we saw, the NLU can process free-form text into structured data. However, to do that, we need a training dataset containing examples that describe each intent we want to classify. For instance, sentences like the following can all map to a single “check_meetings” intent:
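(The sentences below are illustrative, written in the markdown NLU format; only the intent name comes from our bot.)

    ## intent:check_meetings
    - what do I have next?
    - show me my meetings for today
    - do I have any appointments tomorrow?
    - list my upcoming meetings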

It is also good to know that Rasa supports at least two formats for NLU training data. In the examples above, we use the markdown format for its simplicity and readability, but you can also use a more structured JSON format.

To build the first examples of our dataset with less pain, we can use Chatito. This tool allows us to create a template for the desired query and to specify a series of possible words to be used in it. Once the definitions are done, we can choose Rasa NLU as the target format, and Chatito outputs a training file ready to be used in the NLU training. Have a look at the examples below.
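For example, a small Chatito definition for our hypothetical check_meetings intent could look like this (all of the aliases and word choices here are ours; the 'training' argument controls how many sentences get generated):

    %[check_meetings]('training': '50')
        ~[greet?] ~[show] my ~[meetings] ~[when?]

    ~[greet]
        hi
        hello
        hey

    ~[show]
        show me
        list

    ~[meetings]
        meetings
        appointments

    ~[when]
        today
        for tomorrow
        this week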

Note that I can specify how many training examples I want to be created.

Note how Chatito builds a training dataset (following the template query) by combining the words, as shown below.
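With the definition above, the generated sentences are permutations of the template words, along the lines of:

    hi show me my meetings today
    list my appointments for tomorrow
    hey show me my appointments this week
    hello list my meetings

Chatito then writes these sentences, grouped under the chosen intent, into a training file that Rasa NLU can consume directly.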

Next, we can train an NLU model using our training dataset. To do that run:
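Again assuming the standard Rasa CLI, with the NLU data and pipeline configured in the project:

    # train only the NLU model
    rasa train nlu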

Now, we can enjoy the results.
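One convenient way to do that is Rasa’s interactive shell, which lets us talk to the trained assistant, or inspect the NLU output on its own, right from the command line:

    rasa shell        # chat with the full assistant
    rasa shell nlu    # type a message and inspect the predicted intent and entities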

Conclusions and Lessons Learned

We encountered many obstacles in developing our calendar aid chatbot. Here, we talk about some of these issues, hoping that some of our experiences might help someone following a similar path.

By far, the central component of any machine learning algorithm is the dataset. There are many good practices to follow when dealing with existing datasets or even when creating your own. In most situations, when using Rasa, we end up building both the training and testing datasets ourselves — and this is not a simple task. Specifically, for the NLU, we need to create sentences that capture the pre-defined intents and also annotate every single entity in all of those sentences.

In doing so, many problems can arise. Let’s picture a simple example. Suppose our calendar bot has an intent ‘show-meeting’ whose training sentences are all fairly long, the shortest of them containing, let’s say, 8 words. It is easy to think of quite long sentences asking the bot to show your next appointment. Here are two simple examples:
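(The two sentences below are ours and only illustrate the point.)

    Could you please show me the next appointments on my calendar for this week?
    I would like to see all of the meetings I have scheduled for tomorrow afternoon.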

But you also have other intents, like ‘remove-meeting’ and ‘schedule_meeting’, that have quite short sentences (ranging from 2 to 5 words) in their definitions.

You might think this is fine, but training the NLU model on this dataset may produce quite odd behavior.

First, in a situation like this, it will be almost impossible to ask for your next appointment using short sentences like “What do I have next?”. That is because your model has picked up the correlation between the sentence length and the intent ‘show-meeting’! Similarly, if you type a long sentence for a different intent, say ‘remove-meeting’, your model will probably think that you want to see your next appointments.

Long story short, that is the problem of overfitting in learning algorithms. And this example is only one of many possible situations in which this problem can occur. In fact, your model can overfit to a particular word that always shows up in an intent or to a specific intent that has significantly more examples than the others.

In general, good datasets are not necessarily the big ones. In fact, there is a more important property of a good dataset, and that is variability. Note that in our previous example, the size of the dataset does not matter: as the dataset gets larger, so does the spurious signal that correlates longer sentences with the ‘show-meeting’ intent. That is the kind of problem you might want to watch out for if you are creating datasets with a tool like Chatito. Since it builds the training sentences by permuting words, sentences tend to grow in length as we try to enrich the data.

Instead, we are better off aiming toward a more diverse set of sentences that better explore the richness of the language and expose examples of various lengths. However, keep in mind that it is almost impossible to anticipate everything users might say, even for a single intent. In this situation, the best option would be to capture real-world examples of how users would interact with your bot and use these examples as training signals to improve your agent. This way, you guarantee that the data distribution you will use during training is more adequate and similar to the data distribution that your model will experience once deployed. Luckily, Rasa X captures precisely this idea of listening to users and using those insights to enhance the AI assistant.

In general, any machine learning system is only as good as the data it receives. If the data is too noisy, it is difficult to extract the signal from it. If the signal is skewed, we might end up with an overfitted model that behaves oddly, like the one described above. For Rasa specifically, we must strive to create datasets that have at least two properties: balance and variability. In other words, we want intents with a similar number of training sentences, and sentences that expose a diverse usage of the language — training sentences that resemble the inquiries real-world users would make.

Thanks for reading.

Acknowledgment

This piece was written by Thalles Silva with the Innovation Team at Daitan. Thanks to Fernando Moraes and Kathleen McCabe for reviews and insights.
