Building context-aware bots using Servo

Lior Messinger
Jan 22, 2019

We developers are highly proficient at developing applications: mobile, desktop or web. These apps usually follow the same flow: the user fills in a form (or clicks a button), the application calculates new data to show, and an updated screen is shown. Over the years, architectures and design patterns have evolved to support this type of application development. To name a few, one can mention dependency injection, microservices, aspect-oriented programming, MVC (Model View Controller), and others.

However, these do not work well for chatbots or voice assistants, which don’t have one form to fill or specific buttons to click. The user can say anything, sometimes unrelated to the current question. A sales bot might ask the user for the size of the shoes she was looking for, and the user would respond by asking the bot if they have boots instead. A car voice assistant might be in the middle of a conversation about finding a restaurant nearby, while the user tells it to call his friend that lives in the area. The friend is not picking up, and the bot is expected to return to the same point it left off.

Of course, one could confine the conversation, but that would obviously hurt the user experience. Or, one could try to provide ad-hoc logical solutions, but that might result in a large, complex code base that is hard to maintain.

At its heart, this is a state management problem. Servo is one of the few frameworks that can solve it. It does so by utilizing Behavior Trees, a programming paradigm developed in a software industry that has been building bots for years now: the gaming industry.

In this tutorial, we will explain how to start a simple bot with Servo. I assume here that you are a developer familiar with Github, NodeJS, and importantly, you know how to use NLU and NLP engines (with intent, entities etc). If you don’t, there are excellent resources all over the Internet — just search for ‘Wit tutorial’ or ‘LUIS tutorial’.

Getting Started

Servo is an open source framework and as such, you can fork it on GitHub and follow the readme to install it and run npm start. It should run the behavior tree editor and the Servo server on your local machine, each in its own process. Then, open localhost:8000 in your browser and you’ll get a sign-in screen. Past that, select Projects, and open your very own New Project:

This tree represents a bot that deals with a nice set of issues and can serve as a simple tutorial for the framework. Click on Debugger and then on the Run ▶️ button, and you will be asked for your age. Let’s enter a few numbers and see what happens:

  1. If an old age is put in (say, 55), the bot will respond by quoting your age, and sending you to vote
  2. If a number smaller than 18 is entered, the response would be that you are too young
  3. At 32, the bot will give you a geeky remark about your age

But: what if we enter something completely different?

  1. If you go with something like “who are you?”, you’ll get an answer. Then, you’ll be directed back to the age question
  2. For responses that are not understood, the bot will give you a help message before asking the question again

How does it work?

Let’s start by looking at the central node “Age?”. Select it and click on the Properties tab. You’ll see that the Type of the node is AskAndMap, and that it has a unique GUID and a title. Click on the properties📄 icon and inside the JSON shown you’ll see a few interesting items:

First, the prompt member:

It holds the questions the bot asks the user. If the cyclePrompts member is true, the bot cycles through them; otherwise it reaches the end and keeps using the last one.
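To make the two behaviors concrete, here is a minimal sketch of the selection logic. It is not Servo's code: pickPrompt and askCount are hypothetical names I made up for illustration, and only prompt and cyclePrompts come from the node properties.

```javascript
// Hypothetical helper (not part of Servo): chooses the prompt to show
// on the askCount-th time the question is asked (askCount starts at 0).
function pickPrompt(prompts, cyclePrompts, askCount) {
  if (cyclePrompts) {
    // Wrap around to the first prompt after the last one was used
    return prompts[askCount % prompts.length];
  }
  // Otherwise stay on the last prompt once the list is exhausted
  return prompts[Math.min(askCount, prompts.length - 1)];
}

// Example prompt texts, invented for this sketch
const prompts = ['How old are you?', 'Sorry, what is your age?'];
```

With cycling on, the third attempt goes back to the first prompt; with cycling off, every attempt after the second keeps repeating the last prompt.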

Second, the contexts array is used for selecting the right child. Once a user responds to the prompt, the response is sent to an NLU engine (Servo comes with a default model, configured in the root properties). The NLU engine extracts intents and entities from the sentence. These are then matched against the contexts, and the best match is selected.

So, if the user responded with a number (e.g. 22, fifteen etc.), the first context is going to be selected, because a number entity is what that context expects:

Then, the flow continues downwards for the rest of the conversation. We’ll delve into that in a minute.

If the user responded with something which the NLU didn’t understand, the third context is selected:

There are some minute differences between default and helper, but on that, some other time.

If, on the other hand, the user says something that the NLU recognizes, but it’s contextually different (e.g. “who are you?”), the bot then selects a context to continue on, based on the intentId. As one can see, some of the contexts can be selected by any one of several intentIds.
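The selection described in this section can be pictured with a rough sketch. Everything here is an assumption made for illustration: the field names entityName and contextFieldName, the intent names, the shape of the NLU result and the scoring rule are mine, and Servo's real schema and matching logic may well differ. Only contexts, intentId, default and helper are names taken from the node properties.

```javascript
// Hypothetical context definitions, loosely following the three cases above
const contexts = [
  { name: 'age given', entities: [{ entityName: 'number', contextFieldName: 'age' }] },
  { name: 'who are you', intentId: ['WhoAreYouIntent', 'HelloIntent'] },
  { name: 'help', helper: true },
];

// Picks the context with the most matching intents/entities;
// falls back to the helper/default context when nothing matches.
function selectContext(contexts, nlu) {
  let best = null;
  let bestScore = 0;
  for (const ctx of contexts) {
    let score = 0;
    if (ctx.intentId && ctx.intentId.includes(nlu.intent)) score += 1;
    for (const expected of ctx.entities || []) {
      if (nlu.entities[expected.entityName] !== undefined) score += 1;
    }
    if (score > bestScore) {
      best = ctx;
      bestScore = score;
    }
  }
  return best || contexts.find((c) => c.helper || c.default) || null;
}
```

For example, selectContext(contexts, { intent: null, entities: { number: 22 } }) picks the first context, while an unrecognized sentence with no intent and no entities ends up in the helper.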

Behavior Trees

Servo didn’t invent many proprietary programming methodologies, but rather chose to rely on industry standards as much as possible. One of the most successful paradigms, especially in gaming AI, has been Behavior Trees (BTs). Most gaming engines, such as Unity or Unreal, come with a BT editor, and they are very useful for constructing rule-based behaviors. Servo is built on top of the super-well-crafted Behavior3 editor, written in JavaScript by Renato de Pontes Pereira.

A word about AI is in order here. While deep learning receives a lot of hype (and rightfully so), it seems that for many real-world applications rules are still needed, at least as an orchestration framework. While AI does wonders at classifying big data streams, the outcome needs to evoke some action, and that is best dealt with by rule-based logic that connects these classifiers to input/output channels. In that sense, Servo combines the best of both of these paradigms.

One could read more about Behavior Trees, but here I’m going to teach you quickly just the important stuff. Let’s take a look at the left-hand side of the tree, which is reached through the leftmost child upon entry of an age:

A Behavior Tree has a main loop that executes the current node a few times a second. A node execution returns one of three results: Success, Failure or Running (still in progress). If a node has any children, it executes its children and then, based on their execution results, returns its own execution result.
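Here is a minimal sketch of that loop, assuming nothing about Servo's internals: the status constants, makeWaitingNode and runUntilDone are hypothetical names, and a real engine would tick on a timer rather than in a tight for loop.

```javascript
const SUCCESS = 'SUCCESS';
const FAILURE = 'FAILURE';
const RUNNING = 'RUNNING';

// A leaf that reports RUNNING until it has been ticked enough times,
// the way a node waiting for a user reply would.
function makeWaitingNode(ticksNeeded) {
  let ticks = 0;
  return {
    tick() {
      ticks += 1;
      return ticks < ticksNeeded ? RUNNING : SUCCESS;
    },
  };
}

// Main loop: keep ticking the node until it stops reporting RUNNING
// (or until maxTicks is reached), and record every status on the way.
function runUntilDone(node, maxTicks) {
  const history = [];
  for (let i = 0; i < maxTicks; i++) {
    const status = node.tick();
    history.push(status);
    if (status !== RUNNING) break;
  }
  return history;
}
```

A node that needs three ticks produces two RUNNING results followed by one SUCCESS.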

A node that has children is called a Composite node, and these have only two main types. Let’s follow the execution path here and understand them.

The ? node is called a Priority node, and acts like an OR selector. It tries to execute its children from left to right. If one of the children returns Success, it stops trying and returns a success, too. That’s why it’s called “priority”, because this gives a priority to the left-most children.

If no child succeeded, then the Priority doesn’t succeed either, and returns a Failure.

Here, the execution then continues downwards to the → node, called a Sequence. The Sequence is like an AND: it executes its children from left to right, expecting all of them to succeed. If one of the children fails, the whole Sequence fails, and returns Failure.
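The two composites can be sketched in a few lines. This is a simplified picture, not Servo's or Behavior3's implementation: nodes are plain functions here, the names priority and sequence are mine, and the Running status is left out to keep the OR/AND behavior visible.

```javascript
const SUCCESS = 'SUCCESS';
const FAILURE = 'FAILURE';

// Priority (?): OR-like. Tries children left to right and
// returns SUCCESS as soon as one of them succeeds.
function priority(...children) {
  return () => {
    for (const child of children) {
      if (child() === SUCCESS) return SUCCESS;
    }
    return FAILURE;
  };
}

// Sequence (→): AND-like. Runs children left to right and
// returns FAILURE as soon as one of them fails.
function sequence(...children) {
  return () => {
    for (const child of children) {
      if (child() === FAILURE) return FAILURE;
    }
    return SUCCESS;
  };
}

// Two trivial leaves to experiment with
const succeed = () => SUCCESS;
const fail = () => FAILURE;
```

So priority(fail, succeed)() succeeds because the second child does, while sequence(succeed, fail)() fails because one child did.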

So, the execution continues on down, to the age >= 18 node. This is a Condition that compares the age to 18 (we’ll talk in a minute about how this comparison is made).

If the Condition succeeds, the execution continues to the ‘time to vote’ node. This is an Action, and as the name implies, it’s where all the action happens. Select it, and you’ll see that it’s a GeneralMessage action, that outputs a “you can vote” sentence to the user. We then continue to the green ‘good-bye’ hexagon. This is a sub-tree! Double-click it and you’ll go into it.

What if the age is less than 18? You might want to take a look above before continuing reading and work it out by yourself.

If the age < 18, the Condition fails, causing the → Sequence to fail, and the Priority then goes to the next child: too young.
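If you want to check your answer, here is the left-hand side of the tree written as plain data and walked by a small recursive function. The data layout, the node type names and the run and respond functions are assumptions of this sketch, not Servo's format; the good-bye sub-tree is left out, and true/false stand in for Success/Failure.

```javascript
// Hypothetical data layout; the texts follow the node titles in the tutorial tree
const ageTree = {
  type: 'priority',
  children: [
    {
      type: 'sequence',
      children: [
        { type: 'condition', test: (context) => context.age >= 18 },
        { type: 'message', text: 'time to vote' },
      ],
    },
    { type: 'message', text: 'too young' },
  ],
};

// Returns true for Success and false for Failure;
// messages "sent to the user" are collected in the out array.
function run(node, context, out) {
  switch (node.type) {
    case 'priority': // stops at the first child that succeeds
      return node.children.some((child) => run(child, context, out));
    case 'sequence': // stops at the first child that fails
      return node.children.every((child) => run(child, context, out));
    case 'condition':
      return node.test(context);
    case 'message':
      out.push(node.text);
      return true;
    default:
      throw new Error('unknown node type: ' + node.type);
  }
}

function respond(age) {
  const out = [];
  run(ageTree, { age }, out);
  return out;
}
```

Calling respond(55) collects only the vote message, and respond(12) collects only the too young one.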

Easy enough, isn’t it?

Hierarchical Memory

Now let’s look into the details of Conditions and Actions, and talk about memories. Select the age>=18 condition and open its properties:

What happens here? Actually you can read a short help section in the Description field of the node. It reads: “Compare fields across global, context, volatile and message memories. left and right operands should have a dot notation with the object name. Eg: message.chat_message, context.amount etc. Operator could be any logical operator like ===, <, <==, !==, ==> etc.”
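One way to picture what such a node does is the sketch below. It is only an illustration under my own assumptions: resolve, compare and the operators table are hypothetical names, the list is limited to standard JavaScript comparison operators, and treating an operand that doesn't resolve to a memory field as a literal is a choice of this sketch, not documented Servo behavior.

```javascript
// Resolve a dot-notation path such as 'context.age' against the memory areas
function resolve(path, memories) {
  return path
    .split('.')
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), memories);
}

const operators = {
  '===': (a, b) => a === b,
  '!==': (a, b) => a !== b,
  '<': (a, b) => a < b,
  '<=': (a, b) => a <= b,
  '>': (a, b) => a > b,
  '>=': (a, b) => a >= b,
};

// Returns true (Success) or false (Failure)
function compare(left, operator, right, memories) {
  const op = operators[operator];
  if (!op) throw new Error('unsupported operator: ' + operator);
  // Assumption: an operand that is not a resolvable path is used as a literal
  const valueOf = (operand) => {
    const resolved = typeof operand === 'string' ? resolve(operand, memories) : undefined;
    return resolved === undefined ? operand : resolved;
  };
  return op(valueOf(left), valueOf(right));
}

// Example memory areas, with an age already stored in the context
const memories = { global: {}, context: { age: 22 }, message: {}, volatile: {} };
```

With these memories, compare('context.age', '>=', 18, memories) is true, so the Condition would succeed.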

Indeed, this is a simple relational operator, returning true or false for Success or Failure. So far so good. But where did the context.age expression come from?

Well, remember the contexts array in the “age?” node? That’s where it is coming from. Turns out that once a context is selected, all the entities and intents are “mapped” into the fields the context defined. We had, for that context:

This defines what happened: the NLU recognized a ‘number’ entity. The system then created an age field in the context. Once there, that field is available to all of the context descendants. This is really important, not in and of itself, but because of a question it brings up: what if there’s another context? In other words, what if another question is to follow the age question?

Luckily, the way it works is familiar to anyone who knows anything about object-oriented programming, and especially JavaScript inheritance. If another question follows a parent question, then a new, child context will be created. If nodes then refer to some context.field, the field is searched for upwards, context by context, until a field matching the name is found or the root of the tree is reached.
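The analogy with JavaScript inheritance can be shown directly with prototype chains. This is just a way to picture the lookup, assuming nothing about how Servo stores contexts internally; the shoe-size question and all variable names are made up for the example.

```javascript
// The context created by the age question
const ageContext = { age: 22 };

// A follow-up question creates a child context that inherits from its parent
const shoeContext = Object.create(ageContext);
shoeContext.size = 9;

// A deeper question that hears a new age shadows the older one
const laterContext = Object.create(shoeContext);
laterContext.age = 30;
```

Reading shoeContext.age walks up the chain and finds 22, while laterContext.age finds its own, more recent value of 30. The parent never sees the child's fields, so ageContext.size stays undefined.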

This is pretty powerful, because once the bot understood the age of your user, and even if it continued to talk about new topics, you could still refer to context.age and the framework will fetch the most recent talked-about age. By the way, why the fancy name “Hierarchical Memory”? Well, this is probably how our brain identifies entities.

Other types of memory include:

  • Global: whole conversation global memory
  • Message: the latest message arrived from the user
  • Volatile: memory that is never serialized into the database. This is good for in-memory complex objects.
  • Local: per-node memory
  • And also, an undocumented Fsm memory, where one can access properties of the conversation process, as defined at the root of the main behavior tree.

With these also comes an important Action type, called SetFieldAction. If you need to set a field in one of those memory areas, that’s the place.
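As a rough mental model, think of the memory areas as one object with a key per area. The sketch below is an assumption-laden illustration: setField only mimics the idea behind SetFieldAction, serialize mimics the fact that volatile memory is never persisted, and neither is Servo's actual API.

```javascript
// Hypothetical stand-in for what a SetFieldAction-style node does:
// writes a value at a dot-notation path such as 'global.userName'.
function setField(memories, path, value) {
  const keys = path.split('.');
  const last = keys.pop();
  let target = memories;
  for (const key of keys) {
    if (target[key] === undefined) target[key] = {};
    target = target[key];
  }
  target[last] = value;
  return memories;
}

// Everything except the volatile area gets persisted
function serialize(memories) {
  const { volatile, ...persisted } = memories;
  return JSON.stringify(persisted);
}

// Example values, invented for this sketch
const memories = { global: {}, context: {}, message: {}, volatile: {} };
setField(memories, 'global.userName', 'Dana');
setField(memories, 'volatile.openSocket', { connected: true });
```

After these two calls, memories.global.userName survives a round trip through serialize, while the volatile object does not.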

Delivering the Message

The last piece still missing in the flow is the message that goes back to the user. How does one go about constructing it? For that you could take a look at the “time to vote!” GeneralMessage properties:

I’m sure you’ve noticed the <%= %> notation. This is a well-known technique in web development called “templating”. The <%= tells the framework to evaluate the expression against an object containing all memory areas, so we could also use <%=global.fieldName%> and others. Interestingly, we are not limited to expressions, but could use code, too. For example, adding <% if (context.age>=100) { %> you are one of our eldest voters <%}%> or <% if (context.age>=100) print('you are one of our eldest voters') %> would print that sentence for those with age>=100.
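To see what “evaluate the expression against an object containing all memory areas” means, here is a tiny stand-in for a template engine. It is my own sketch and not what Servo runs: it handles only the <%= expression %> form, render is a hypothetical name, and a real project would rely on an existing template library instead.

```javascript
// Minimal illustration: replaces every <%= expression %> with its value.
// Each memory area (global, context, message...) is exposed as a variable.
function render(template, memories) {
  const names = Object.keys(memories);
  const values = names.map((name) => memories[name]);
  return template.replace(/<%=([\s\S]+?)%>/g, (match, expression) => {
    const evaluate = new Function(...names, 'return (' + expression + ');');
    return String(evaluate(...values));
  });
}

// Example memory contents, invented for this sketch
const memories = {
  global: { botName: 'Servo' },
  context: { age: 55 },
  message: {},
};
```

With these memories, render('You are <%= context.age %>, go vote!', memories) gives 'You are 55, go vote!'. Any JavaScript expression works inside the tag, for instance a ternary on context.age.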

Templating is available for many actions and node types. Look at the description to see if the node asks specifically for a ‘dot notation’ or “memory field”. If it doesn’t, then you can use templating instead.


As you probably saw, Servo comes equipped with a built-in debugger. Assuming you are a developer, it’s pretty straightforward. You can set breakpoints (on leaf nodes only at the time of writing), run, step, and view the different memory areas discussed above.

Two important remarks:

  • The breakpoints are reached “post-tick”, which means, after the execution of the node
  • If you change things in the tree, just remember to publish! Although Servo gives you a warning, it’s easy to forget

Into the Deep

Servo is a large framework that brings many more features that can accelerate development for bots, automation and flow-oriented systems. Among its other features:

  • Alexa, Facebook, Web and other custom clients
  • Database interface
  • RESTful services
  • Context switching
  • Backtracking: change of mind
  • Automated dialog testing
  • Data-driven image rendering
  • Custom nodes and drivers
  • Sub-process and sub trees

Some of these are covered in the documentation that comes with Servo and some, as they say, are there for the brave explorer. Enjoy!

Data Driven Investor
