I’ve started to write a chat bot

This is a story of my first month coding an enterprise chat bot with some technical info and findings.

Update:
The bot is now open source and we’re welcoming testers and contributors!
http://goodintentions.firecreekweb.com

Bots are nothing new, but this old tech is being resurrected and given a new set of knowledge and UI components.

Back in 1997 I was a regular on IRC (Internet Relay Chat). It was a fun time: log in after school and join a channel of 20+ people talking about random things and sharing web sites they had found. Large channels were totally out of control while the moderators (ops) were on the other side of the world, sleeping.

Controlling this chaos was solved, for a short time, by IRC bots. You would download the bot code and try desperately to compile it on your friend's colo server. With a bit of configuration your bot would join the channel, and then you could tell it who to let into the channel and have it kick out trolls pasting the same message again and again.

The bots got more useful and became the standard arsenal for any chat channel operator. Fun and productivity commands started to get introduced, including trivia, weather, currency, saving and recalling friends' funny quotes, and file sharing. There were even channels set up for 24/7 chat bot trivia where your score would be saved. Things changed slightly when servers started to provide their own dedicated "bots" that any user could use (chanserv, nickserv, memoserv). IRC bots became a source of entertainment rather than a way of maintaining your channel.

I even wrote my own version of infobot for eggdrops in TCL back in 2001.

Twenty years ago I thought Javascript had no future. I also thought bots were going to see the same fate. I was wrong on both counts. Javascript is huge, and in the last 12 months bots have excited me and many others again.

So I’m back at it. I’m writing a bot from scratch (with a few libraries) in Node JS. I want to document my experiences using Node and building the architecture of a bot that interfaces with multiple platforms.


Slack made chat cool again

In early 2016 the bot buzz was kicking off. Everyone was talking about Slack and their bot catalog was growing each day. Facebook, Line Messenger, Hipchat and even Skype were rapidly improving their chat APIs to support buttons, images, inline payments and even dialogs. Bots were back, and they weren’t just text — we now have the opportunity to add shiny option buttons.

Our Hipchat channel reminder each day

I’ve never really understood AI — and I don’t think I ever truly will. The startup scene is awash with cool new bots claiming Artificial Intelligence. I’ve yet to see a bot that even vaguely resembles my idea of AI. I’m sure they exist in the labs of Microsoft and Google, and we’re seeing the birth of them via Twitter and Allo.

NLP, Big Data, AI, what other buzz words can we throw in? Personally I don’t care. When I see a bot that is useful I’ll stand up and pay attention and I hope that day will be soon — or maybe I’ll just make it myself.


Back in March 2016 I had an idea for the architecture of a bot. It wasn’t so much a bot as a system: a daemon listening on a port, waiting for a connection. You connect, feed in some text and it’ll reply back. I’ll call it the brain because I have no idea what else to call it.

Flow from the chat platform (e.g. Facebook), to the agent and finally to the brain

Each chat platform like Facebook would have an agent daemon keeping a constant connection to the brain, acting as a proxy. Agents for Facebook, Line, Slack, Hipchat, Skype, IRC, SMS: it really makes no difference. These agents take information from the chat platform, manipulate the data, send it to my main server, receive data back, manipulate it again for whichever platform it has to send it to (e.g. Facebook) and send it.

Each chat platform has its own syntax for receiving JSON data and displaying rich media like buttons, images, cards and even full interfaces. I have read that the Chinese WeChat has a lot of integrations, but their API documentation scares me.

This gives a clean separation between the brain and the end platform. Most agents could be less than 50 lines of code; in fact you can download existing scripts from Github and bastardise them.

I wanted to prototype the idea quickly and with my knowledge of Javascript I decided to throw it together using Node. My code was messy but the idea worked. I downloaded example bots for Hipchat and Slackbot from Github, threw out what I didn’t need and connected it to my rudimentary brain server. It was alive. The same chat bot on both platforms!

The more I played with the bot, the more I realised that making it smart, or making it understand, wasn’t like the old days. Users now demand a good user experience. With IRC bots the user would know exactly what to type in… that isn’t fun.

/msg ChanServ ACCESS #teenyack ADD Pyros VOP

Regular expressions were my first go-to. I was good at Perl many years ago but my regex skills are practically non-existent now. This was going to be harder than I thought.

https://xkcd.com/208/

I got distracted and got back onto work that makes real world money.


Google Allo comes out and, as with many geeks, I want to see what Google’s new shiny bot can do. It had some interesting quirky features, but what sparked my interest was news of this bot replacing Google Now and ultimately being accessible from your Android home screen. Digging into the API documentation, I was led to API.ai, a newly Google-acquired company. I signed up and I was instantly lost. Intents, entities, agents, what? I had a quick play but I had chosen a day it wasn’t working. I ended up in the API documentation to understand the concepts.

They were using the same architecture I had planned for my prototypes — and there are names for all these elements! I was calling them components; API.ai and the community call them intents. Data is stored in entities, not models. NLP is the engine for understanding user input — not regular expressions!

Christmas is coming up and the office will be closed. I live in Thailand so I have no real commitments. Commence geeking out until the early mornings.

No one uses Google Allo.


Why not use API.ai or the other SaaS AI engines?

I’m a programmer. That’s no fun. If they go down — I go down. I want to learn, I want to get into this technology deep and understand it. What’s the point of using Ruby on Rails and not getting into the core and checking out some amazing code?

There is a common concern with startups relying on SaaS products like API.ai and Wit.ai. Both tools are amazing and both are a great reference for how to build a bot. Their documentation is clear, easy to follow and gives a glimpse of how their engines work. But if they modify and tweak their NLP, your bot can head off in the wrong direction without warning. Both API.ai and Wit.ai are free for now, but they will charge eventually.

I don’t want to be a configuration monkey.


Glossary

Intent
The business logic of what the user wants to do.

Entity
Data for training the classifiers.

Classifier
Understands the input and routes it to the correct intent.

Agent
A proxy between the main system and the chat platform, e.g. Facebook.

Parameters
Parts of the user input to be used in the intent, e.g. "New York" is a city.

Unit testing
Automatic testing of the code to ensure it keeps working.

Sessions
Holding user information and handling authorization.

Context
A history of what the user has said, linked to the current conversation to make sense of it.


Laying out the challenges

It started with cleaning the code: rename components to intents and models to entities. Break down each intent to do one thing; it shouldn’t act like an MVC controller.

I’m building a bot framework. Why not! Before the MVC design pattern, every web developer was building their own framework. We’re at the dawn of something new with bots, and the choices to pull off the shelf at present are limited.

When the code is good, doesn’t look like crap and works well, it’ll be released as open source for other people to contribute to.

I laid out the big challenges I need to work on first. If I’m going to invest time, I need to work on the hardest parts first. I’m not going to worry about listening on a port, taking connections, requests, queuing or even keeping my code tidy. If I crack the few big problems first then I can easily put the rest together later. So I made a list…

  • Understand user input and route to the correct intent
  • Unit testing
  • Sessions and data based on an API call to a remote server
  • Parsing parameters and using them within intents
  • Contextual conversations
  • Memory!

Understanding user input and routing them

The more intents (things the bot can do), the more likely the bot will go off in the wrong direction. With a small number of intents it’s less likely to go off the rails. I showed a friend the bot in its very early days.

"How are you?" he asked.

"You need to specify a city or country for a timezone," the bot replied.

“wtf”

“Sorry, I don’t understand what you mean”

Full work flow of user input to replying with an agent webhook

When the brain is initialized every entity is loaded, intents are loaded and finally the intents will train the classifiers — we’re then ready to take input. Each intent specifies which entities (data) it wants to use.

Loading the main system

Entities are lists of countries, cities, units of measurement, emotions, animals, colours and so on. The more entities you have, the more the bot will be able to do. Intents can share the same entities: the time zone and weather intents will both share the cities and countries entities. Each intent has a defined classifier — this is where the training data is stored.

Example of confirm entity data:

'yes': {
  synonyms: ['yeah', 'yep', 'yup', 'aye', 'sure', 'indeed', 'true']
},
'no': {
  synonyms: ['nope', 'cancel', 'maybe', 'negative', 'nah', 'false']
}

Grabbing a list of countries isn’t a problem with a Google search, but you also need the synonyms. I found that the moment-timezone module had a list of countries, which I used.

Unexpected input from a user can be logged and manually added to the synonym lists. Machine learning (ML) could be added later to grow these lists automatically. Unit testing is critical if the bot is going to start learning, but for now I’ll teach it myself.

I made a list of intents that would be complicated.

  • Calculator — 1 + 2 * 5
  • Time zones — time in new york
  • Currency — 50 USD to Baht
  • Rock, paper, scissors — Simple contextual game
  • Survey — A contextual conversation with questions and answers

I’m not going to worry about small intents like cat facts, greeting or gratitude. If easy intents are hard to code then I have a problem. My mission is to make intents quickly and trust the framework to do its thing.

Fun Intents are fun to put together in your “down time”.

Dice rolling in Slack!

Ignoring sessions and context, there are three stages to identifying the intent needed. Each stage uses its own classifiers, and if a match is found we return the result. It’s possible to add further stages based on priorities / boost values later if more accuracy is needed (a rough sketch of the cascade follows the list below).

  • Strict matching — Regular expressions and string matching
  • NLP — Classify.js
  • Fallbacks — Classify.js

Request workflow for calling the intent
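
In code, the cascade might look something like the sketch below; the classifier objects are placeholders rather than the framework's real components.

// Sketch of the three-stage routing cascade. The classifier objects here are
// placeholders, not the framework's real components.
function route(input, classifiers) {
  // 1. Strict matching: regular expressions and exact strings
  let match = classifiers.strict.find(input);
  if (match) return match;

  // 2. NLP: Bayesian probability via Classify.js (or similar)
  match = classifiers.nlp.find(input);
  if (match) return match;

  // 3. Fallbacks: the 5 W's and friends
  return classifiers.fallback.find(input);
}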

Strict matching is a home-brewed classifier — there is no NLP. The intent might require exact or close-to-exact input. How is NLP going to handle an input of 1+2? Calculator was one of the first intents I started to write. I use Google all the time for calculations so I wanted to get this working. With some searching I found some great regular expressions which match common calculator inputs.

/^(calc )?[\d\+\/\*\.\- \(\)=]*$/

If the input matches the regular expression, the input is sanitized, eval’ed (ouch!) and the result is returned.
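
As a rough sketch (the calculate() helper and the paranoid whitelist are my own illustration, not the actual intent code), that boils down to something like this:

// Rough sketch of the calculator's strict matching. The sanitising is
// deliberately paranoid: whitelist only digits, operators, brackets and
// whitespace before anything goes near eval().
const CALC_REGEX = /^(calc )?[\d\+\/\*\.\- \(\)=]*$/;

function calculate(input) {
  if (!CALC_REGEX.test(input)) return null; // not an input for this intent

  const expression = input
    .replace(/^calc /, '')
    .replace(/[^\d+\/*.() -]/g, ''); // strip anything that isn't maths

  try {
    return eval(expression); // eval'ed (ouch!)
  } catch (err) {
    return 'Sorry, I could not calculate that';
  }
}

// calculate('calc 1 + 2 * 5'); // 11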

There is no reason for NLP to catch these types of inputs. Putting a probability score on “5+5” confuses me. We don’t need big data to route this either, but I’m sure someone in the Google labs has figured that out.

Strict matches have also helped with building administration intents. There is a collection of admin commands with strict matching so I can monitor the system; I’m still nostalgic for IRC commands. Before strict matching, admin intents could get called accidentally via the NLP.

I let my wife loose on the bot and she instantly said “Good night” to it. The bot thought she was saying she felt good (HowAreYou Intent) so it replied “Happy to hear!”. The Parting Intent now uses strict matching, including an entity with a long list of common phrases like…

  • Goodbye
  • See you
  • Sweet dreams
  • Farewell
  • Bye

These words were gradually built up from searching around Wikipedia and chat logs indexed in Google.

We have more important intents for productivity and I don’t want small talk to train the main NLP engine and start to derail what users want to do.

NLP is an area I am still learning, so this section will be light. I would love to try to code my own NLP but there is some heavy maths involved. A quick search through the NPM module library returned some results.

I’m using Classify.js which is based on Bayesian probability. I can’t find the original source of this script; if you know, please send me a message! Classify.js takes the input and matches on the intent learning data. It’s only 200 lines of code, well documented and with a few tweaks it was doing what I needed. I have experimented with a few others but I have found Classify.js returns a really good probability score for what I need to do.

I’m not averse to changing the NLP library. The system is designed to interface quickly with any other library via just two methods, train() and find(). With unit tests covering the intents I can be more confident when trying new NLP libraries.

Classifier.train(intent_name, keyword_from_entity);
Classifier.find(user_input);
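
To illustrate how small that surface area is, here is a sketch of an adapter using the `natural` npm package's Bayesian classifier as a stand-in. The real system uses Classify.js, so treat this purely as an example of the two-method interface.

// Sketch of a two-method adapter, using the `natural` npm package's Bayesian
// classifier as a stand-in. The real system uses Classify.js; this only
// illustrates how little needs rewriting when swapping libraries.
const natural = require('natural');

class ClassifierAdapter {
  constructor() {
    this.bayes = new natural.BayesClassifier();
    this.trained = false;
  }

  train(intentName, keyword) {
    this.bayes.addDocument(keyword, intentName);
    this.trained = false;
  }

  find(userInput) {
    if (!this.trained) {
      this.bayes.train(); // natural needs an explicit training pass
      this.trained = true;
    }
    const results = this.bayes.getClassifications(userInput);
    return results.length ? results[0] : null; // e.g. { label: 'Confirm', value: 0.8 }
  }
}

module.exports = ClassifierAdapter;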

I’m finding the NLP works better when the entity data and user input are filtered and scrubbed. Don’t feed in the raw data. This is probably because my data sets are small; I imagine when you’re using true big data this becomes less of a problem. Lower-case everything, remove two-letter words if you don’t need them and remove anything else you don’t think is useful.

Cleaning the user input is critical for the NLP to reach good accuracy. There are many libraries (probably too many) that manipulate strings, but I have started to build my own for full control. Remove grammar and brackets, and change "what's" to "what is" (contractions). In most cases we don't need words like "the", "it", "is", so I borrowed the stop word dictionary from MySQL; just make sure you don't remove "it" from "biscuit". I find myself on Wikipedia pages trying to understand languages to fix particular problems; you can probably tell my English isn't great.
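
A cut-down sketch of that kind of scrubbing (my own illustration, with tiny sample word lists, not the library itself) might look like this:

// A cut-down sketch of the scrubbing utilities. Note the word boundaries (\b)
// in the stop word regex, so "it" is removed but "biscuit" survives. The word
// lists here are tiny samples.
const STOP_WORDS = ['the', 'it', 'is', 'a', 'an', 'of', 'to'];
const CONTRACTIONS = { "what's": 'what is', "don't": 'do not', "i'm": 'i am' };

function scrub(input) {
  let text = input.toLowerCase();

  // Expand contractions
  Object.keys(CONTRACTIONS).forEach((short) => {
    text = text.split(short).join(CONTRACTIONS[short]);
  });

  // Remove brackets and most grammar
  text = text.replace(/[()\[\]{},.!?;:"]/g, ' ');

  // Remove stop words on word boundaries only
  const stopRegex = new RegExp('\\b(' + STOP_WORDS.join('|') + ')\\b', 'g');
  text = text.replace(stopRegex, ' ');

  // Collapse whitespace
  return text.replace(/\s+/g, ' ').trim();
}

// scrub("What's the time in Bangkok?"); // "what time in bangkok"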

Clean data in, manipulate the hell out of it, clean data out.

I’ll write more on NLP when I’m deeper into problems.

Fallbacks are fun! When people are introduced to a new bot they always ask it questions. It’s a bit boring for the bot to reply, “Sorry, I don’t understand what you said”. I’m building a closed-domain bot; I’m not trying to make a general chat bot, I want it to serve a particular purpose. So create some intents for common fallbacks.

  • What
  • When
  • Where
  • Who
  • Why
  • How

The 5 W’s, and the 1 H. They have an array of replies and the job is done!

Does the job, kind of!

Unit Testing

I have the ability to turn off all intents apart from one so I can focus on it. As soon as I turn the other intents back on, the bot flies off in the wrong direction (again) and asks how my day has been instead of giving me the weather!

I want to know the intent I wrote yesterday is still responding correctly to user input today.

Unit tests to the rescue! I’m not going to bore you with TDD.

Each intent has a hash of tests. The tests have input text and expected output if required.

tests: [
  { input: "calc 1+1" },
  { input: "calc 666 * 666", expected: 443556 },
  { input: "calc 666 * 666 + 10", expected: 443566 }
]

With a bit of hacky code using Jasmine I was able to load the server, connect an agent, collect all the tests from the intents, execute the inputs and validate the data. The server doesn’t reply with text, it replies with JSON and I can compare the intent I’m expecting with the result from the test input.

Jasmine is great, but generating dynamic it() blocks for each test and controlling the timeouts needs some core improvement (there’s a rough sketch at the end of this section).

In the first example (calc 1+1) we only check that the correct intent was called. An expected value is not required for all intents. Your cat fact Intent might return some random text.

Some cats, males in particular, develop health problems if fed dry food exclusively.

I can now safely write more intents and not worry too much about older intents breaking without my knowledge. But they will break.
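
For reference, generating the specs from each intent's tests array can be sketched like this; brain.request(), the require path and the reply shape are hypothetical stand-ins for however your server accepts input and replies with JSON.

// Rough sketch of generating Jasmine specs from an intent's tests array.
// brain.request() and the reply shape ({ intent, result }) are hypothetical.
describe('Intent: Calculator', () => {
  const intent = require('../app/productivity/intents/calculator'); // hypothetical path

  intent.tests.forEach((test) => {
    it('handles "' + test.input + '"', (done) => {
      brain.request(test.input).then((reply) => {
        expect(reply.intent).toBe('Calculator');
        if (test.expected !== undefined) {
          expect(reply.result).toBe(test.expected);
        }
        done();
      });
    });
  });
});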


Sessions and data based on an API call to a remote server

Javascript and asynchronous programming. Fun! (sigh).

In my first prototype no other input could be entered into the bot until the previous call had finished. This was terrible design and required a total recode of the framework one evening.

The bot needs to handle consecutive requests, so a queue system is needed. You can then control consecutive requests and time out those that take too long. It’s hard to catch all errors in a Promise, especially if your promises are chained and go deep. Timing out a request is a good fallback if you miss catching an error and don’t want to freeze up your bot.
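
One simple way to sketch that queue-plus-timeout idea (the numbers, names and reply handling are mine, not the framework's):

// Requests run one at a time; anything slower than TIMEOUT_MS resolves with
// an apology instead of freezing the bot.
const TIMEOUT_MS = 5000;
const queue = [];
let busy = false;

function enqueue(handleRequest, sendReply) {
  queue.push({ handleRequest, sendReply });
  processNext();
}

function processNext() {
  if (busy || queue.length === 0) return;
  busy = true;

  const job = queue.shift();
  const timeout = new Promise((resolve) => {
    setTimeout(() => resolve('Sorry, that took too long'), TIMEOUT_MS);
  });

  // Whichever settles first wins; uncaught errors fall through to the catch
  Promise.race([job.handleRequest(), timeout])
    .catch(() => 'Something went wrong')
    .then((reply) => {
      job.sendReply(reply);
      busy = false;
      processNext();
    });
}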

I’m not just building a bot for fun, I’m actually building it for a startup called Devi. Devi is an office management system, a tool for modern SMEs to make their office run smoother. There was no intention to give this system a bot, but the more I coded the more value I found. I want to talk to the bot on my way to work: add a task for my admin, approve an employee’s holiday request, log time sheets, generate a report and update the company policies. Plus useful horizontal productivity commands like currency conversion and time zones. I feel like I have a tool in my hand that helps me as a business owner.

Sessions and stateless servers are interesting. You need a mix of functional programming alongside your OO. We don’t want to kill the memory and deal with scaling problems later. Passing objects around the framework is going to be necessary, but it should be done sparingly.

PromiseJS has helped, but it can become a mess quickly. The built-in Node.js EventEmitter will often do the job. The client cannot wait for the reply; you need to push the reply back to them.
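
A tiny sketch of that push model with the built-in EventEmitter (the event name and agent.send() are made up for illustration):

// Push replies back instead of making the client wait on the request.
const EventEmitter = require('events').EventEmitter;
const dispatcher = new EventEmitter();

// The agent connection subscribes once and forwards whatever arrives
dispatcher.on('reply', (sessionId, message) => {
  // agent.send(sessionId, message); // push back to the chat platform
});

// Deep inside an intent, when a slow API call finally resolves:
dispatcher.emit('reply', 'session-123', 'Here is your report');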

Flood your bot with 10 commands one after another targeting different intents and make sure no data gets mixed up. Some intent calls that use external APIs might take longer to generate a result than others. Unit tests should check that session data and requests don’t cross over.

Multiple users from different companies can use the bot; we cannot mix up their data.

“Check Jack’s holiday leave”

Our intent here is vacation. But if we’re going to do an API call we need to know who Jack is.

“Show me Jack”.

Or just…

“Jack”.

Here lies a problem. We can load in a list of common names like David, John, Alice and Bob and try our best to route this to an intent. But I work with Thai people with interesting nicknames like Wan, Jib, Tan, Aon, Pond and Oak, so this approach isn’t reliable for this system.

Asynchronous calls are essential to load in entity data, train the classifiers and parse parameters. If the user has identified themselves with the main API via the bot, we fetch the employee list and train a private classifier for that account.

With a fast API and some smart caching you’re good to go. Just make sure to have some clean up methods.
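
Sketched out, with a hypothetical fetchEmployees() call standing in for the real API and the adapter from the NLP section above, the per-account classifier might look like this:

// Sketch of training a private, per-account classifier from a remote API.
// fetchEmployees() is a placeholder for the real API call, and the result is
// cached so we are not retraining on every message.
const classifierCache = {};

function employeeClassifier(session) {
  const accountId = session.accountId;
  if (classifierCache[accountId]) {
    return Promise.resolve(classifierCache[accountId]);
  }

  return fetchEmployees(accountId).then((employees) => {
    const classifier = new ClassifierAdapter(); // see the adapter sketch above
    employees.forEach((employee) => {
      classifier.train('Vacation/Employee', employee.name.toLowerCase());
    });
    classifierCache[accountId] = classifier;
    return classifier;
  });
}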


Parsing parameters and using them within intents

I want to find the hardest parameters to parse from user input. Parameter parsing tweaker will be my full time job soon.

Workflow of parameters being extracted from user input

“50 USD to GBP”

We have three parameters: Amount, Currency From and Currency To.

Each intent is looking for the parameters. In the case of currency my intent is configured like so.

"amount": {
  name: "Amount",
  entity: "Common/Number",
  required: false,
  default: 1
},
"currency_from": {
  name: "Currency from",
  entity: "Common/Currency",
  required: true
},
"currency_to": {
  name: "Currency to",
  entity: "Common/Currency",
  required: true
}

From the intent I can call request.parameter('amount') to get the entity data.

We’re using two different entities, Amount and Currency, and we’re using the Currency entity twice: first to get USD, then GBP.
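
For illustration, the intent body that consumes these parameters might be shaped something like the sketch below; the intent structure and the convert() helper are my own assumptions, only request.parameter() comes from the configuration above.

// Sketch of the currency intent consuming the parameters above. The intent
// shape and convert() are assumptions, not the framework's real code.
module.exports = {
  response: function (request) {
    const amount = request.parameter('amount');       // 50 (defaults to 1)
    const from = request.parameter('currency_from');  // 'USD'
    const to = request.parameter('currency_to');      // 'GBP'

    return convert(amount, from, to).then((result) => {
      return amount + ' ' + from + ' is ' + result + ' ' + to;
    });
  }
};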

You might think, great, I’ll just tokenize it by a space!

“50USD to GBP”
“6m to km”

You can clean this with a regular expression but this isn’t going to work all the time so a solid solution is needed.

When I first coded this I simply searched for the entity word with indexOf and returned the entity data key it matched. It looked good initially, but if GBP appeared in the entity data before USD I would end up with Currency To and Currency From in the wrong order. We need to keep checking the entity data until the lowest match is found: it has found GBP, but is there another match in the string before this index? It works!

“765min to hours”
“Sorry, I cannot convert meters to hours”

Then it didn’t work.

The "m" for metres matched at index position 3, inside "min". Tokenizing would help here but I still want to avoid it.

I’m too scared to share a snippet of this code until it’s cleaned up, but now I build up a list of all matched words with their positions and then score each match.

  • The lower the position found the higher the score
  • If the matched text begins and ends with a space + 0.3
  • If the matched text ends with a space + 0.2
  • If the matched text starts with a space + 0.1

I get an array with some floating point numbers for each entity word matched, sort by highest then return the key.

[ { position: 1, string: 'min', value: 'min', score: 1.4000000000000001 },
  { position: 1, string: 'mi', value: 'mi', score: 1.2 },
  { position: 1, string: 'm', value: 'm', score: 1 },
  { position: 4, string: 's', value: 's', score: 1.2 },
  { position: 2, string: 'in', value: 'in', score: 1.2 },
  { position: 9, string: 'hours', value: 'h', score: 1.1 },
  { position: 9, string: 'h', value: 'h', score: 0.6 },
  { position: 9, string: 'hour', value: 'h', score: 0.6 } ]

“Hour” is in the string but it ranks low because it’s at the end of the string.

It’s working and I have some code I can tweak. I may increase the score based on the length of the word matched. I know this logic will require a few refactors until it’s accurate. NLP probability scoring just won’t work here; we’re dealing with numbers and very short strings.
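
Until the real code is presentable, here is a rough sketch of the scoring idea, not the actual implementation; the exact weights and the entityWords map are illustrative only.

// Rough sketch of the position-and-whitespace scoring. entityWords maps each
// matchable string to its entity key, e.g. { min: 'min', m: 'm', hours: 'h' }.
function bestMatch(input, entityWords) {
  const text = input.toLowerCase();
  const matches = [];

  Object.keys(entityWords).forEach((word) => {
    const position = text.indexOf(word);
    if (position === -1) return;

    // Earlier matches score higher
    let score = 1 - position / text.length;

    const before = text[position - 1];
    const after = text[position + word.length];
    const spaceBefore = position === 0 || before === ' ';
    const spaceAfter = after === undefined || after === ' ';

    if (spaceBefore && spaceAfter) score += 0.3;
    else if (spaceAfter) score += 0.2;
    else if (spaceBefore) score += 0.1;

    matches.push({ position: position, string: word, value: entityWords[word], score: score });
  });

  matches.sort((a, b) => b.score - a.score);
  return matches.length ? matches[0] : null;
}

// bestMatch('765min to hours', { min: 'min', m: 'm', hours: 'h' }); // picks 'min'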

Looping the parameters and then removing what was matched is really important.

"50 USD To GBP"
"50 usd to gbp"
"usd to gbp"
"to gbp"
"to"

Amount = 50
Currency from = USD
Currency to = GBP

The bot working in our Hipchat test channel

Contextual conversations

A survey intent for flow based conversation can really help to build up the contextual methods.

For a survey we want to add some questions, get responses, store those responses, ask more questions and finally output the answers. We need to store information on the users session.

We can put this on rails, just like Letz does. If the user tries to go off-script and answer “Hello” when we are expecting a confirmation, we can force them back to the same intent action and ask the question again.

Letz app only shows the text input when entering a task description

request.expecting({
  intent: this,
  entity: 'Common/Confirm',
  force: true,
  action: {
    'yes': 'what_sport',
    'no': 'watch_online'
  },
  save_answer: {
    'name': 'Watch sports on TV',
    'key': 'survey.sports_tv'
  }
});
return 'Do you watch sports TV?';

In this example we are telling the request object what we’re expecting the user to enter. This data is stored in the user’s session and it’ll be loaded in again when they provide another input. In theory, if the same user is talking to the bot via Facebook and then uses Slack, they will still be in this flow conversation — but this is terrible UX.

Simple flow based conversation with the bot

The entity Common/Confirm has a list of yes and no synonyms. If the next input is matched, the intent action for either yes or no will be called next and they will progress through the intent. With force set we’re essentially putting them on rails so they can’t do anything other than answer the questions. On Facebook, Line and other chat platforms we can attach quick buttons to the message so the user doesn’t need to type an option.

At the end of the survey we collect all the saved data from the session and output it back to them.

This will be used for registration and user on-boarding: asking the user their email address, name and job position, then taking them on a journey to invite their employees and set reminders, calling a remote API at each step to save the data gradually.

Parameters previously entered can also help with contextual conversations.

“What is the time in Bangkok?”

“What is the time?”

The first input has Bangkok as a location entity. If the parameter is found we can store it in the user’s session and load it back in as a default value.

"location": {
  name: "Location",
  entity: ["Common/Country", "Common/City"],
  required: false,
  action: 'specified',
  from_user: true
}

This intent parameter loads two entities, Country and City, to populate the ‘location’ parameter. Being able to include two entities for one parameter is essential so you don’t need special cases to check whether the city or the country is set.

"from_user" is a terrible variable name, but it tells the parameter module that if the Country or City wasn’t found, it should check the user’s session data and use that as a default.

Going any further with this is something I have not experimented with yet.


Memory!

“Don’t optimise too early.” This is something I have always held to. We need to make sure we’re not loading up too many objects per user request. With a well-designed framework and a clear end-to-end flow for each user request, the only thing you need to worry about is the amount of entity data.

Avoid caching until it becomes a problem. If you’re trying to handle caching early in development you’re going to be battling with it before you even get into the juicy parts of the bot.

Memory issues will only become clear when we have a lot of users, so all we have at the moment is best practices and no real information.


Putting it (almost) all together

Once the challenging entities, intents and asynchronous parts of the system are done the flesh can be added.

  • Queuing — if a user accidentally pastes 100 lines into the bot
  • Response — returning data back to the user in JSON
  • Server and Agents — Accepting and validating connections
  • Utilities — Scrubbing text and inflectors
  • Unit testing — TDD testing with Jasmine
  • Deployment — Capistrano
  • Monitoring — Not decided yet!
  • Logging — Incoming, errors, system messages and unknown commands
  • Config — For development and live environments
  • Action / Quick links — Various platforms allow you to add buttons

The same bot running on Facebook and Line with contextual buttons formatted for each platform

Agents for the platforms are required. These act as proxies and often need to be served over SSL for webhooks. But once the agents are coded they won’t need to be touched again unless I want to add more rich content like a dialog.

The brain is now broken down into multiple apps.

  • Admin — Admin commands to check the health of the system
  • Common — 5 W’s, errors, and lots of generic entity data like cities
  • Devi — Custom API code to talk to the office management system
  • Fun — Small talk, roll dice, cat facts, etc…
  • Productivity — Calculator, weather, unit conversion, time zones…
  • Test — Surveys, ordering food…

For Facebook and Line, a webhook agent can be created. The agent listens on a port accessible via a URL, detects the platform (either Facebook or Line) and then sends the information to the brain. When it receives information back from the brain it sends it on to the user.
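
A bare-bones sketch of such an agent, using Express and socket.io-client; the URLs, platform detection, payload shape and sendToPlatform() helper are all placeholders for the real platform SDKs.

// Bare-bones sketch of a webhook agent acting as a proxy to the brain.
const express = require('express');
const bodyParser = require('body-parser');
const io = require('socket.io-client');

const app = express();
app.use(bodyParser.json());

const brain = io('http://localhost:8080'); // the brain's socket server

app.post('/webhook', (req, res) => {
  // Very rough platform detection
  const platform = req.headers['x-line-signature'] ? 'line' : 'facebook';

  brain.emit('request', {
    platform: platform,
    user: req.body.userId,
    text: req.body.text
  });

  res.sendStatus(200); // acknowledge the webhook straight away
});

// When the brain replies, format it for the platform and send it on
brain.on('response', (reply) => {
  // sendToPlatform(reply.platform, reply.user, reply.text);
});

app.listen(3000);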

Slack and Hipchat support slash commands, which use webhooks. It’s also possible to create standalone bots for these platforms, like IRC bots, but they require a socket connection for every user who wants to use them. That would require some expensive AWS hosting, support and maintenance; I have heard a medium-sized AWS instance will handle about 50 bots.

For the server, socket.io is easy to get running. For the client, socket.io-client fits in nicely with the stack.
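
On the brain side, the matching socket.io server can be sketched in a few lines; the event names and routeToIntent() are placeholders, not the framework's real API.

// Minimal sketch of the brain's socket.io side, matching the agent above.
const io = require('socket.io')(8080);

io.on('connection', (agent) => {
  console.log('Agent connected');

  agent.on('request', (message) => {
    routeToIntent(message.text).then((reply) => {
      agent.emit('response', {
        platform: message.platform,
        user: message.user,
        text: reply
      });
    });
  });

  agent.on('disconnect', () => console.log('Agent disconnected'));
});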


Not finished yet!

The bot isn’t released yet. It’s running privately on Line, Facebook, Slack and Hipchat but to make this bot really work I need to build up the intents for Devi.

A brief list of what needs to be done…

  • Scheduler and alarms
  • Monitoring and health
  • Parsing date inputs (this is going to take time!)
  • Lots more Devi Intents to make this bot have value to business owners

Update Dec 2017!
It’s been a while but the bot now has a name, “Good Intentions” and is open source. Check out the documentation with a live example of the bot!
http://goodintentions.firecreekweb.com
I will write up a new article with more findings and resource links soon. The biggest challenge still remains around routing the input to the correct intent.


Final Thoughts

This is not easy.


“Devi, who is the most handsome of them all?”

“Sorry, I don’t understand what you said”
