Building a useful Messenger chat bot for startups

Erik Duindam
Published in Unboxd
9 min read · Dec 9, 2016

Approximately 114 years after the first science fiction movie was created, we can finally say that bots have taken over the world. Alexa brings me the latest news in the morning, Siri bothers me when I press her button too long and Google Allo is able to hold a more eloquent conversation than the president-elect of the United States.

A significant part of this rise has to do with the growing popularity of chat bots. Popular chat platforms like Kik, Telegram and (Facebook) Messenger now all have APIs to create rich chat experiences managed by bots. As a result, many conversational AI startups have been popping up, and many existing software packages have added modules to support chat experiences. At Unboxd, we’ve spent hundreds of hours researching and testing many of these bots, while trying all of the AI cloud services to build bots on. What we’ve learned is that most bots are not super useful, and most cloud services are not super simple. In this article, I’ll cover what we’ve learned while building our own bot for collecting social proof.

API.AI, LUIS, Watson, Wit and Lex

When we decided to create a chat bot, my first thought was to play around with conversational AI services in the cloud. I’ve worked with natural language processing before, and it’s not something you want to build yourself when you’re looking to ship a functional product. A cloud AI service can translate a user’s text message into structured data containing the user’s intent and all the entities mentioned in the message. For example, “I want to buy a new dress for my girlfriend” might be translated into intent=buy, product=dress, gender=female (or gift=true), depending on how you configured the AI.
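
Conceptually, every one of these services hands your code something like the object below. The field names here are illustrative, not any specific service’s API:

```javascript
// Illustrative shape of an NLU response for the example sentence.
// Field names are hypothetical, not copied from a real service.
const nluResult = {
  query: "I want to buy a new dress for my girlfriend",
  intent: "buy",
  entities: {
    product: "dress",
    gender: "female" // or gift: true, depending on configuration
  },
  confidence: 0.92
};

// Application code then branches on the recognized intent,
// instead of on the raw text the user typed.
function route(result) {
  if (result.intent === "buy") {
    return "Starting checkout for a " + result.entities.product;
  }
  return "Sorry, I didn't understand that.";
}
```

The point of the service is exactly this translation: your bot logic works with `intent` and `entities`, never with the raw sentence.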

Our goal was to create a bot that makes it easier for people to leave reviews, feedback and testimonials by talking to a bot instead of having to open emails and fill out boring web forms. The latter is just not something millennials would waste their energy on. Additionally, we wanted people to be able to share videos, emojis and pictures. Chat is the way to go.

The main players in the cloud field of conversational AI are API.AI (recently acquired by Google), Microsoft’s LUIS, IBM’s Watson, Facebook’s Wit and the new Amazon Lex. When we started creating the bot, Amazon Lex wasn’t available yet, so I’ve only tried the other four.

After trying all these services, I must admit that they’re all pretty good. They all implement a combination of natural language processing and machine learning (deep learning). This means they will not just understand language, but also learn to understand people better as they receive more example input.

Some of these services integrate directly with a whole bunch of chat platforms, like Messenger, Slack, Telegram and so forth. In theory, this makes it super easy to create one AI bot and spawn it on many platforms. API.AI in particular makes it very easy to enable all these chat platforms and get a cross-platform bot without any programming. In practice, though, there’s a lot to consider that you might not immediately think about.

The pain of sending tappable attachments

One of the most beautiful things about chat bot conversations is that the bot can send lists of clickable items, buttons and other interactive content back to the user. The user can then simply tap on a button to perform an action in the chat.

Most cloud AI services don’t support these attachments out of the box. As far as I know, only API.AI offers built-in support for replying with platform-specific attachments. For the rest, you’ll have to find open source libraries or build your own framework to handle the dispatching of messages between chat platforms and AI services. You basically have to throw out the idea of chat platforms being connected automatically, and wire everything up yourself via APIs. As noted, API.AI is the exception here.

Chat platforms like Messenger send messages to other servers via webhooks. Tapping a button generally fires a different webhook event than sending a plain text message. On Messenger, when a user taps a button, Facebook sends a message to the webhook with a “postback” parameter instead of a “message” parameter. The postback contains the payload that you assigned to the button beforehand, for example “buy-product-12345” or “Buy product 12345”. This makes it possible to detect, without ambiguity, which button a user tapped.
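
In code, distinguishing the two cases comes down to checking which parameter is present. A simplified sketch — real Messenger webhooks nest these events under `entry[].messaging[]`, which is omitted here:

```javascript
// Classify a single incoming Messenger-style webhook event as either
// a button tap (postback) or a typed text message.
// Simplified event objects, not the full Messenger webhook envelope.
function classifyEvent(event) {
  if (event.postback) {
    // The payload is whatever string we attached to the button beforehand.
    return { type: "postback", payload: event.postback.payload };
  }
  if (event.message && event.message.text) {
    return { type: "text", text: event.message.text };
  }
  return { type: "unknown" };
}
```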

The pain of sending these postbacks to the AI service is that you have to construct the payloads in a way the AI service will understand unambiguously. The AI service tries to recognize intent based on text, so you’ll have to send something like “Buy product 12345” and make sure it understands that “Buy product X” should trigger some checkout message.

Alternatively, you could decide to never send these payload messages to the AI service and just handle them yourself. It’s not difficult to develop something that recognizes that payload “buy-product-12345” should initiate the checkout flow of product 12345. It’s also more robust than asking an AI service to figure out what “Buy product 12345” means, since the service will try to interpret the text and might confuse it with other instructions, like “I don’t want to buy anything” or “Buy product again”.
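
Handling such payloads yourself is a few lines of string parsing. The `buy-product-<id>` format below is our own convention for the sake of the example, so there is no ambiguity to resolve:

```javascript
// Parse structured button payloads like "buy-product-12345" ourselves,
// bypassing the AI service entirely. Because we defined the payload
// format, a match is always unambiguous.
function parsePayload(payload) {
  const match = /^buy-product-(\d+)$/.exec(payload);
  if (match) {
    return { action: "checkout", productId: match[1] };
  }
  // Not a known payload; fall through to the AI service for free text.
  return null;
}
```

Anything the parser doesn’t recognize — i.e. actual typed text — can still be forwarded to the AI service.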

The only way to make sure the AI service doesn’t get confused is to let it keep track of the context of the conversation. Depending on the previous question or interaction, the service will then only consider certain texts and intents. Again, API.AI is pretty good at this, but it’s still much more complex than handling postbacks manually on your server.

The fact that each chat platform uses a different JSON syntax for attachments, buttons and interactions makes the whole setup even more complex. If you want to connect Kik, Telegram, Messenger and Slack, you’ll have four incoming webhooks, each with its own type of message handling. And not all of these platforms provide the same functionality: Slack’s commands, for example, are handled very differently from a Facebook attachment.

As you can imagine, there’s a lot of server-side logic to be built if you want to combine all chat platforms while supporting platform-specific attachments. It’s definitely possible to do everything with API.AI and only use your server to construct reply messages, but that approach has limitations when you want to receive attachments from the user. I also found it quite complex to keep track of the entire context flow and to debug whenever something goes wrong. It’s a lot easier when you can write code and unit tests for these scenarios.

Video attachments

I’ve discussed how the bot can send attachments with buttons and payloads, but I haven’t touched on the fact that users can send attachments to your bot too. In our case, we want users to be able to send a video review of a product or company. If we use API.AI’s direct integration with Facebook, where Facebook communicates directly with API.AI and our server only gets webhook calls from API.AI, then video, location and other attachments won’t work. If we want attachments, we have to let Facebook post to our server and then proxy all the relevant messages to API.AI, just like we would have to do with all other AI cloud services.

To handle an attachment, we need to know the context of the conversation. Normally, you could let the AI service keep track of the context. But when we receive a video attachment directly from Facebook, we need to know the context ourselves in order to do something useful with it; otherwise we can’t be sure why the user is sending it. It depends on what we asked them to send. For example, if they send a video in the “welcome” context, it’s probably just a mistake. This means we still need to keep track of conversation state on our own server, and keep the AI service and our server in sync when it comes to context. Again, complicating things.
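
In practice, every incoming attachment gets checked against the conversation state stored on our own server. A sketch with a hypothetical in-memory state store and made-up state names (a real bot would persist this in a database):

```javascript
// Conversation state per user, kept on our own server.
// In-memory here for illustration; state names are made up.
const conversationState = new Map();

function handleVideoAttachment(userId, videoUrl) {
  const state = conversationState.get(userId) || "welcome";
  if (state === "record_review") {
    // We asked this user for a video review, so the attachment is meaningful.
    return { accepted: true, reviewVideo: videoUrl };
  }
  // Any other context: probably a mistake, so politely ignore it.
  return { accepted: false };
}
```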

Chat UX

After testing many other chat bots, I wondered why these bots try to be so text-oriented. They always try to understand the user perfectly; the user is asked a lot of questions and has to use the keyboard regularly. This is definitely super sexy and all that, but it doesn’t necessarily contribute to the user experience. I think many bots want to do too many things, or show off their AI too much. Our bot only needs to collect people’s opinions, which we want in a certain format. We don’t care so much about understanding the user and pretending to be super strong AI.

We decided that a tappable flow makes more sense for collecting opinions from a user. Our bot is basically a waterfall of tappable buttons inside the chat. A user only has to tap like a madman and write a piece of text once or twice. We use animated GIFs of cats to make users feel at home, and emojis to let them express themselves in the way they relate to most.

For this use case, we basically don’t need a lot of intelligent AI. We could try to understand all the things that a user is typing with all the added complexity, or we could simply direct the user through our waterfall of choices in a fun and easy way. We chose the latter, which significantly simplifies our whole technical setup, because we don’t have to deal with interpreting too many messages from the user. Additionally, we decided that Facebook Messenger is our most important platform and we would therefore focus entirely on Messenger to begin with.

Our bot

Designing an appropriate architecture

Most open source chat bot-related libraries and projects are built in Node.js. Therefore, I chose to use Node.js for our bot setup. Since I don’t feel like building generic logic for receiving webhook messages, I chose to use the messenger-bot npm library. It’s a lightweight library that has event listeners for receiving messages, postbacks, authentication requests, message delivery notifications and other Facebook message types.

On top of that, we’ve built a message dispatcher that handles all postbacks and messages and sends them to the appropriate message handler object based on the context of the conversation. Whenever someone sends a message to our bot, the messenger-bot event listener forwards the message to our dispatcher, which in turn looks up the conversation state and triggers a conversation state handler. Visually, it looks like this:

msg -> [1 webhook] -> [1 messenger-bot] -> [1 dispatcher] -> [n handlers]

Some example code might clarify the setup a bit more:
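
A minimal sketch of such a dispatcher — state names and handler objects below are illustrative stand-ins, not our actual code:

```javascript
// Maps a user ID to that user's current conversation state.
const states = new Map();

// The dispatcher looks up the user's state and forwards the message to
// the matching handler. A handler may return { goto: "other_state" } to
// hand the conversation over to another handler immediately.
class Dispatcher {
  constructor(handlers) {
    this.handlers = handlers; // { stateName: handlerObject }
  }

  dispatch(userId, message) {
    const state = states.get(userId) || "welcome";
    const result = this.handlers[state].handle(userId, message);
    if (result && result.goto) {
      states.set(userId, result.goto);
      // Let the new handler respond to the same message.
      return this.handlers[result.goto].handle(userId, message);
    }
    return result;
  }
}
```

Wiring it up is just a matter of registering one handler object per conversation state.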

The first time a user starts a conversation, the state is undefined, which triggers the “welcome” state handler. The Welcome object sends a message back to the user with a few buttons to tap on. Once the user taps something, the postback message is dispatched to the Welcome handler again, because the user is still in the welcome state. The Welcome handler can then decide to switch (goto) to another handler, for example the SelectProduct handler, which updates the conversation state to “select_product”. The SelectProduct handler sends its own message back to the user and waits for the user to respond.

Each handler can switch the state to any other handler, which makes the whole model very dynamic. A handler only communicates with an external AI service like API.AI when it can’t resolve a message from its postback payload alone.

A handler looks a little bit like this:
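
As a rough sketch — the handler name, payload format and injected collaborators here are hypothetical, not our production code:

```javascript
// A hypothetical SelectProduct handler. It first tries to resolve a
// button payload itself; only free text falls through to the AI service.
class SelectProductHandler {
  constructor(aiService, sendMessage) {
    this.aiService = aiService;     // e.g. an API.AI client (injected)
    this.sendMessage = sendMessage; // function(userId, text)
  }

  handle(userId, message) {
    // Postback payloads follow our own convention, so parse them directly.
    const match =
      message.payload && /^select-product-(\d+)$/.exec(message.payload);
    if (match) {
      this.sendMessage(userId, `Great, let's review product ${match[1]}!`);
      return { goto: "write_review" }; // switch the conversation state
    }
    // Free text: ask the AI service for the intent instead.
    if (this.aiService.detectIntent(message.text) === "cancel") {
      return { goto: "welcome" };
    }
    this.sendMessage(userId, "Please tap one of the products above.");
    return {};
  }
}
```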

Generally, we find this setup very easy to use and flexible. If you want to create a new piece of conversation flow, you simply add a new handler and let it handle that state of the conversation. The handler can decide whether it should call API.AI, the database or another external API; this doesn’t change the message flow. Even if an AI service like API.AI replies with a new context, the handler can simply return a Goto with the new context and switch the conversation that way. It’s a simple but very powerful setup.

Purr the Christmas Cat

To thank you for reading and liking this article, we’ve created Purr the Christmas Cat! Just visit Purr on Messenger (@christmaspurr) to get a friendly and funny Purry Christmas message.

The bot created in this article is a free plug & play bot for collecting customer feedback. Written/recorded testimonials easily embed into any HTML web page. You can find it here.

Purr says: “Meow & Meow” (Like and Follow)

It would mean a lot to me and Purr if you would like this article and follow me on Medium.com. I appreciate it!

Questions?

Comment below.


I write about startup technology. Head of Engineering at Everwise. Co-founder & Board Member at Cloud Games.