Chatbot Basics (and then some)

Vidhan Singhai
Jul 15, 2018

Chatbots are in fashion these days. Many retailers (including insurance and banking companies) and enterprises are adopting bot-based responses for general enquiries, and some even for ordering, on their websites and mobile channels. It is obviously much more convenient to talk in your natural language than to fill in a form or search through pages of technical write-ups.

As a user, the chatbot feels like magic. As they say, any sufficiently advanced technology is indistinguishable from magic. If it's programmed with the right customer experience in mind, it feels very, very intelligent: it understands our plain and simple language rather than some cryptic code.

However, under the hood, the trick is mostly intent parsing combined with some slick programming techniques. In other words, it's just a long drawn-out if-then-else statement.

Typically, there are two steps to answer a question through NLP (natural language processing):

Step 1: Parse language to understand the intent and other variables

Step 2: Compute the response

The ‘intent’ is nothing but a generic name given to the kind of question. So questions like ‘what is the weather tomorrow’, ‘is it going to be sunny tomorrow’, ‘what’s the temperature on 16/July’ etc. are all simply asking for ‘weather’ information for a specific date, ‘16-July’. In step 1, we generalize these questions into a data structure that a programme can understand, something like {intent=’weather’; date_time=’16/07/2018’}.
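To make step 1 concrete, here is a tiny, hypothetical Python sketch of that generalization. Real tools such as Dialogflow do this with trained models; this keyword version only illustrates the shape of the output data structure.

```
import re
from datetime import date, timedelta

def parse_intent(question):
    """Toy Step 1: map a raw question to a generic intent plus variables."""
    q = question.lower()
    parsed = {"intent": None, "date_time": None}
    # Crude keyword matching stands in for real intent classification.
    if any(word in q for word in ("weather", "sunny", "temperature", "rain")):
        parsed["intent"] = "weather"
    if "tomorrow" in q:
        parsed["date_time"] = (date.today() + timedelta(days=1)).strftime("%d/%m/%Y")
    else:
        match = re.search(r"\d{1,2}/\d{1,2}(/\d{2,4})?", q)
        if match:
            parsed["date_time"] = match.group(0)
    return parsed

print(parse_intent("is it going to be sunny tomorrow"))
# e.g. {'intent': 'weather', 'date_time': '16/07/2018'}
```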

Now, step 2 becomes easy for any programmer: he/she just needs to write a function that returns the temperature for the given date and adds some embellishments around it. Say it returns {high_temp=’20C’; low_temp=’16C’; humidity=’40%’; chance_of_rain=’10%’; sky=’clear’}. The embellishments are chosen randomly from a given set, e.g. {‘sunny’, ‘bright’, ‘clear skies’, ‘blue sky’, ‘dry’}, and finally concatenated with the response, such as ‘it’s going to be sunny tomorrow, with a high of 20 degrees and a low of 16 degrees’.
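And a matching sketch for step 2, again purely illustrative: get_forecast below is a made-up stand-in for whatever weather source you actually call.

```
import random

def get_forecast(date_time):
    # Made-up stand-in for a real weather lookup.
    return {"high_temp": "20C", "low_temp": "16C", "humidity": "40%",
            "chance_of_rain": "10%", "sky": "clear"}

EMBELLISHMENTS = ["sunny", "bright", "clear skies", "blue sky", "dry"]

def answer_weather(parsed):
    forecast = get_forecast(parsed["date_time"])
    # A randomly chosen embellishment keeps the reply from sounding canned.
    mood = random.choice(EMBELLISHMENTS)
    return ("It's going to be {} tomorrow, with a high of {} and a low of {}."
            .format(mood, forecast["high_temp"], forecast["low_temp"]))

print(answer_weather({"intent": "weather", "date_time": "16/07/2018"}))
```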

The better these embellishments are, the less robotic your chatbot will be.

A note on voice bots: in the case of voice bots like Amazon Echo or Google Home, just add voice-to-text conversion as Step 0 and text-to-voice conversion as Step 3. The process flow is otherwise the same.

There are open tools available for this, such as Dialogflow, BotEngine, Spacy etc., so the process flow looks like the figure below. These tools are quite powerful and can do quite a bit of the work to simplify your function.

Typical Architecture of a chat interface

Intent parsing basics

How do intent parsing tools such as Dialogflow, BotEngine, Mycroft or Spacy work? Well, that's a question that needs a textbook to answer, and trust me, not an easy one. Here's my short explanation, which hopefully gives you the concept.

In essence, these tools parse an English sentence into a grammar tree, extract keywords (or entities) out of it (‘weather’, ‘temperature’, ‘sunny’, ‘rainy’), and stem the words (‘sunny’ becomes ‘sun’, ‘skies’ becomes ‘sky’). Depending on the model used, they also ignore stop words like ‘is’, ‘the’, ‘a’, ‘an’ etc. They then run this through a dictionary to understand the intent and/or sentiment and respond with a probability distribution over intents.
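To give a feel for that pipeline, here is a small sketch using spaCy (one of the tools mentioned above); the exact lemmas and entities you get back depend on the language model you load.

```
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Is it going to be sunny tomorrow?")

# Lemmatized keywords, with stop words ('is', 'to', 'be', ...) and punctuation dropped.
keywords = [t.lemma_ for t in doc if not t.is_stop and not t.is_punct]
print(keywords)  # e.g. ['go', 'sunny', 'tomorrow']

# Entities the model recognizes, such as the date expression.
print([(ent.text, ent.label_) for ent in doc.ents])  # e.g. [('tomorrow', 'DATE')]
```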

You get this probability as a confidence value in the data structure that you receive. If the confidence is high (say above 70%), there's your intent; otherwise you simply say ‘Sorry, I couldn’t understand this. Could you please rephrase?’. To avoid sounding robotic every time, you can keep a pool of such sorry responses, e.g. ‘sorry, please re-phrase’, ‘didn’t get that, could you come again?’, ‘pardon?’, and return a random one each time.
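In code, that confidence check plus the pool of apologies might look something like this (the handler table and the 0.7 threshold are just illustrative choices):

```
import random

FALLBACKS = [
    "Sorry, I couldn't understand this. Could you please rephrase?",
    "Didn't get that, could you come again?",
    "Pardon?",
]

# Made-up handler table; in a real bot each intent maps to its own function.
HANDLERS = {"weather": lambda parsed: "It's going to be sunny tomorrow."}

def respond(parsed, threshold=0.7):
    # Trust the parsed intent only when the tool is confident enough;
    # otherwise pick a random apology so the bot doesn't repeat itself.
    if parsed.get("confidence", 0.0) >= threshold and parsed.get("intent") in HANDLERS:
        return HANDLERS[parsed["intent"]](parsed)
    return random.choice(FALLBACKS)

print(respond({"intent": "weather", "confidence": 0.92}))
print(respond({"intent": "weather", "confidence": 0.40}))
```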

These are some tricks you can use to make the bot appear really intelligent. In reality, your bot doesn't actually understand the sentence you typed; it is only responding based on pre-formatted patterns. There are other models that are slightly more intelligent and can get closer to understanding what you're trying to say.

Slightly More Intelligent Chatbots

Here's one online tool that you can play around with to get an idea of how some of the more intelligent bots work: the Google Cloud NLP API. Examples are below.

For the input sentence: ‘I love this movie’, the word ‘movie’ is identified as an entity. The salience score is 1 (out of 1), which essentially means the entire sentence is talking about this entity only.

Also note that the sentiment score is 0.9, which means it's very positive. Contrast this with ‘I hate this movie’, which will have everything else the same except for a very low sentiment score.

It understands slightly more complicated sentences as well: ‘this movie is so good, but the acting is so bad’ gives a 0.9 sentiment score on ‘movie’ and -0.9 on ‘acting’.

Understanding multiple entities and sentiments

So it kind of understands what you're talking about.
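If you want to try this from code rather than the web demo, the google-cloud-language Python client exposes the same entity-level sentiment. This is a rough sketch: it needs GCP credentials configured, and the exact request shape varies a little between client library versions.

```
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

document = language_v1.Document(
    content="this movie is so good, but the acting is so bad",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# Per-entity sentiment, roughly what the web demo shows.
response = client.analyze_entity_sentiment(request={"document": document})
for entity in response.entities:
    print(entity.name, round(entity.salience, 2), round(entity.sentiment.score, 2))
# Expect 'movie' with a positive score and 'acting' with a negative one.
```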

While I can't confirm this, I'm pretty sure such APIs use some variant of the Word2Vec model. This model uses cosine similarity between word vectors, which is simply to say that it can work out that the words ‘man’ and ‘boy’ have the same relationship as ‘woman’ and ‘girl’.
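To make the cosine-similarity idea concrete, here is a toy numpy sketch with made-up three-dimensional vectors; real Word2Vec embeddings have hundreds of dimensions learned from text, but the arithmetic is the same.

```
import numpy as np

# Made-up toy vectors; real embeddings are learned from large corpora.
vectors = {
    "man":   np.array([0.9, 0.1, 0.2]),
    "boy":   np.array([0.8, 0.1, 0.7]),
    "woman": np.array([0.1, 0.9, 0.2]),
    "girl":  np.array([0.1, 0.8, 0.7]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The shared 'relationship' shows up as similar offset vectors:
# man -> boy should point roughly the same way as woman -> girl.
offset_mb = vectors["boy"] - vectors["man"]
offset_wg = vectors["girl"] - vectors["woman"]
print(cosine(offset_mb, offset_wg))  # close to 1.0 for these toy vectors
```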

Closing Notes

It’s quite fun to see how an algorithm parses our natural sentences. But soon, you’ll realize that it’s not really intelligent.

For example, if you type in an American way of saying ‘I don’t like it’, ‘I can’t care much about this movie’, the sentiment score is 0, i.e. neutral, which is obviously wrong. Let's try the British way, ‘I can’t say that I’m vastly impressed by this movie’; again the sentiment score is neutral.

While you can scoff at that by saying ‘no one talks like that’, the fact is there are a lot of ways our language can be interpreted (the Finnish comedian Ismo will have you choking with laughter: link1 link2). Here's another example, from the book Life 3.0 by Max Tegmark. What does ‘they’ refer to in these two sentences?

  1. The city councilmen denied the demonstrators permission as they feared violence.
  2. The city councilmen denied the demonstrators permission as they advocated violence.

The Google tool above gives the same scoring and syntactic evaluation for the two sentences, but we intuitively know there is a difference as to who fears violence and who advocates it.

While we have made a lot of progress, there’s enough and more work to do in this field. To quote Robert Frost, the woods are lovely, dark and deep, and I have promises to keep, and miles to go before I sleep.
