Chapter 11: ChatBots to Question & Answer systems.

Madhu Sanjeevi ( Mady )
Deep Math Machine learning.ai
13 min read · Apr 19, 2018

I am really excited to write this story. So far I have talked about machine learning, deep learning, math and programming, and I am sick of it.

Now I wanna talk about simple things and also some research-level stuff (AI research in NLP), because natural language processing (NLP) is one of the most complex problems in AI and it has a long, long, long way to go.

Note: This is a series of stories that gives the complete idea about chatbots, Q&A engines and language models. This story is more useful for entrepreneurs and developers than for researchers.

A lot of people have a misconception: they think that chatbots and Q&A systems are the same or similar.

But in reality they are not. They follow completely different approaches, methods, algorithms and models; the only thing they share is that the input and output are text (I will definitely prove it to you by the EOS).

Chatbots can have more functionalities (depending on the problem) than Q&A systems and are very easy to build nowadays using tools like Dialogflow, wit.ai, LUIS, Amazon Lex, IBM Watson, etc.

But they can’t be truly intelligent; they just get the work done for the problem you pick.

A Q&A system requires a huge amount of data and expertise, and it is still very hard to implement.

Ex: if you are building a chatbot using IBM Watson, it might not have much intelligence (no offence IBM, in fact all the frameworks are similar), but if you try to understand the technology and methods IBM Watson used to win the game show Jeopardy!, you will be amazed.

Here IBM Watson is one of the best Q&A systems (I am not talking about the chatbot here, because as a chatbot tool IBM Watson is normal like any other).

I hope it is not confusing; even if it is, you will get the idea as you read through.

Let’s first talk about Chatbots.

A chatbot, also known as a conversational agent, is a service powered either by rules or by (a little) artificial intelligence that you interact with via a chat interface.

Ex: a weather bot (you can ask all weather questions), a news bot (for those who like to keep up with the news daily), a cricket score bot, etc.

Before we dive into the topic, lemme just give an idea of where you can build the bots and how.

Just look at the picture below; the 2x2 grid (retrieval-based vs. generative, open domain vs. closed domain) says it all.

Source: Chatbots Life magazine

Retrieval-based model

As the name says, it retrieves the answers/responses from a set of predefined responses and uses some kind of heuristic to pick an appropriate response based on the input and context. The heuristic could be as simple as a rule-based expression match, or as complex as an ensemble of machine learning classifiers.
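To make the simplest case concrete, here is a minimal sketch of a retrieval-based bot that uses a rule-based expression match as its heuristic. The patterns, the canned responses (including the made-up pizza price) and the function name `retrieve_response` are all assumptions for illustration, not something from a real framework.

```python
import re

# Hypothetical rule table: (pattern, canned response). Order matters:
# the first matching rule wins.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\b(price|cost|how much)\b", re.IGNORECASE), "A medium pizza costs $8."),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE), "Goodbye, see you soon!"),
]
FALLBACK = "I am sorry, I don't understand."


def retrieve_response(user_text: str) -> str:
    """Return the first canned response whose pattern matches the input."""
    for pattern, response in RULES:
        if pattern.search(user_text):
            return response
    # Nothing matched: the bot never generates text, it only falls back.
    return FALLBACK


print(retrieve_response("Hello there"))
print(retrieve_response("tell me a joke"))
```

Note that the bot can only ever answer with one of the stored strings, which is exactly the trade-off this kind of model makes.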

Pros

1. No grammatical or meaningless errors, as we store the answers ourselves.

2. Works reliably for business problems, so customer satisfaction and attention can be gained.

3. Super easy to build, as these models don’t require huge data.

Cons

1. These systems don’t generate any new text; they just pick a response from a fixed set.

2. A lot of hard-coded rules have to be written, so they are not very intelligent.

Generative models

These models don’t rely on predefined responses. They generate new responses from scratch. Generative models are typically based on machine translation techniques, but instead of translating from one language to another, we “translate” from an input to an output (response).

They use sequence-to-sequence models for generating the text (we will implement these in the next stories).

(Nobody could explain retrieval-based and generative models better than this, so I took the exact wording just to give the idea.)

Pros

  1. No need to worry about predefined responses and rules.

Cons

  1. Super difficult to implement, and the output may not be accurate (grammatical errors or meaningless text may occur).
  2. Not suitable for most business problems (unless you are providing a service that requires something like text summarization techniques) #willexplain
  3. Huge data is required to train these models.

Open Domain

Open domain is the setting where the chat conversation can go anywhere; users can type/ask anything. There isn’t necessarily a well-defined goal or intention.

The chatbot Mitsuku is an example of this.

The convo can go in all kinds of directions. The infinite number of topics and the fact that a certain amount of world knowledge is required to create reasonable responses make this a hard problem.

Closed Domain

Closed domain is the setting where you are solving a particular business problem (the business could be in any sector/industry).

Ex: pizza bot, banking bot, medical bot, cricket score bot, etc.

Closed domain bots focus on one particular sector or industry.

So you can’t ask questions like “how is the weather now??” or “what is the score for the IndVsPak match today?” when you are dealing with a banking bot or pizza bot.

Similarly, you can’t ask a pizza bot a banking query. If you ask, you will get a decent answer like “I am sorry I don’t understand”.

Closed domain bots have limited functionalities/services based on the business problem.

Note: In this story I only focus on closed domain bots, and I hope you get a picture of the chatbot architecture.

Now let’s continue with the chatbots.

A chatbot typically has 3 things in it:

  1. Intent (the intention of the query asked by the user)
  2. Entities (named entities in the query like location names, people names, dates, etc.) #NamedEntityRecognition
  3. Action or Response (the result to throw back to the user)

Ex: what’s the weather in Seattle tomorrow??

Here Intent → Weather check

Entities → Seattle ( Location), tomorrow (Date)

Response → “The weather in {Location} {Date} is so and so”
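The intent/entities/response split above can be sketched in a few lines. This is only an illustration under assumptions: the `parsed` dictionary stands in for whatever an NLU step would return, and the `forecast` value is a placeholder (a real bot would fetch it from a weather service).

```python
# Hypothetical NLU output for "what's the weather in Seattle tomorrow?"
parsed = {
    "intent": "WeatherCheck",
    "entities": {"Location": "Seattle", "Date": "tomorrow"},
}

# Canned response templates, one per intent.
TEMPLATES = {
    "WeatherCheck": "The weather in {Location} {Date} is {forecast}.",
}


def build_response(parsed: dict, forecast: str) -> str:
    """Fill the canned template for the intent with the extracted entities."""
    template = TEMPLATES[parsed["intent"]]
    return template.format(forecast=forecast, **parsed["entities"])


print(build_response(parsed, forecast="sunny"))
```

The response text itself is still canned; only the slots in curly braces change from one query to the next.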

The chatbot always has canned responses depending upon the problem/service you provide.

Note: NLP is hard at this moment. Computers only recently started generating text with the help of deep learning, and they can’t yet reliably produce a meaningful response, so chatbots for user/customer services always have canned responses.

If you are not a programmer, there are a lot of chatbot frameworks where you can build a bot very easily without coding.

Chatfuel, Manychat, FlowXO, Octane, Recime, etc.

There are dozens of frameworks out there; you can use any of them to build a bot for your business.

If you are a programmer, have a little experience with machine learning and wanna build a chatbot for your services, you can use the following tools.

These tools are backed by big companies (Google Dialogflow, Amazon Lex, Microsoft LUIS, Facebook Wit and IBM Watson) and work perfectly well for small products/services. (There are even more open source tools available on the internet.)

These services are mostly on the cloud. You access the results through APIs, and the entire piece of code is a black box to you (you don’t have to know what the machine learning algorithm is, but you can pretty easily use/train them according to your requirements).

You just need to know how an API works; that’s it, you can use these tools.

Every tool has its own advantages and disadvantages, but all of them are mostly similar; one picks a tool based on the requirements.

As far as my experience is concerned, IBM Watson is the safest and best one. (Actually wit.ai provides better NLU capabilities, but it sometimes changes the entity values, especially number values and date-time values, based on some training data patterns #rarecase, which matters if you are dealing with a bunch of numbers in your input data.)

If you have some idea of machine learning and NLP, I don’t recommend using these tools, as you can’t customize/tune the model here, plus your data is out of your hands.

If you are an absolute beginner or don’t have much knowledge of machine learning, I suggest you play with these tools and get some understanding; you can easily build bots after that.

If you spend an hour on the internet/YouTube, you can pretty much understand these tools and be able to use them. Trust me, very easy!

So what is required to build a chatbot??

1. First of all, we need a clear idea of what problem we are solving (this is the most important part; in my experience over the last 2 years, 90% of people fail here).

So we have to prepare the conversational flows first.

So what is a conversational flow????

A conversational flow is a flow designed by us to drive the user to consume our services.

Chatbots always drive stories.

Let’s say we have a set of 10 services, so we need 10 different conversational flows (remember, we want customers/users to consume our services, so we need to manipulate users to control the flow of the conversation).

In other words, we need to take full control of the conversation.

I will explain in detail as we go. #keepcalm.

2. Identify the intents, entities and responses for all the flows.

3. Collect the training data to build the ML model.

4. Build the model (cloud platforms or customized tools) and start coding based on the requirements.

That’s it.

Don’t worry! I will share the practical knowledge too.

Let’s build a chatbot flow very quickly.

Step 1: Understanding the problem and preparing conversation flows.

Let’s take a simple problem. Let’s say I open a pizza shop (Mady’s Pizza House). My ultimate goal is to sell pizzas and gain more customers, so I want to build a bot for my customers with 3 services.

Conversation flow 1 : Order Pizza Now.

Conversation flow 2 : Price finding .

Conversation flow 3: Displaying best offers and pizza trends.

These flows should be developed in the way that you want to drive the users.

These are designed by me #justanexample.

If you observe the above images closely, you will find an interesting point.

Here, we are actually controlling the user’s actions and giving the responses according to the input (that’s the important part in building chatbots).

I am not just giving an answer to the user; I am actually understanding the user and controlling the conversation, so the user really feels like he/she is having a natural conversation.
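One way to picture a conversational flow in code is as a small state machine: each state asks one question, offers a fixed set of options, and names the next state. The sketch below is my assumption of how the “Order Pizza Now” flow could be written down; the state names, prompts and the `step` function are hypothetical.

```python
from typing import Tuple

# Hypothetical "Order Pizza Now" flow: prompt, allowed options, next state.
ORDER_FLOW = {
    "ask_type": {"prompt": "Veg or non-veg?", "options": ["veg", "non-veg"], "next": "ask_size"},
    "ask_size": {"prompt": "Which size?", "options": ["small", "medium", "large"], "next": "ask_quantity"},
    "ask_quantity": {"prompt": "How many pizzas?", "options": ["1", "2", "3"], "next": "done"},
}


def step(state: str, user_reply: str) -> Tuple[str, str]:
    """Return (next_state, bot_message) for a user reply in the given state."""
    node = ORDER_FLOW[state]
    if user_reply.strip().lower() not in node["options"]:
        # Unexpected input: stay in the same state and repeat the question,
        # so the bot keeps control of the conversation.
        return state, f"Please choose one of {', '.join(node['options'])}. {node['prompt']}"
    next_state = node["next"]
    if next_state == "done":
        return next_state, "Thanks, your order is placed!"
    return next_state, ORDER_FLOW[next_state]["prompt"]


print(step("ask_type", "Veg"))
print(step("ask_size", "huge"))
```

Because the bot only moves forward on an answer it expects, the user is always driven along the path we designed.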

If you can’t build intelligence, you have to be intelligent.

Okay, Step 1 is done #uffffff. The remaining things are super easy.

Step 2 : Finding Intents and entities

I have 3 intents:

  1. Order Pizza
  2. Get price
  3. Get offers/choices

Entities are

Pizza name → Cheese Pepperoni

Pizza size → Large, Medium, Small

Pizza type → Veg or Non veg or both

Pizza/item quantity → 1, 2, 3

Item name → Pepsi

Item quantity → 1 lit, 2 lit

Note: These are just for the sake of understanding; I hope you see it from that perspective.
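For a closed domain like this, entities can often be picked up with a plain vocabulary lookup before reaching for a trained NER model. Here is a rough sketch under that assumption; the vocabularies and the `extract_entities` function are hypothetical, and treating any bare integer as the quantity is a deliberate simplification.

```python
import re

# Hypothetical entity vocabularies for the pizza example.
ENTITY_VALUES = {
    "pizza_size": ["small", "medium", "large"],
    "pizza_type": ["non veg", "veg"],  # longer phrase first so "non veg" wins
    "item_name": ["pepsi"],
}


def extract_entities(text: str) -> dict:
    """Return {entity_name: value} for every vocabulary value found in text."""
    lowered = text.lower()
    found = {}
    for name, values in ENTITY_VALUES.items():
        for value in values:
            if re.search(r"\b" + re.escape(value) + r"\b", lowered):
                found[name] = value
                break  # keep only the first match per entity
    # Simplifying assumption: a bare integer is the quantity.
    number = re.search(r"\b\d+\b", lowered)
    if number:
        found["quantity"] = int(number.group())
    return found


print(extract_entities("I want 2 large veg pizzas and a Pepsi"))
```

A lookup like this breaks down as soon as users misspell or paraphrase values, which is where a trained NER model earns its keep.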

Step 3: Prepare the training data for getting intents and entities.

I know this is dummy data, but I feel like it makes sense. Here I am only labeling the intent; you need to label entities as well, and I hope you can get help from the internet if you don’t know how #just search NER training on Google.

Step 4 : Build the machine learning classifier and/or NER (named entity recognition ) to classify the input.

Here we get one of the intents for the user’s input ( OrderNow,Price or ShowOffers) and we get entities if found in the input.

If you use scikit-learn or any cloud platform to build the ML model, it may use a simple model called a “bag of words” model. It ignores word order, so it’s not very effective.

I strongly recommend you use deep learning algorithms like RNNs and LSTMs for intent classification. If you don’t know what they are, you may go through my earlier stories on RNNs and LSTMs.
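For reference, this is roughly what the bag-of-words baseline looks like, written in plain Python so it runs without any library. It is not the RNN/LSTM approach recommended above, and it is not how scikit-learn or any cloud platform implements things; the toy training sentences and the nearest-intent-by-cosine-similarity rule are my assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training data: (utterance, intent).
TRAINING_DATA = [
    ("i want to order a pizza", "OrderNow"),
    ("i need a large cheese pizza now", "OrderNow"),
    ("what is the price of a medium pizza", "Price"),
    ("how much does a pepperoni pizza cost", "Price"),
    ("show me the best offers", "ShowOffers"),
    ("any deals or trending pizzas today", "ShowOffers"),
]


def bag_of_words(text: str) -> Counter:
    """Count word occurrences, ignoring word order."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# One summed word-count vector per intent.
INTENT_VECTORS = {}
for utterance, intent in TRAINING_DATA:
    INTENT_VECTORS.setdefault(intent, Counter()).update(bag_of_words(utterance))


def classify_intent(text: str) -> str:
    """Return the intent whose word counts are most similar to the input."""
    query = bag_of_words(text)
    return max(INTENT_VECTORS, key=lambda intent: cosine(query, INTENT_VECTORS[intent]))


print(classify_intent("what is the price"))
```

Since only word counts are compared, “pizza order” and “order pizza” look identical to this model, which is the weakness a sequence model is meant to address.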

Write the remaining logic based on the output we get from the model:

if intent is “OrderNow”
    if entities are found
        do the action here
    else
        ask the user back for the entities

and so on…

Ex: user says “I need pizza”

I get the intent (order) and no entities, so I keep this intent and ask the user back for veg or non veg.

The user replies veg; I capture it as the type, ask the user for the quantity, and so on.

Finally I get all the values. I kept the intent safely, so I pull the intent back and give the response based on the entities.
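The back-and-forth above is often called slot filling. Here is one possible sketch of it, assuming an in-memory dictionary keyed by user id as the session store; the slot names, questions and the `handle_turn` function are hypothetical, and a real bot would more likely keep sessions in a database or cache.

```python
from typing import Optional

REQUIRED_SLOTS = ["pizza_type", "pizza_size", "quantity"]
QUESTIONS = {
    "pizza_type": "Veg or non-veg?",
    "pizza_size": "Which size: small, medium or large?",
    "quantity": "How many pizzas?",
}

# Hypothetical in-memory session store keyed by user id.
SESSIONS = {}


def handle_turn(user_id: str, intent: Optional[str], entities: dict) -> str:
    """Merge this turn into the user's session and ask for the next missing slot."""
    session = SESSIONS.setdefault(user_id, {"intent": None, "slots": {}})
    if intent:
        session["intent"] = intent  # keep the intent across turns
    session["slots"].update(entities)
    for slot in REQUIRED_SLOTS:
        if slot not in session["slots"]:
            return QUESTIONS[slot]
    slots = session["slots"]
    del SESSIONS[user_id]  # order complete, reset the session
    return f"Ordering {slots['quantity']} {slots['pizza_size']} {slots['pizza_type']} pizza(s)."


print(handle_turn("u1", "OrderNow", {}))               # user: "I need pizza"
print(handle_turn("u1", None, {"pizza_type": "veg"}))  # user: "veg"
```

The intent is stored once and survives the follow-up turns, so later replies only need to carry entities.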

You may ask a question like:

How do you maintain the session???

That’s where your thinking abilities are useful #That’s so easy to think about but difficult to explain, as it requires another Medium story. I may write it later.

I really can’t explain further than this, but I believe a programmer is able to grasp the remaining things.

And that’s it; that’s how you can easily build bots, and of course you can add a lot of functionalities and features to them.

One of the features is a conversational interface for the bots.

So what is a conversational interface???

A conversational interface provides buttons and images so users can just tap them to respond to the bot. Here is the Domino’s pizza bot.

Here users don’t have to type anything; just a click is gonna get the job done.

On GitHub there are a lot of chatbots available; if you are interested you can search there, and there are a lot of tools to build bots too.

Ufff. We spent a lot of time on chatbots; let’s stop here.

Let’s talk about Question and Answering Systems

Q&A is one of my favorite research subjects in AI. A Q&A system was my first project in deep learning 18 months ago, and I am so excited to share my knowledge now, so let’s get started.

#me #yearsago #justshare #dormroom #mumbai #india

If you ask what Q&A is,

the internet can give you dozens of answers.

From Wikipedia:

“Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.”

Another source

“Question Answering is a specialized form of Information Retrieval which seeks knowledge”, and there is a lot more you can find on the internet.

For me:

“Q&A is a program which can do reasoning based on the context, which can construct an answer from a structured database/knowledge base, which can classify questions based on the training data and which can seek information from unstructured collections.”

And question answering is itself an intersection of Natural Language Processing, Information Retrieval, Machine Learning, Knowledge Representation and Semantic Search.

Just like we discussed above, QA also has open domain and closed domain.

Open Domain Q&A

People can ask any question; the QA system will find an answer from the web or other sources and give the user the respective answer (the entire world’s knowledge is required).

Closed Domain Q&A

People can ask questions related to a particular domain, ex: healthcare/medicine. The QA system finds the answer in the database of that domain (a domain expert and domain knowledge are required to build these QA systems).

Since I am not an expert in any domain, I focus only on open domain, as we have huge datasets from Stanford, Facebook, Google and many more to do AI research on QA engines.

So we will build the QA engine using deep learning for open domain only in the coming stories. #justAnote.

Ok. There are different types of question answering systems.

  1. Information retrieval (IR)-based question answering

It relies fully on the huge amount of information available as text on the web or in specialized collections such as PubMed.

The method processes the question to determine the likely answer type (often a named entity like a person, location, or time).

There are different types of questions also #willdiscusssoon

One of the types is factoid questions, i.e. questions based on facts,

where the answer type is a named entity (person, location, number, etc.). Ex:

IR-based QA gets the answer from the collected documents. Again, it does not generate the answer; it just copy-pastes from the documents, so if the text is not present in the documents, these models can’t give the answer.
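The copy-paste behaviour can be sketched with a very crude retriever: score every stored sentence by word overlap with the question and return the best one unchanged. This is only a toy under my own assumptions (the three sample sentences, the stopword list and the `answer_from_documents` function are made up); real IR-based QA adds answer-type detection, proper ranking and answer extraction on top.

```python
import re
from typing import Optional

# Hypothetical document collection; a real system would index far more text.
DOCUMENTS = [
    "Watson is a question answering system built by IBM.",
    "Seattle is a city in the state of Washington.",
    "Pizza originated in Naples, Italy.",
]
STOPWORDS = {"a", "an", "the", "is", "in", "of", "by", "who", "what", "where", "did", "was"}


def tokens(text: str) -> set:
    """Lowercase word set without stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def answer_from_documents(question: str) -> Optional[str]:
    """Return the stored sentence sharing the most words with the question."""
    q = tokens(question)
    best = max(DOCUMENTS, key=lambda doc: len(q & tokens(doc)))
    # No overlap at all means the collection has nothing to copy from.
    return best if q & tokens(best) else None


print(answer_from_documents("Who built Watson?"))
print(answer_from_documents("How tall is Mount Everest?"))
```

The second question returns nothing because no stored sentence mentions it, which is the limitation described above.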

Here is the picture of the full flow for IR-based QA, taken from Stanford.

2 . Knowledge based Question and answering.

The core idea of KBQA is to convert the natural language query into a structured database query.

Ex: query = “When was Mady born??”

It gets converted into a database query, say SQL:

SELECT born_year FROM testtable WHERE name = 'Mady';

and the result is returned to the user as the answer.
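Here is a small runnable sketch of that idea using Python’s built-in `sqlite3` module. The table mirrors the example query, but the row value is a placeholder and not a real fact, and the single hand-written regular expression is an assumption standing in for the learned question-to-query mapping a real KBQA system would use.

```python
import re
import sqlite3

# In-memory table mirroring the example query; 1990 is a placeholder value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (name TEXT, born_year INTEGER)")
conn.execute("INSERT INTO testtable VALUES ('Mady', 1990)")

# One hand-written question pattern; real systems learn this mapping.
BORN_PATTERN = re.compile(r"when was (\w+) born", re.IGNORECASE)


def answer_from_kb(question: str):
    """Translate a 'When was X born?' question into SQL and return the year."""
    match = BORN_PATTERN.search(question)
    if not match:
        return None  # question type not covered by the pattern
    row = conn.execute(
        "SELECT born_year FROM testtable WHERE name = ?", (match.group(1),)
    ).fetchone()
    return row[0] if row else None


print(answer_from_kb("When was Mady born??"))
```

The name is passed as a query parameter rather than pasted into the SQL string, which avoids SQL injection from user input.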

3. Story based Question and answering

This is more like asking a question based on a given passage/story.

Here the input for building models is given in the form of triplets:

(Story, Question, Answer)
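To show what one such triplet might look like in code, here is a tiny sketch. The `QAExample` type, the sample story and the `to_model_input` helper are all hypothetical; the split is whitespace-only, so punctuation stays attached to words.

```python
from typing import List, NamedTuple, Tuple


class QAExample(NamedTuple):
    story: List[str]  # sentences of the passage
    question: str
    answer: str


# Hypothetical example of one (Story, Question, Answer) triplet.
example = QAExample(
    story=["Mady went to the kitchen.", "Mady picked up the pizza."],
    question="Where is Mady?",
    answer="kitchen",
)


def to_model_input(ex: QAExample) -> Tuple[List[str], List[str], str]:
    """Lowercase and whitespace-split the story and question into word lists."""
    story_words = " ".join(ex.story).lower().split()
    question_words = ex.question.lower().split()
    return story_words, question_words, ex.answer


story_words, question_words, target = to_model_input(example)
print(story_words, question_words, target)
```

A model is then trained to predict the answer from the story and question word lists.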

There is another one, which is DeepQA (IBM Watson). We will discuss this in the coming stories.

So far I have given high-level details about question answering; we will go into depth in the coming stories #staytuned.

Well, that’s all for the article.

If readers have any questions/suggestions/opinions/thoughts, please feel free to pass them on.

I will see you guys in the next story. Have a great day…!
