Ovo Ojameruaye
SFU Professional Computer Science
9 min read · Feb 4, 2020


Authored by Peshotan Irani, Ovo Ojameruaye, Aishwerya Kapoor

This blog is written and maintained by students in the Professional Master’s Program in the School of Computing Science at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit {sfu.ca/computing/pmp}.

Chatbots are everywhere! They welcome you on many websites, provide quick answers to simple questions, are available 24 hours a day, and yes… you can use them to order pizza! Chatbots are great for all of that and much more. If, however, you want to go beyond ordering pizza, peek behind the veil, perhaps tinker with the machinery, and build your own chatbot, well, you have come to the right place! So let's 'bot' right in!

Let's start things the old-school way…

What is a Chatbot?

A bot is an automated system designed to communicate with humans through the internet. A 'chat' bot is, as the name suggests, a bot designed to 'chat' with humans, whether by voice (auditory) or text (textual) methods.

Hmmm… that's interesting. But where did it all start?

Fun fact: chatbots were originally called "chatterbots", a term coined by Michael Mauldin in 1994 to describe conversational software. The first conversational program was released in 1966: ELIZA, which simulated a Rogerian psychotherapist using pattern matching and substitution. We have come a long way since then, with companies like Google shipping products such as Google Home and Google Assistant.

What you need to know

To build chatbots, you need an understanding of Natural Language Processing (NLP), the foundation on which chatbots are built. NLP is a branch of computer science, closely tied to artificial intelligence, that deals with how computers understand and process human language. It also goes without saying that programming skills will be more than helpful.

But, how do they work?

Users interact with chatbots through either text or speech. When speech is used, the chatbot first turns it into text using automatic speech recognition (ASR). The chatbot then analyses the text to understand what the user needs. Once the need, or intent, is understood, the best response is selected and delivered back to the user.

Types of Chatbots

We can fit chatbots into two broad groups based on how they understand what users need and how they respond:

  1. Rule-based
  2. AI-based

Rule-Based Chatbots

Rule-based chatbots follow a set of predefined rules for each possible state of the conversation. They understand a user's intent by scanning the user's input for keywords, then pulling from a database the pre-configured reply whose keywords or patterns best match.

These chatbots are typically written in AIML (Artificial Intelligence Markup Language) using a PATTERN and TEMPLATE approach: when the bot encounters an exact or similar pattern in a sentence from the user, it replies with the corresponding template from the database. This approach is also known as pattern matching.
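To make that concrete, here is a tiny sketch of the pattern/template idea in TypeScript. Real AIML bots define these pairs as XML categories; the patterns and replies below are purely illustrative.

```typescript
// A minimal sketch of the pattern/template idea behind rule-based bots.
// Each rule pairs a pattern with a canned reply, mimicking AIML categories.
const rules: { pattern: RegExp; template: string }[] = [
  { pattern: /\border\b.*\bpizza\b/i, template: "Sure! What size would you like?" },
  { pattern: /\b(hi|hello|hey)\b/i, template: "Hello! How can I help you today?" },
];

function reply(utterance: string): string {
  for (const rule of rules) {
    if (rule.pattern.test(utterance)) return rule.template;
  }
  // Fallback when no pattern matches, similar to AIML's wildcard category.
  return "Sorry, I didn't quite get that. Could you rephrase?";
}

console.log(reply("I want to order a pizza")); // "Sure! What size would you like?"
```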

AI-based Chatbots

AI-based chatbots are either retrieval-based or generative.

Retrieval-based chatbots: As with the rule-based models, all responses are predefined. The difference, however, lies in how the bot goes from the user's input to a response. While rule-based models rely on pattern matching, retrieval-based models use classification algorithms to infer the intent from the user's input.

Using classification algorithms implies that a model must be trained to learn how to classify user input. The number of classes is limited, defined by the labels in the training logs. Once the intent has been classified and the entities extracted, they are used to pick the appropriate response. The context can also be considered when forming a response. The rules are implicit: they are learned by the AI system.
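In code, the retrieval step itself is just a lookup once the classifier has done its job. Here is a minimal TypeScript sketch, with made-up intent labels and responses standing in for whatever a trained classifier would predict.

```typescript
// Predefined responses keyed by intent label (labels come from training logs).
const responses: Record<string, string> = {
  order_pizza: "Sure! What size and toppings would you like?",
  opening_hours: "We are open from 11am to 10pm every day.",
};

// 'intent' is the label a trained classifier predicted for the user's utterance.
function respond(intent: string): string {
  return responses[intent] ?? "Sorry, I can't help with that yet.";
}

console.log(respond("order_pizza")); // "Sure! What size and toppings would you like?"
```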

Generative models: Unlike the rule-based and retrieval models, these models generate a response based on the user's input and the previous context of the conversation. They generate a reply one word at a time by computing probabilities over the model's entire vocabulary. They are also capable of continuously learning from their interactions with users.

As deep learning advanced, generative models built with end-to-end trainable neural networks began replacing rule-based and retrieval models around 2015. More specifically, the recurrent encoder-decoder model and its variations have come to dominate the task of conversational modeling.

Intents, entities, recurrent neural networks. What are these really about?

Let’s first understand some jargon before we get our hands dirty:

Utterances: Anything a user says is an utterance. Think of it as the input.

Responses: The reply to a user's utterance. Anything the bot says is a response.

Intent: The user 'intent' refers to what the user 'intends' to do while interacting with the chatbot. Think of intents as 'topics' in a particular chapter of a book. For example, if a customer says 'I want to order a pizza', the intent for the bot is order_pizza. The number of intents a bot has depends on the application the bot is made for.

Entity: Entities are one level deeper than intents. They act as metadata for intents, describing in detail what, in particular, the user wants. They can represent quantities, sizes, names and so on. For the pizza example above, the user might say "I want to order two large vegetarian pizzas with alfredo base", which says more about exactly what kind of pizza the user wants. An intent can contain multiple entities: here, 'two', 'large', 'vegetarian' and 'alfredo base' can all be treated as entities.
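Putting the two together, a parsed utterance might be represented roughly like the object below. The field names are illustrative, not any particular framework's schema.

```typescript
// A rough sketch of the pizza utterance after intent and entity extraction.
const parsedUtterance = {
  text: "I want to order two large vegetarian pizzas with alfredo base",
  intent: "order_pizza",
  entities: {
    quantity: 2,
    size: "large",
    type: "vegetarian",
    base: "alfredo",
  },
};
```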

Seq2Seq models: Generative chatbots are typically built on recurrent neural networks. This tends to be quite an advanced topic, but we will do our best to illustrate it with a basic example. A widely used variant is the sequence-to-sequence (Seq2Seq) model, which can be implemented as two LSTMs (called the encoder and the decoder).


The encoder's job is to take the user's input and the previous messages as a sequence of word vectors and produce something known as a thought vector, which captures the meaning of the whole input sequence. The decoder converts the thought vector into a sequence of word vectors (the response). Each word in the sequence is generated based on the previous word and the thought vector. At each step, the model outputs the word with the highest probability.

A cool resource on the nitty-gritty of Seq2Seq models can be found here. Training the model gives the chatbot the ability to generate relevant and grammatically correct responses to input utterances.
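As a rough illustration of that word-by-word, greedy decoding, here is a toy TypeScript sketch. The encode and decodeStep functions are stand-ins for a trained encoder and decoder (e.g. two LSTMs); real models operate on learned tensors, not the hard-coded table used here so the example runs.

```typescript
// Toy encoder: word lengths as a placeholder for a learned thought vector.
function encode(inputWords: string[]): number[] {
  return inputWords.map(w => w.length);
}

// Toy decoder step: returns a probability for each candidate next word,
// hard-coded here; a real decoder would compute these from the thought vector.
function decodeStep(thought: number[], prevWord: string): Map<string, number> {
  const table: Record<string, [string, number][]> = {
    "<start>": [["sure", 0.8], ["sorry", 0.2]],
    "sure": [["!", 0.9], ["<end>", 0.1]],
    "!": [["<end>", 1.0]],
  };
  return new Map(table[prevWord] ?? [["<end>", 1.0]]);
}

// Greedy decoding: at each step, emit the highest-probability word.
function generateReply(inputWords: string[], maxLen = 20): string[] {
  const thought = encode(inputWords);
  const reply: string[] = [];
  let prev = "<start>";
  for (let i = 0; i < maxLen; i++) {
    const probs = decodeStep(thought, prev);
    const next = [...probs.entries()].sort((a, b) => b[1] - a[1])[0][0];
    if (next === "<end>") break;
    reply.push(next);
    prev = next;
  }
  return reply;
}

console.log(generateReply(["hi", "there"])); // ["sure", "!"]
```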

This all sounds so interesting, how can I actually build one?

Building a ChatBot

Chatbots have traditionally been challenging to understand and build. Google has simplified this process with its tool Dialogflow. Dialogflow lets you build conversational interfaces by providing a powerful natural language understanding (NLU) engine to process and understand what your users are looking for.

Before we dive right in, some terms come up often when talking about chatbots in Dialogflow, so let's get those out of the way.

Context: Context refers to the user's intent in general; related intents put together make up a context. Contexts are used to determine the flow of the conversation.

Agents: In Dialogflow, an agent is the heart of the chatbot; it is similar to a human call-centre agent. Agents are NLU modules that process the inputs given by a user and are the top-level containers for all intents, entities and contexts. One advantage of using Dialogflow is the set of pre-built agents provided for generic use cases.

Confidence score: When a chatbot classifies an utterance into an intent, the model also outputs a confidence score indicating how sure it is that it has understood the user correctly. We can threshold this score to handle uncertain cases and tune how intent classification behaves.
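For instance, a minimal sketch of such a threshold might look like this (the 0.6 cut-off and the fallback message are illustrative choices, not Dialogflow defaults):

```typescript
interface Prediction {
  intent: string;      // predicted intent label
  confidence: number;  // model's confidence in that label, between 0 and 1
}

function handle(prediction: Prediction): string {
  if (prediction.confidence < 0.6) {
    // Low confidence: don't guess, ask the user to clarify instead.
    return "Sorry, I'm not sure I understood. Could you rephrase that?";
  }
  return `Routing to intent: ${prediction.intent}`;
}

console.log(handle({ intent: "order_pizza", confidence: 0.42 }));
```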

Intent classification: In order to classify the user's input as a particular intent, the text is first pre-processed into a numerical format that the ML model can understand. Pre-processing can include, but is not limited to, conversion to lower case, tokenization, stopword removal, stemming and vectorization of the tokens.

Pipeline for Intent Classification

Classification can be done with a range of algorithms, from Multinomial Naïve Bayes and SVMs to neural networks. The classifier is trained on labelled conversation logs in which sentences have been mapped to intents.
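Here is a minimal sketch of that pipeline in TypeScript, using a tiny illustrative stopword list and vocabulary; a real system would feed the resulting vectors to a trained classifier such as Naïve Bayes or an SVM.

```typescript
// Pre-processing: lower-casing, naive tokenization and stopword removal,
// followed by a bag-of-words vectorization over a fixed vocabulary.
const stopwords = new Set(["i", "a", "an", "the", "to", "want"]);
const vocabulary = ["order", "pizza", "cancel", "track", "delivery"];

function preprocess(utterance: string): string[] {
  return utterance
    .toLowerCase()
    .split(/\W+/)                          // naive tokenization
    .filter(t => t && !stopwords.has(t));  // drop empty tokens and stopwords
}

function vectorize(tokens: string[]): number[] {
  // Bag-of-words counts over the fixed vocabulary.
  return vocabulary.map(word => tokens.filter(t => t === word).length);
}

console.log(vectorize(preprocess("I want to order a pizza"))); // [1, 1, 0, 0, 0]
```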

Entity recognition: Extracting entities relies on a technique known as Named Entity Recognition (NER). NER locates named entities in text and classifies them into predefined categories such as names of people, locations, organizations, etc. It depends strongly on datasets and heuristics (e.g. capitalization, punctuation).

Enough theory!!! Show me how to code!

Really… Building a Chatbot

Finally, we get to the part where we build a chatbot using Dialogflow, Node.js and React.js. The basic architecture of our app looks something like this.

We will have a front-end user interface where the end user types their queries; these are passed on to the backend app, which communicates with Dialogflow through its API. The responses are then passed from Dialogflow back to the backend and finally displayed to the end user.
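That backend-to-Dialogflow hop can be sketched roughly as follows, using the @google-cloud/dialogflow Node client library (written here in TypeScript; exact names may vary across library versions, and the project ID, session ID and credentials are assumed to be set up for your own agent).

```typescript
import { SessionsClient } from "@google-cloud/dialogflow";

const sessionClient = new SessionsClient(); // reads GOOGLE_APPLICATION_CREDENTIALS

async function askAgent(projectId: string, sessionId: string, text: string): Promise<string> {
  const session = sessionClient.projectAgentSessionPath(projectId, sessionId);
  const [response] = await sessionClient.detectIntent({
    session,
    queryInput: { text: { text, languageCode: "en-US" } },
  });
  // queryResult carries the matched intent, its confidence score and the reply text.
  return response.queryResult?.fulfillmentText ?? "";
}
```

In practice, the backend would expose a function like askAgent behind a POST route (with Express, for example) that the React front end calls.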

The process of building our chatbot follows these steps:

  • Training the agent on Dialogflow: As mentioned previously, agents are one of the basic building blocks of Dialogflow. We can create our agent from scratch or use intents from a pre-built agent (there are many pre-built agents, including small talk, weather, translation, etc.). Once an intent is trained, the bot knows what to reply when it sees a certain kind of input. There is also a fallback intent, which the agent falls back on when none of the other intents match the incoming input. The responses from the bot can be a simple text message or a custom JSON object (discussed below) that can be used to display a variety of responses.
  • Create the backend app and React app: The next step is to build the backend app using Node.js and make POST requests to the Dialogflow agent (as sketched above); the responses are then routed to our React app. Our React app has class-based and functional components that display the appropriate responses.
  • Building customized replies with cards: For certain inputs we ask Dialogflow to send JSON objects as responses; these can then be parsed to build a card-like React component and display the output. In our project we created an intent so that when a user inputs `which is the best masters program in CS in Canada`, the bot responds with a customized card linking to SFU's MS in Big Data program (we inserted our human bias here). The response looks something like what is shown below, and a sketch of handling such a payload appears after this list.
The agent uses different intents based on the user's utterance
  • Deployment: Finally, we deploy the app using Heroku. Our live chatbot can be found at this link (please open it in Chrome on a desktop; our bot is currently not mobile friendly).
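As referenced in the cards step above, here is a rough sketch of how the front end might render a custom JSON payload as a card. The payload shape, component markup and styling are illustrative assumptions, not the exact format Dialogflow sends.

```tsx
import React from "react";

// Hypothetical shape of the card data we configure in the intent's custom payload.
interface CardPayload {
  title: string;
  text: string;
  link: string;
}

// A simple functional component that turns the parsed payload into a card.
export function ResponseCard({ payload }: { payload: CardPayload }) {
  return (
    <div className="card">
      <h3>{payload.title}</h3>
      <p>{payload.text}</p>
      <a href={payload.link}>Learn more</a>
    </div>
  );
}
```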

We know this was fast paced; the entire Git repo with the source code can be found here. In the next post in this series, we dive deep into the nuts and bolts of building a chatbot.


References

[1] Sumit Raj, Building Chatbots with Python: Using Natural Language Processing and Machine Learning, ISBN-13 (pbk): 978-1-4842-4095-3

[2] Jana Bergant’s Udemy course on building ChatBots Using DialogFlow and Node.js

[3] A Survey on Evaluation Methods for Chatbots

[4] Header image: Technology vector created by upklyak, www.freepik.com (https://www.freepik.com/free-photos-vectors/technology)
