Capstone Project: CLERKbot — A virtual assistant chatbot

Aug 13, 2020

This article was written as part of a capstone project for the Data Science & Machine Learning Immersive course at Xccelerate. We discuss our chatbot in detail, including the framework, architecture, and design behind it, as well as some challenges we faced in the process.

Preface

The idea of creating a virtual assistant chatbot first came to mind when we were brainstorming ideas for our capstone project. This sparked our interest for a few reasons:

1. We wanted to create a product that we would genuinely enjoy using and that would make our lives easier in some way.

2. Making a chatbot was a relatively new concept to us, but one we were keen to learn more about because chatbots are fun to use.

3. We had the impression that the current market has a high demand for chatbots and that their popularity has been growing rapidly over the years.

For the reasons given above, we began our journey to becoming self-proclaimed chatbot experts.

Problem Statement

Upon further research, we learned that the chatbot industry is continuously growing and that chatbots have become a necessity in industries like finance. The accessibility and ease of a customized chatbot as well as the increased utilization among messaging apps have resulted in an inevitable growth in the tech and chatbot industry[1]. However, the problem we face today is that while chatbots can increase efficiencies on both business and consumer ends, there are still some technical limitations that hinder customer service experiences such as slow response time and poor connectivity.

In this project, we aim to develop a chatbot that can answer questions at any time of day, effectively address users’ requests, and hold a conversation that resembles one with a human. Instead of creating one for a particular enterprise, we decided to create a digital assistant that can help everyone. Our motivation stems from the fact that we are constantly juggling several tasks at once and switching between different apps for different things. A good virtual assistant can save time and make life easier for its user. Hence, we set out to develop a chatbot that combines several crucial functions for daily tasks.

What is CLERKbot?

CLERKbot is a 6-in-1 chatbot that operates on Telegram. It lets users connect to Google Calendar, set appointments, and see reminders based on the day’s schedule. It can open Google Maps when users need directions to their destinations. Additionally, CLERKbot offers weather forecasts, searches for nearby restaurants, and provides a list of locations in Hong Kong that patients with COVID-19 had visited.

How did we build CLERKbot?

We used Rasa, together with the Rasa Demo Bot, to make the chatbot answer like a human being.

We used the Google Calendar API to access the calendar, add events to it, and avoid time conflicts.
We used the Google Geolocation API to get the user’s current location for location-related services.
We used the OpenWeather API to retrieve weather forecasts for the user’s location.
We used the Hong Kong Government’s API to obtain the COVID-19 locations.

We used ngrok to expose the bot running on our local computer through a publicly accessible URL, and deployed it on Telegram.
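The "avoid time conflicts" step mentioned for the calendar integration can be pictured as an interval-overlap check. The sketch below is only an illustration under our own assumptions: the event dictionaries, the function names, and the sample events are hypothetical, and it does not call the Google Calendar API.

```python
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """Two half-open intervals [start, end) overlap if each starts before the other ends."""
    return start_a < end_b and start_b < end_a


def find_conflicts(new_event, existing_events):
    """Return the existing events whose time range overlaps the new event."""
    return [
        event
        for event in existing_events
        if overlaps(new_event["start"], new_event["end"], event["start"], event["end"])
    ]


def at(timestamp):
    """Parse an ISO 8601 timestamp with a UTC offset."""
    return datetime.fromisoformat(timestamp)


# Hypothetical calendar contents (not real data from the project).
existing = [
    {"summary": "Team stand-up", "start": at("2020-08-17T09:30:00+08:00"), "end": at("2020-08-17T10:00:00+08:00")},
    {"summary": "Lunch", "start": at("2020-08-17T12:00:00+08:00"), "end": at("2020-08-17T13:00:00+08:00")},
]
new_event = {"summary": "Meeting with John", "start": at("2020-08-17T09:00:00+08:00"), "end": at("2020-08-17T10:00:00+08:00")}

conflicts = find_conflicts(new_event, existing)
print([event["summary"] for event in conflicts])
```

Treating the intervals as half-open means that a meeting ending at 10:00 does not clash with one starting at 10:00.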

How does Rasa work?

Rasa is an open-source machine learning framework that is great at building contextual chatbots[2]. With Rasa, CLERKbot can learn to respond to phrases and understand the core meaning of words inside of phrases.

Rasa comprises two main components: Rasa NLU and Rasa Core.

Rasa NLU, aka natural language understanding, is the “ear” of the chatbot. It comprehends the user’s messages, determines the intent, and extracts entities from the message.

Rasa Core is the “brain” of the chatbot. It is a chatbot framework that manages the flow of the conversation, holds meaningful conversations with users, and decides what to do next.

Keywords Used In Rasa

Intents: Purposes or goals expressed in a user’s input. Once CLERKbot recognizes the intent of the user, it will proceed to an applicable next action.
For example, if a user types “I want to update my calendar”, the user’s intent would be “request_create_schedule”.

Entities: Information in the user input that is relevant to the user’s intent. If the user goes on to describe the event, for example a meeting with John in Central on the morning of 17 August, the entities would be:
“event” : “Meeting with John”
“location” : “central”
“date” : “2020-08-17T09:00:00+08:00”
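To make the relationship between intents and entities more concrete, here is a minimal sketch of a parse result shaped like the dictionary Rasa NLU returns for a message. The message text, confidence value, and helper function are our own illustrative assumptions, not output captured from CLERKbot.

```python
# Hypothetical parse result, shaped like Rasa NLU output.
# The text and the confidence value are made up for illustration.
parsed = {
    "text": "Meeting with John in central at 9am on 17 August",
    "intent": {"name": "request_create_schedule", "confidence": 0.93},
    "entities": [
        {"entity": "event", "value": "Meeting with John"},
        {"entity": "location", "value": "central"},
        {"entity": "date", "value": "2020-08-17T09:00:00+08:00"},
    ],
}


def entities_to_slots(parse_result):
    """Collect the extracted entities into a simple name -> value mapping."""
    return {item["entity"]: item["value"] for item in parse_result["entities"]}


slots = entities_to_slots(parsed)
print(parsed["intent"]["name"])
print(slots)
```

The bot can then use the intent to pick the next action and the collected values to fill in the details of the event.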

Stories: Sample interactions between users and CLERKbot consisting of user intents and actions taken by the bot.

Actions: Operations performed by the bot, such as asking for more details to fill in missing entities or calling external APIs.

In the following sections, we look at how Rasa recognizes the user’s intent and extracts the entities.

Rasa NLU

We grouped examples by intent and assigned custom entity labels to words so the bot can learn what counts as a “location” or a “summary”.

## intent:inform
- Help me to create a [meeting with John](summary) on Wednesday at 2pm
- [Mums Birthday dinner](summary) on March 9 at [Hotel Icon](location)
- I want to schedule a [doctor’s appointment](summary) on Sunday morning in [Wan Chai](location).
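To show what the [text](label) markup in these examples encodes, the sketch below strips the annotations from one training line and recovers the labelled spans. It is our own simplified parser written for illustration; it is not Rasa's training data reader, and the function and variable names are hypothetical.

```python
import re

# Matches annotations of the form [value](entity).
ANNOTATION = re.compile(r"\[(?P<value>[^\]]+)\]\((?P<entity>[^)]+)\)")


def parse_example(line):
    """Strip [value](entity) markup and return the plain text plus entity spans."""
    plain_parts = []
    entities = []
    cursor = 0  # position in the annotated line
    length = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(line):
        before = line[cursor:match.start()]
        plain_parts.append(before)
        length += len(before)
        value = match.group("value")
        entities.append(
            {"entity": match.group("entity"), "value": value, "start": length, "end": length + len(value)}
        )
        plain_parts.append(value)
        length += len(value)
        cursor = match.end()
    plain_parts.append(line[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_example("[Mums Birthday dinner](summary) on March 9 at [Hotel Icon](location)")
print(text)
for item in entities:
    print(item["entity"], "->", text[item["start"]:item["end"]])
```

Each entity keeps character offsets into the plain sentence, which is the information a model needs in order to learn where a "summary" or "location" tends to appear.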

In Rasa, incoming messages are processed by a sequence of components. These components are executed one after another in a so-called processing pipeline. Our pipeline included the following components:

Tokenizer — to split the sentence into words
Featurizer — to transform the tokens into features that can be used in ML algorithms
DucklingHTTPExtractor — an entity extractor for dates and numbers
DIETClassifier — both an intent classifier and entity extractor

DIETClassifier is a multi-task transformer architecture that performs intent classification and entity recognition together. It is an important part of how our bot understands natural language, because it captures the relationships between words and their order in a sentence.
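The idea of components running one after another on the same message can be sketched in a few lines. The classes below are toy stand-ins that we made up for illustration (a whitespace tokenizer, a bag-of-words featurizer, and a keyword matcher in place of a trained classifier); they are not Rasa's components, and the "ask_weather" intent name is hypothetical.

```python
import re
from collections import Counter


class WhitespaceTokenizerSketch:
    """Splits the message text into lowercase word tokens."""

    def process(self, message):
        message["tokens"] = re.findall(r"\w+", message["text"].lower())


class BagOfWordsFeaturizerSketch:
    """Turns the tokens into word counts that a classifier can use."""

    def process(self, message):
        message["features"] = Counter(message["tokens"])


class KeywordIntentClassifierSketch:
    """Stand-in for a trained model: picks the intent whose keywords match most often."""

    def __init__(self, keywords_by_intent):
        self.keywords_by_intent = keywords_by_intent

    def process(self, message):
        scores = {
            intent: sum(message["features"][word] for word in words)
            for intent, words in self.keywords_by_intent.items()
        }
        message["intent"] = max(scores, key=scores.get)


def run_pipeline(components, text):
    """Pass one message through every component in order."""
    message = {"text": text}
    for component in components:
        component.process(message)
    return message


pipeline = [
    WhitespaceTokenizerSketch(),
    BagOfWordsFeaturizerSketch(),
    KeywordIntentClassifierSketch(
        {
            "request_create_event": ["schedule", "create", "meeting"],
            "ask_weather": ["weather", "rain", "forecast"],
        }
    ),
]
result = run_pipeline(pipeline, "Will it rain tomorrow? What is the forecast?")
print(result["tokens"])
print(result["intent"])
```

Each component reads what earlier components wrote to the message, which is why the order of the pipeline matters.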

Rasa Core

Rasa Core handles dialogue management: it decides what happens next in the conversation based on the story paths we provide. Rather than relying on many if/else statements, it uses a machine learning model trained on example conversations to choose the next action.

## event create path 1
* greet
- utter_greet
* request_create_event
- utter_more_info
- event_form
- form{"name": "event_form"}
- form{"name": null}
- utter_confirm_schedule_details
> check_asked_schedule_details
## user confirm details + event created
> check_asked_schedule_details
* affirm
- action_create_event
- slot{"success":"success"}
- action_set_reminder
- action_route_plan
- action_suggest

Furthermore, CLERKbot is more than a simple FAQ chatbot because it runs custom actions, which we wrote in Python to call external APIs. For instance, when the weather forecast action is triggered, the bot calls the weather API and retrieves the information, as defined in our actions.py file.
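As a rough idea of what the weather lookup involves, the sketch below builds a request URL for OpenWeather's current-weather endpoint and turns a response into a chat reply. This is not our actions.py: the helper names, the choice of the current-weather endpoint (rather than a forecast endpoint), the coordinates, and the sample payload are all assumptions made for illustration, and the code makes no network call.

```python
from urllib.parse import urlencode

# OpenWeather current-weather endpoint (assumed here; the project may have used another endpoint).
OPENWEATHER_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(lat, lon, api_key):
    """Build the request URL for a location given as latitude and longitude."""
    query = urlencode({"lat": lat, "lon": lon, "appid": api_key, "units": "metric"})
    return f"{OPENWEATHER_URL}?{query}"


def format_weather_reply(payload):
    """Turn a current-weather response into a short message for the user."""
    description = payload["weather"][0]["description"]
    temperature = payload["main"]["temp"]
    return f"It is {temperature:.0f}°C with {description} in {payload['name']}."


# Hypothetical response, trimmed to the fields used above.
sample_payload = {
    "name": "Hong Kong",
    "weather": [{"description": "light rain"}],
    "main": {"temp": 29.4},
}

url = build_weather_url(22.28, 114.16, "YOUR_API_KEY")
reply = format_weather_reply(sample_payload)
print(url)
print(reply)
```

In a Rasa custom action, logic like this would typically sit inside the `run` method of an `Action` subclass from `rasa_sdk`, with the reply sent back through `dispatcher.utter_message`.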

Performance Evaluation

Before we evaluated the performance of our model, we had to make sure the training data was free of conflicts and errors. We used a story structure validation tool to check for any conflicting stories. In one such conflict, the model identifies the user’s intent (“request_create_event”) but cannot tell whether to “utter_more_info” or to provide an “event_form”, because both come after the “request_create_event” intent in different stories. We therefore tried to remove such ambiguous training data so that Rasa does not get confused about what to do next.
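The kind of conflict described above can be detected with a simple rule: if the same conversation history is followed by different bot actions in different stories, the stories disagree. The sketch below is a simplified illustration of that rule, not Rasa's validation tool; the second story, the "intent:"/"action:" prefixes, and the function name are hypothetical.

```python
from collections import defaultdict


def find_story_conflicts(stories):
    """Return histories that are followed by more than one distinct bot action."""
    next_actions = defaultdict(set)
    for steps in stories.values():
        for index, step in enumerate(steps):
            if step.startswith("action:"):
                history = tuple(steps[:index])
                next_actions[history].add(step)
    return {history: actions for history, actions in next_actions.items() if len(actions) > 1}


# Two stories that agree up to the user's intent and then diverge.
stories = {
    "event create path 1": [
        "intent:greet",
        "action:utter_greet",
        "intent:request_create_event",
        "action:utter_more_info",
    ],
    "event create path 2": [
        "intent:greet",
        "action:utter_greet",
        "intent:request_create_event",
        "action:event_form",
    ],
}

conflicts = find_story_conflicts(stories)
for history, actions in conflicts.items():
    print(" > ".join(history), "->", sorted(actions))
```

A real validator also has to decide how much history to compare, since a longer history can make two apparently conflicting stories distinguishable.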

Cross-Validation

We ran cross-validation on our training data to understand how well the model performed.

The F1-score serves as a general ‘grade’ of the performance. It takes into account two metrics: precision and recall[5].

Precision looks at all of the messages the model identified as intent A and measures how many were actually intent A.

“Out of all predictions of A, how many were correct?”

Recall, on the other hand, looks at all of the messages that actually belong to intent A and measures how many of them the model detected.

“Out of all the examples in A, how many were detected?”
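The two questions above translate directly into counts, and the F1-score is the harmonic mean of the two results. Here is a minimal sketch for a single intent; the label lists are made-up toy data, not our evaluation results.

```python
def precision_recall_f1(actual, predicted, intent):
    """Compute precision, recall, and F1-score for one intent."""
    pairs = list(zip(actual, predicted))
    true_positives = sum(1 for a, p in pairs if a == intent and p == intent)
    predicted_positives = sum(1 for _, p in pairs if p == intent)
    actual_positives = sum(1 for a, _ in pairs if a == intent)

    # "Out of all predictions of A, how many were correct?"
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    # "Out of all the examples in A, how many were detected?"
    recall = true_positives / actual_positives if actual_positives else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels for illustration only.
actual = ["greet", "greet", "inform", "inform", "inform", "affirm"]
predicted = ["greet", "inform", "inform", "inform", "greet", "affirm"]

precision, recall, f1 = precision_recall_f1(actual, predicted, "inform")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because the F1-score is a harmonic mean, it stays low whenever either precision or recall is low, which is why it works as a single overall grade.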

Our model has 17 intents with 20–80 user examples per intent. The overall intent recognition accuracy reached 85%, and precision, recall, and F1-score were all over 70%.

Confusion Matrix

The horizontal axis indicates the intents the model predicted. The vertical axis indicates the actual intents. The numbers shown diagonally down the grid are the number of true positives or correct classifications.

The confusion matrix shows how often a model made a correct classification. For example, here we see the “request_create_event” intent was misclassified as “ask_schedule”, “chitchat”, and “inform”. It is acceptable to be classified as “inform” as that leads to the same response; however, “ask_schedule” and “chitchat” will lead to different storylines.
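A confusion matrix is just a count of (actual, predicted) pairs, so it can be sketched with a counter. The labels below are toy data chosen to mirror the misclassifications described above; they are not our real evaluation counts, and the function names are our own.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual intent, predicted intent) pair occurs."""
    return Counter(zip(actual, predicted))


def misclassifications(counts, intent):
    """Return the wrong predictions made for messages that belong to one intent."""
    return {
        predicted: count
        for (actual, predicted), count in counts.items()
        if actual == intent and predicted != intent
    }


# Toy labels for illustration only.
actual = ["request_create_event"] * 5 + ["inform"] * 2
predicted = [
    "request_create_event",
    "request_create_event",
    "inform",
    "ask_schedule",
    "chitchat",
    "inform",
    "inform",
]

counts = confusion_counts(actual, predicted)
# Diagonal cell: messages of this intent that were classified correctly.
print(counts[("request_create_event", "request_create_event")])
# Off-diagonal cells in the same row: where the remaining messages went.
print(misclassifications(counts, "request_create_event"))
```

Reading one row at a time in this way makes it easy to see which intents are being confused with each other and may need clearer training examples.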

DIETClassifier report

In the early stage of our model, the F1-scores for location and summary were relatively low, at 60% and 39% respectively. After we provided over 150 samples for each of them, the location F1-score reached 74%.

However, the summary F1-score reached only 60%. We tried adding more training data, but we gradually realized that there are too many ways to title an event for the model to detect a pattern. Therefore, the results for the summary were not ideal.

Conclusion

As people are always looking for ways to bring some fun into boring tasks, we created CLERKbot to be both a convenient, efficient assistant and a pleasure to talk to. We target those who prefer to text their assistant rather than say commands out loud where everyone around them can hear.

As for future improvements, we plan to work on a few things:

1. Deploy on more popular chat channels such as WhatsApp and Facebook Messenger.

2. Improve the flow of the conversation and the response accuracy.

3. Allow users to connect to their own calendars.

4. Refine the functions of CLERKbot so that it can learn the user’s preferences and habits.

Due to the time constraint and the fact that this is our first attempt at creating a chatbot, we are fully aware that this chatbot is far from perfect and the room for improvement is endless. Please feel free to point out any mistakes. Any suggestions are welcome!