The Machine Learning Cycle in Support Chatbots (Part 2 of 2)

Erik Pfannmöller
6 min read · Jul 17, 2018

--

In my previous article, I set out to describe the machine learning cycle of chatbots in two parts. This post expands on those explanations, so make sure you have read Part 1 first.

Let’s begin with a quick recap: In Part 1, we saw how the flow of a support chatbot is generated: either based on rules or dynamically. In Part 2, we’ll dive into how the flow is changed. To use an example from physics: If Part 1 was about speed, Part 2 is about acceleration (the change of speed).

Untangling the Confusion Matrix

In machine learning, we often use what’s called a confusion matrix to understand the performance of an algorithm. Wikipedia has a great article about it. Rather than repeating it here, I’m going to apply its four cases to chatbots, which will help us assess them.

  • True Positive (TP): This is a prediction by the chatbot that matches the user’s true problem. In other words: The bot figured out what was wrong and found a correct solution. If you want to know the status of your order and the bot provides exactly that, you’re seeing a True Positive.
  • False Positive (FP): This is a wrong solution provided to a user. If you want to know the status of your order and the bot shows you how to change the delivery address, you’re seeing a False Positive.
  • False Negative (FN): Now it gets a bit trickier: This term describes what happens when a bot has been trained to provide a problem’s solution but doesn’t provide one. If you want to know the status of your order and the bot stops the conversation to hand it over to a human, you’re seeing a False Negative.
  • True Negative (TN): This is the case if the bot hasn’t been trained on the user’s problem and thus correctly hands the conversation over to a human. If you have a very complex problem with an order and the bot determines that human help is required and hands over the conversation, you’re seeing a True Negative.
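To make the four cases concrete, here is a minimal Python sketch that tallies them from labeled conversations. The labels and the counting logic are my own illustration, not how any particular bot stores its data:

```python
from collections import Counter

# Each conversation is labeled with what the bot did and whether a
# trained solution for the user's problem actually existed.
conversations = [
    {"answered": True,  "correct": True,  "had_solution": True},   # True Positive
    {"answered": True,  "correct": False, "had_solution": True},   # False Positive
    {"answered": False, "correct": False, "had_solution": True},   # False Negative
    {"answered": False, "correct": False, "had_solution": False},  # True Negative
]

def classify(conv):
    if conv["answered"]:
        return "TP" if conv["correct"] else "FP"
    # The bot handed over: that is only correct if it really had no solution.
    return "TN" if not conv["had_solution"] else "FN"

counts = Counter(classify(c) for c in conversations)
accuracy = (counts["TP"] + counts["TN"]) / sum(counts.values())
print(counts)                        # Counter({'TP': 1, 'FP': 1, 'FN': 1, 'TN': 1})
print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.50
```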

I like to explain the confusion matrix with the example of a hunter:

  • A True Positive happens when a hunter sees a deer, shoots and hits the deer.
  • A False Positive happens when a hunter sees a tree, shoots and hits the tree.
  • A False Negative happens when a hunter sees a deer and doesn’t shoot.
  • A True Negative happens when a hunter sees a tree and doesn’t shoot.

Two Ways of Extending the Training Data

Let’s take a look at our model from the first article:

As you can see, the Trained Model relies on Training Data to make its predictions. (If you are curious to learn more about Training Data and training a model, I strongly recommend this series of posts from Adam Geitgey.)

The more Training Data you have, the better the model should become. This is where the True Positives come in. They provide the real-world samples that allow us to improve the Trained Model. There are two ways to do it: Automatically and manually.

The Elegance of Automatic Learning

(Disclaimer: I don’t know if Erwin can do what I’m describing here; this is just an illustration.)

For machines to get better than humans, they need to learn automatically. In our case, that means a chatbot needs to add new Data Points to the Training Data in order to improve the Trained Model.

Let’s look at this in the context of an example from Part 1. There I showed you a riddle bot that asked me “What fruit can you spell by using only A, N, and B a number of times?”. I correctly solved it by guessing “Banana”.

If the bot counted all games played, it could save “solved-correctly” as a Data Point and start learning from it: For instance, the bot could calculate the “solved-correctly” ratio for each riddle and then optimize its questions by providing riddles that are neither too easy (=100% correct) nor too hard (=0% right). With the information gained from users like me, it would always provide the best riddles (e.g. 90% guessed right). This means the flow would change over time. And it would do so automatically, without any human interaction!
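To illustrate (again: I don’t know how the riddle bot actually works, so the riddle names, numbers and target ratio below are made up), a minimal sketch of that “solved-correctly” logic could look like this:

```python
TARGET_RATIO = 0.90  # the "best riddle" difficulty from the example above

# Hypothetical Data Points: per riddle, how many games were played
# and how many of them were solved correctly.
riddle_stats = {
    "banana":   {"played": 200, "solved": 180},  # 90% -> just right
    "too_easy": {"played": 150, "solved": 150},  # 100% -> boring
    "too_hard": {"played": 120, "solved": 0},    # 0% -> frustrating
}

def solved_ratio(stats):
    return stats["solved"] / stats["played"]

def record_game(riddle, solved):
    # Every game adds a new Data Point -- this is the automatic part.
    riddle_stats[riddle]["played"] += 1
    riddle_stats[riddle]["solved"] += int(solved)

def next_riddle():
    # Propose the riddle whose solved-correctly ratio is closest to the target.
    return min(riddle_stats,
               key=lambda r: abs(solved_ratio(riddle_stats[r]) - TARGET_RATIO))

record_game("banana", solved=True)
print(next_riddle())  # -> "banana"
```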

The graphic below illustrates this.

Each bot can automatically learn different things, within the algorithmic constraints that define the scope of flow change.

Just to name another example: Our virtual agents at Solvemate learn “solution popularity” from True Positives. The more popular a solution is, the more likely it is going to be automatically proposed.
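I won’t share real implementation details here, but the principle could look like this sketch, where a simple counter per solution stands in for “solution popularity” (the solution IDs are made up):

```python
from collections import defaultdict

# Hypothetical popularity store: True Positives counted per solution.
popularity = defaultdict(int)

def record_true_positive(solution_id):
    # A confirmed correct answer makes this solution a bit more popular.
    popularity[solution_id] += 1

def ranked(candidates):
    # More popular solutions are proposed first.
    return sorted(candidates, key=lambda s: popularity[s], reverse=True)

record_true_positive("order-status")
record_true_positive("order-status")
record_true_positive("change-address")
print(ranked(["cancel-order", "change-address", "order-status"]))
# -> ['order-status', 'change-address', 'cancel-order']
```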

I believe that automatic learning is the supreme discipline of chatbots. It has the potential to meaningfully improve this exciting technology over time — but it’s also super tricky. Things can go wrong, as we’ve seen with Microsoft’s Twitter bot Tay.

The Difficulty of Manual Learning

Let’s assume the bot did not figure out the true customer request. This means we’re either dealing with a False Positive, a False Negative, or a True Negative. Automatic learning is much, much harder in these cases.

Just imagine that you are in a completely dark room, trying to shoot a basketball through a hoop. Learning from True Positives means that you learn where the basket is after you’ve at least touched it — that gives you a lot of information. Not touching the basket (=True Negative) also gives you some information: It means the basket isn’t where you threw, but it still doesn’t tell you where it actually is.

Returning to chatbots: Automatic learning from not solving a customer request is possible, but much, much harder, which is why these cases usually require human review to add a Data Point to the Training Data.

Let’s assume a user wants to check their order status but the bot didn’t have a solution. In this case, an AI Trainer should manually add “get order status” as a Data Point to the Training Data and provide some wording around it. AI Trainers review conversations and tell the bot what it should have responded. It’s like teaching a child how to behave.

Solvemate’s virtual agents do this similarly: For all patterns where solutions were wrong, the AI Trainer needs to decide whether they should…

  • add a solution (because of a True Negative)
  • change a solution that was wrongly suggested (because of a False Positive)
  • add knowledge to make the solution better (because of a False Negative)
  • ignore the New Data Point (because the request wasn’t serious)

It boils down to the decision between:

  1. Adding a (True Positive) Data Point
  2. Not adding a Data Point (=send it to the Data Graveyard)
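As a minimal sketch of that triage step (the case labels mirror the list above; none of this is Solvemate’s real API):

```python
def trainer_decision(case):
    """Map a reviewed conversation to one of the four trainer actions."""
    actions = {
        "true_negative":  "add a new solution",
        "false_positive": "change the wrongly suggested solution",
        "false_negative": "add knowledge to improve the solution",
    }
    return actions.get(case, "ignore")  # anything else -> Data Graveyard

training_data, data_graveyard = [], []

def apply_review(case, data_point):
    # Either the Data Point extends the Training Data, or it is discarded.
    if trainer_decision(case) == "ignore":
        data_graveyard.append(data_point)
    else:
        training_data.append(data_point)

apply_review("true_negative", {"request": "get order status"})
apply_review("not_serious",   {"request": "asdfgh"})
print(len(training_data), len(data_graveyard))  # -> 1 1
```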

The illustration below shows this decision.

Manually adding Data Points to the Training Data is a normal process in chatbots and happens quite frequently. Just keep in mind that a manual training effort can be a significant cost driver. I have written more about this in our Chatbot ROI Calculator.

Two Improvement Cycles

In summary, we have…

  • an automatic improvement cycle, where the usage of the bot leads to New Data Points that change the Training Data, which can change the Trained Model and thus potentially its Responses. Quite fancy, isn’t it?
  • a manual improvement cycle, where the usage of the bot is reviewed by humans who decide whether to add New Data Points to the Training Data, which will change the Trained Model and potentially its Responses.
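To tie it together, here is a hedged sketch of both cycles feeding the same Training Data; train_model and the data structures are placeholders, not a real training pipeline:

```python
def train_model(training_data):
    # Placeholder: in reality this step fits the Trained Model.
    return {"trained_on": len(training_data)}

training_data = []

def automatic_cycle(conversation):
    # True Positives become new Data Points without human interaction.
    if conversation["outcome"] == "TP":
        training_data.append(conversation["data_point"])

def manual_cycle(conversation, trainer_approves):
    # A human reviews the case and decides whether it becomes a Data Point.
    if trainer_approves:
        training_data.append(conversation["data_point"])

automatic_cycle({"outcome": "TP", "data_point": "order status"})
manual_cycle({"outcome": "TN", "data_point": "complex order"}, trainer_approves=True)

model = train_model(training_data)  # both cycles feed the same model
print(model["trained_on"])          # -> 2
```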

Combining the Insights

If you’re in a conversation with a chatbot vendor and apply the chatbot taxonomy, you can now dive very deep: Ask them specifically how their bot’s flow is dynamic and how they extend the Training Data and train the Trained Model.

You also now have more background to understand how the flow changes. Ask the vendor which Data Points are processed and how they affect the automatic improvement cycle (if there is one).

One Last Thought

It is totally ok not to have a dynamic flow or an automatic improvement cycle in a chatbot. Not every use case needs the fanciest, most dynamic, self-improving algorithms. Just understand that customer support is typically too complex and dynamic to be automated with rule-based, static bots.

--

Erik Pfannmöller

CEO @ Solvemate.com. Passionate about AI, computers and software. Like structure and efficiency. Nerdy on details. Love keyboard shortcuts. Chasing a big vision.