Improve a Chatbot Classifier with Production Data

Andrew R. Freed
IBM Data Science in Practice
2 min read · Dec 12, 2019
Improve your chatbot by selecting the right data to train on. Photo by Helloquence on Unsplash

This post is part of a series:
1) Testing a Chatbot with k-folds Cross Validation
2) Analyze chatbot classifier performance from logs
3) Improve a chatbot classifier with production data

In the embedded video I demonstrate how to review a blind test result and how to improve a chatbot based on that result.

My initial bot was trained on 160 utterances (video 1) and blind-tested against 200 utterances (video 2). In this video I split those 200 initially “blind” utterances in half: I analyze “part 1” and hold “part 2” out for a second blind test. From the “part 1” set I identify 45 examples to add to the training data, without looking at “part 2” at all. I then run a new blind test on the “part 2” utterances with the WA-Testing-Tool, using the updated training (now 160 + 45 = 205 utterances). The updated training improves blind test performance. I analyze where performance increased and where more work is needed, and discuss the need to run this cycle iteratively. Regular analysis of production and blind data is critical to the ongoing health of a chatbot.
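The split-and-hold-out step can be sketched in a few lines of Python. The utterance and intent names below are made up for illustration (the real analysis used the WA-Testing-Tool); the point is that “part 2” is set aside before any examples are inspected:

```python
import random

# Hypothetical example: 200 production utterances, each paired with the
# "golden" intent a human reviewer assigned to it.
labeled = [(f"utterance {i}", intent)
           for i, intent in enumerate(["billing", "support", "sales", "cancel"] * 50)]

# Shuffle with a fixed seed so the split is reproducible, then cut in half.
rng = random.Random(42)
rng.shuffle(labeled)
part1, part2 = labeled[:100], labeled[100:]

# "part 1" is analyzed and mined for new training examples;
# "part 2" stays untouched until the second blind test.
print(len(part1), len(part2))  # 100 100
```

Because “part 2” is never examined while curating new training examples, the second blind test remains an honest estimate of production accuracy.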

10 minute video on using test results to improve a chatbot and proving performance increase through a new blind test.

The processes in these posts help you get the most out of your virtual assistant. These steps are the key to having an accurate bot.

  1. Before you release new intents into production, test them for potential confusion using k-folds cross validation. The k-folds test can surface training problems that would otherwise hurt production accuracy. Review and resolve these before putting your bot into production.
  2. After you have released new intents into production, take time to analyze how accurate the assistant is in identifying them. Gather utterances from production users, find the real intents for each utterance, and use a blind test to assess your bot’s actual production accuracy.
  3. Based on the blind test results, improve the training data for one or more intents, then run another test to make sure you’ve actually improved the bot’s accuracy. If you have, promote the new training data to production. If you have not, just try again!
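The accuracy check in steps 2 and 3 boils down to comparing each utterance’s golden intent against the classifier’s prediction. A minimal sketch, assuming a CSV of blind-test results in the general shape the WA-Testing-Tool produces (the column names and rows here are hypothetical):

```python
import csv
import io

# Hypothetical blind-test output: each row holds a production utterance,
# the golden intent a human assigned, and the intent the bot predicted.
results_csv = """utterance,golden intent,predicted intent
how do I pay my bill,billing,billing
cancel my account,cancel,billing
talk to an agent,support,support
"""

rows = list(csv.DictReader(io.StringIO(results_csv)))
correct = sum(r["golden intent"] == r["predicted intent"] for r in rows)
accuracy = correct / len(rows)
print(f"blind test accuracy: {accuracy:.2f}")  # blind test accuracy: 0.67
```

Running this same computation per intent (rather than over the whole set) shows which intents improved after retraining and which still need work.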

The process above is an iterative cycle. Use it any time you are introducing new intents to your bot!

For help in implementing these practices, reach out to IBM Data and AI Expert Labs and Learning.


Technical lead in IBM Watson. Author: Conversational AI (manning.com, 2021). All views are only my own.