Testing a Chatbot with k-folds Cross Validation
This post is part of a series:
1) Testing a Chatbot with k-folds Cross Validation
2) Analyze chatbot classifier performance from logs
3) Improve a chatbot classifier with production data
In the embedded video I demonstrate doing a k-folds cross validation test with the WA-Testing-Tool.
K-folds cross validation helps you find confusion in your training data and intents. It does NOT predict runtime performance on utterances the classifier has never seen. It can, however, help you determine whether your initial intent structure is clear or confusing, and identify places where additional training data is needed. You should perform k-folds during bot development: after you have done enough user research to determine what intents users need from the chatbot, but before you distribute the assistant to a broader audience.
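The core mechanic of k-folds is simple: partition the training utterances into k folds, then repeatedly train on k-1 folds and test on the held-out fold, so every utterance is tested exactly once against a model that did not train on it. The sketch below illustrates that partitioning in plain Python with hypothetical intent data; it is not the WA-Testing-Tool itself, which automates this against a live Watson Assistant workspace.

```python
# Minimal sketch of a k-folds split over labeled utterances.
# Each utterance is held out exactly once: in each round, k-1 folds
# serve as training data and the remaining fold is a blind test set.

def k_fold_splits(items, k):
    """Yield (train, test) lists; each item lands in a test set exactly once."""
    folds = [items[i::k] for i in range(k)]  # round-robin partition into k folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Hypothetical (utterance, intent) training pairs for illustration only.
data = [
    ("where is my order", "order_status"),
    ("track my package", "order_status"),
    ("cancel my subscription", "cancel"),
    ("I want to unsubscribe", "cancel"),
    ("reset my password", "account_help"),
    ("I can't log in", "account_help"),
]

for fold_num, (train, test) in enumerate(k_fold_splits(data, k=3), start=1):
    # In a real run you would train a classifier on `train` and score it on `test`,
    # then aggregate the per-fold predictions into a confusion matrix.
    print(f"fold {fold_num}: train on {len(train)}, test on {len(test)}")
```

Because every utterance gets a blind prediction, the aggregated per-fold results expose which intents the classifier confuses with each other, which is exactly the signal used to decide where to refine intents or add training data.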
In the video I demonstrate how to run a k-folds cross validation test and how to interpret the results. I identify areas for improvement, which can include refining intents (adding, removing, or merging them) or supplying additional training data. I do not act on these suggestions in this video. In follow-on videos I show how to use production log data to analyze runtime performance and to improve the chatbot from runtime data.
(The training data I used came from the Customer Care content catalog.)
For help in implementing these practices, reach out to IBM Data and AI Expert Labs and Learning.