Analyze chatbot classifier performance from logs

Andrew R. Freed
IBM Data Science in Practice
2 min read · Dec 11, 2019
Explore log data to determine how well your chatbot performs. Photo by John Schnobrich on Unsplash

This post is part of a series:
1) Testing a Chatbot with k-folds Cross Validation
2) Analyze chatbot classifier performance from logs
3) Improve a chatbot classifier with production data

In the embedded video I demonstrate how to review user utterances and turn them into a “blind test” set and future training data with the WA-Testing-Tool.

I first describe how to collect user utterances from logs and how to manually classify them into intents, taking care that each example represents a single clear intent and strips out superfluous text. For example, “Hi, thanks for helping, can you tell me where the store is?” should be shortened to “Can you tell me where the store is?” and associated with the “Store Location” intent. I also describe techniques for group review, including the rule of thumb that if consensus cannot be reached on an example within ten seconds, it is not an obvious example and should not be used for training.
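The cleanup step above can be sketched in a few lines of Python. This is a minimal illustration, not part of the WA-Testing-Tool itself: the filler patterns and function names are my own assumptions, and the output CSV simply follows the two-column utterance/intent layout Watson Assistant accepts for intent training data.

```python
# Minimal sketch (hypothetical names): trim conversational filler from
# logged utterances so each training example carries one clear intent.
import csv
import re

# Assumed list of filler prefixes that precede the real request.
FILLER = re.compile(
    r"^(hi|hello|hey|thanks( for helping)?|please)[,.!\s]+",
    re.IGNORECASE,
)

def clean_utterance(text: str) -> str:
    """Repeatedly strip greeting/filler prefixes from an utterance."""
    cleaned = text.strip()
    while True:
        shorter = FILLER.sub("", cleaned).strip()
        if shorter == cleaned:
            return cleaned
        cleaned = shorter

def write_training_csv(labeled, path):
    """Write (utterance, intent) pairs as a two-column CSV,
    the shape Watson Assistant uses for intent training data."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for utterance, intent in labeled:
            writer.writerow([clean_utterance(utterance), intent])
```

For example, `clean_utterance("Hi, thanks for helping, can you tell me where the store is?")` returns just the store-location question, which a reviewer would then label with the “Store Location” intent.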

The video culminates in the creation of a blind test set and a blind test evaluation through WA-Testing-Tool, including an accuracy measurement. Areas for improvement are identified here but addressed in the next video.
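Conceptually, the blind test evaluation boils down to comparing the classifier's predicted intent against the human-assigned “golden” intent for each held-out utterance. The sketch below shows that idea under assumed names and an assumed record shape; it is not the WA-Testing-Tool's actual output format.

```python
# Hedged sketch: overall blind-test accuracy plus per-intent miss counts.
# The (utterance, expected, predicted) tuple shape is an assumption.
from collections import Counter

def blind_test_accuracy(rows):
    """rows: iterable of (utterance, expected_intent, predicted_intent).
    Returns (accuracy, Counter of expected intents that were missed)."""
    total = correct = 0
    misses = Counter()
    for utterance, expected, predicted in rows:
        total += 1
        if expected == predicted:
            correct += 1
        else:
            misses[expected] += 1  # which intents the classifier gets wrong
    return (correct / total if total else 0.0), misses

# Illustrative results only, not real data.
results = [
    ("where is the store", "Store_Location", "Store_Location"),
    ("what time do you open", "Store_Hours", "Store_Hours"),
    ("where can I park", "Parking", "Store_Location"),
]
accuracy, misses = blind_test_accuracy(results)
print(f"Blind test accuracy: {accuracy:.0%}")  # 2 of 3 correct -> 67%
```

The per-intent miss counts point directly at where training data needs work, which is the topic of the next post in the series.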

10 minute video on reviewing log data and analyzing chat performance in a blind test.

For help in implementing these practices, reach out to IBM Data and AI Expert Labs and Learning.


Technical lead in IBM Watson. Author: Conversational AI (manning.com, 2021). All views are only my own.