Chatbot Data

Thomas Packer, Ph.D.
Published in TP on CAI
2 min read · Nov 7, 2019

This story is a rough draft. Check back later for the fully polished story.

Conversational AI, like the machine learning techniques it is often based on, is data-hungry. There are many kinds, sources, and uses of data in conversational artificial intelligence (CAI) and in chatbot development and use. Here are a few sources and ways to gather data.

Photo by Nikolas Noonan on Unsplash

Data Sources and Types

User Context

User metadata such as geolocation.

History

User input from past sessions.

Distributed

User input from sessions on other devices.

Human Agents

A human agent can work in the background during a conversation with a bot. Human annotators can also contribute outside of conversations.

How to Gather Data

We need a way to gather data to support the bot’s intelligence and capabilities. Here are some ways to get data into a bot.

Data Engineering

Gather dialogs from production use of the chatbot. A subject matter expert can then annotate the sentences with intents, entities, and responses.
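As a rough illustration, one annotated sentence could be stored as a small record like the one below. This is only a sketch: the class names, field names, the `annotate` helper, and the example intent and entity labels are all assumptions made for illustration, not part of any particular chatbot framework.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entity:
    text: str    # surface form as it appears in the utterance
    label: str   # entity type chosen by the annotator, e.g. "city"
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity


@dataclass
class AnnotatedUtterance:
    utterance: str
    intent: str
    entities: List[Entity] = field(default_factory=list)
    response: str = ""


def annotate(utterance: str, intent: str,
             entity_spans: List[Tuple[str, str]],
             response: str = "") -> AnnotatedUtterance:
    """Build a record from (substring, label) pairs, locating each substring."""
    entities = []
    for text, label in entity_spans:
        start = utterance.find(text)
        if start == -1:
            raise ValueError(f"{text!r} not found in utterance")
        entities.append(Entity(text, label, start, start + len(text)))
    return AnnotatedUtterance(utterance, intent, entities, response)


# Hypothetical example of what a subject matter expert might record.
example = annotate(
    "What is the weather in Boston tomorrow?",
    intent="get_weather",
    entity_spans=[("Boston", "city"), ("tomorrow", "date")],
    response="Here is tomorrow's forecast for Boston.",
)
```

Storing character offsets alongside the entity text keeps the annotation unambiguous when the same word appears more than once in a sentence, although this simple helper only finds the first occurrence.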

Information Extraction

The bot can extract data from its conversations using NLP techniques. In the simplest case, the bot asks the user to answer selected questions. To ensure data quality, ask more than one user the same question.

We could get creative and gather metadata too. The bot could ask the user whether they know the name of a certain entity type for one of the bot’s slots or slot types. If the user provides that name, the bot updates its dictionaries or ontologies, possibly after human agent review or once multiple users have provided the same value.
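One way to picture the "multiple users provided the same value" check is a small dictionary that only accepts a suggested value once enough distinct users agree on it. The sketch below is an assumption-laden illustration: the `SlotDictionary` class, the `min_users` threshold, and the lowercase normalization are all hypothetical choices, and a real bot might route suggestions to a human reviewer instead.

```python
from collections import defaultdict


class SlotDictionary:
    """Collects user-suggested slot values and accepts a value once
    enough distinct users have suggested it (hypothetical design)."""

    def __init__(self, min_users=2):
        self.min_users = min_users
        self.accepted = defaultdict(set)   # slot type -> accepted values
        self._votes = defaultdict(set)     # (slot type, value) -> user ids

    def suggest(self, user_id, slot_type, value):
        """Record a suggestion; return True if the value is now accepted."""
        normalized = value.strip().lower()
        key = (slot_type, normalized)
        self._votes[key].add(user_id)      # a set ignores repeat votes
        if len(self._votes[key]) >= self.min_users:
            self.accepted[slot_type].add(normalized)
        return normalized in self.accepted[slot_type]


slots = SlotDictionary(min_users=2)
first = slots.suggest("user_1", "pizza_topping", "Artichoke")
repeat = slots.suggest("user_1", "pizza_topping", "artichoke")
second = slots.suggest("user_2", "pizza_topping", " artichoke ")
```

Here `first` and `repeat` are both `False` because only one distinct user has voted, and `second` is `True` once a second user agrees.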

Machine Learning

Machine learning can be applied to the bot’s own conversations. Pick an outcome you want the chatbot to optimize, for example customer satisfaction. Pick a (proxy) metric that measures that outcome, e.g. the percentage of customers who reply “yes” when the bot asks whether they are satisfied. Then pick features that the chatbot might be able to use to predict that outcome, e.g. the sentiment score of each human utterance. Using this data gathered over many conversations, you could train a model that predicts customer satisfaction without explicitly asking the user, assuming the model is accurate enough.
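To make the idea concrete, here is a minimal sketch of such a model: a logistic regression written in plain Python and trained on summary features of per-utterance sentiment scores. Everything specific here is an assumption for illustration. The sentiment scores are presumed to come from some upstream sentiment analyzer and to lie in [-1, 1], the three summary features are one arbitrary choice, and the six training conversations are made-up toy data, not real results.

```python
import math


def features(sentiments):
    """Summarize one conversation's sentiment scores: mean, minimum, last."""
    return [sum(sentiments) / len(sentiments), min(sentiments), sentiments[-1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, x):
    """Predicted probability of 'satisfied'; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Toy, made-up data: (sentiment score per user utterance, replied "yes"?)
conversations = [
    ([0.2, 0.5, 0.8], 1),
    ([0.1, 0.4, 0.6], 1),
    ([0.0, 0.3, 0.7], 1),
    ([-0.2, -0.5, -0.7], 0),
    ([0.1, -0.4, -0.6], 0),
    ([-0.3, -0.1, -0.8], 0),
]
X = [features(s) for s, _ in conversations]
y = [label for _, label in conversations]
weights = train(X, y)

# Estimate satisfaction for a new conversation without asking the user.
p_happy = score(weights, features([0.3, 0.6, 0.9]))
p_unhappy = score(weights, features([-0.4, -0.6, -0.9]))
```

In practice you would hold out some labeled conversations to check whether the model is accurate enough before you stop asking users directly.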

Consider reinforcement learning to streamline the decisions the bot makes on its way to a goal it must reach repeatedly.
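As a very simple starting point, the sketch below uses an epsilon-greedy multi-armed bandit, which is one of the most basic reinforcement learning setups and not a full dialog policy. It assumes the bot repeatedly chooses among a few candidate actions and receives a reward of 1 when the goal is reached and 0 otherwise. The action names and the success rates used in the simulation are invented purely for illustration.

```python
import random


class EpsilonGreedyPolicy:
    """Mostly picks the action with the best average reward so far,
    but explores a random action with probability epsilon."""

    def __init__(self, actions, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}  # running mean reward

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        """Fold one observed reward into the action's running mean."""
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def simulate(policy, success_rates, steps=2000, seed=0):
    """Run the policy against made-up success rates; return total reward."""
    env = random.Random(seed)
    total = 0
    for _ in range(steps):
        action = policy.choose()
        reward = 1 if env.random() < success_rates[action] else 0
        policy.update(action, reward)
        total += reward
    return total


# Hypothetical actions and success rates, used only to exercise the code.
rates = {"ask_clarifying_question": 0.7, "offer_menu": 0.4, "hand_off": 0.5}
policy = EpsilonGreedyPolicy(rates, epsilon=0.1, seed=1)
total_reward = simulate(policy, rates)
```

In a live bot the reward would come from real conversation outcomes rather than a simulation, and a richer method would also take the conversation state into account.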

Join the CAI Dialog on Slack at cai-dialog.slack.com


Thomas Packer, Ph.D.
TP on CAI

I do data science (QU, NLP, conversational AI). I write applicable-allegorical fiction. I draw pictures. I have a PhD in computer science and I love my family.