TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Member-only story

Self-improving Chatbots based on Deep Reinforcement Learning

Debmalya Biswas
TDS Archive
Published in
10 min readSep 14, 2020

--

Elena Ricciardelli, Debmalya Biswas

Abstract. We present a Reinforcement Learning (RL) model for self-improving chatbots, specifically targeting FAQ-type chatbots. The model is not aimed at building a dialog system from scratch, but to leverage data from user conversations to improve chatbot performance. At the core of our approach is a score model, which is trained to score chatbot utterance-response tuples based on user feedback. The scores predicted by this model are used as rewards for the RL agent. Policy learning takes place offline, thanks to an user simulator which is fed with utterances from the FAQ-database. Policy learning is implemented using a Deep Q-Network (DQN) agent with epsilon-greedy exploration, which is tailored to effectively include fallback answers for out-of-scope questions. The potential of our approach is shown on a small case extracted from an enterprise chatbot. It shows an increase in performance from an initial 50% success rate to 75% in 20–30 training epochs.

The published version of the paper is available below, in proceedings of the 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), Montreal, 2019 (Paper) (Code)

--

--

TDS Archive
TDS Archive

Published in TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Debmalya Biswas
Debmalya Biswas

Written by Debmalya Biswas

AI/ML, Privacy and Open Source | x-Nokia, SAP, Oracle | 50+ Patents https://www.linkedin.com/in/debmalya-biswas-3975261/