Member-only story
Self-improving Chatbots based on Deep Reinforcement Learning
Reinforcement Learning (RL) model for self-improving chatbots, specifically targeting FAQ-type chatbots
Elena Ricciardelli, Debmalya Biswas
Abstract. We present a Reinforcement Learning (RL) model for self-improving chatbots, specifically targeting FAQ-type chatbots. The model is not aimed at building a dialog system from scratch, but to leverage data from user conversations to improve chatbot performance. At the core of our approach is a score model, which is trained to score chatbot utterance-response tuples based on user feedback. The scores predicted by this model are used as rewards for the RL agent. Policy learning takes place offline, thanks to an user simulator which is fed with utterances from the FAQ-database. Policy learning is implemented using a Deep Q-Network (DQN) agent with epsilon-greedy exploration, which is tailored to effectively include fallback answers for out-of-scope questions. The potential of our approach is shown on a small case extracted from an enterprise chatbot. It shows an increase in performance from an initial 50% success rate to 75% in 20–30 training epochs.
The published version of the paper is available below, in proceedings of the 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), Montreal, 2019 (Paper) (Code)