Chatbot approaches: Sequence-to-Sequence VS Reinforcement Learning

Maâli Mnasri · Published in Opla · Mar 29, 2019

We have recently been working on a paper that surveys the most recent approaches to building chatbots, in order to draw an overview of the current state of the art.

During this research, we categorized the techniques most used in both industry and academia. The first comparison was between rule-based and data-driven approaches, and the second between machine learning and information retrieval. We will discuss those approaches in a future post on Medium. Here, we want to focus on comparing two machine learning approaches for chatbots that have been widely adopted in the scientific landscape: sequence-to-sequence models and reinforcement learning.

Let’s understand Sequence-to-sequence learning

Sequence-to-sequence (seq2seq) learning (Sutskever et al., 2014) is a way to combine multiple Recurrent Neural Networks (RNNs) in a particular architecture to tackle complex sequence-to-sequence prediction problems such as machine translation, image captioning, text summarization, and question answering.

Seq2seq learning first showed great success when applied to phrase-to-phrase Machine Translation (MT) (Cho et al., 2014), which inspired researchers to apply it to other tasks. During training, a seq2seq model learns to map a given input to a given output. The input and output sequences can have different lengths, and this is the strength of seq2seq models compared with other neural learning models. At inference time, the trained model is then able to generate a new output for a new input. The image below shows a simplified layout of a seq2seq model for MT.

source: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

For building chatbots, the problem is framed as translating the user utterance into the chatbot's answer.

Technically, a seq2seq model is composed of an encoder and a decoder. The encoder is a neural network, or a sequence of neural networks, that reads the input sequence and converts it into a hidden state called the thought vector, because it stores the meaning of the input sequence, considered as a thought. The decoder is then fed with this thought vector. During the learning stage, it learns to map the hidden state to the target output sequence. At inference time, the decoder returns the predicted output sequence with respect to the learning goal. The figure below shows an example of a sequence-to-sequence architecture for generating answers.

source: (Mnasri, 2019)
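
For a concrete, though simplified, picture of this encoder/decoder split, here is a minimal PyTorch-style sketch. The layer sizes, the use of GRUs, and the class names are illustrative choices for this post, not the exact architecture from any of the cited papers.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Reads the input sequence and compresses it into a hidden state,
    the 'thought vector' described above."""
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len) of token ids
        embedded = self.embedding(src)
        _, hidden = self.gru(embedded)           # hidden: (1, batch, hidden_size)
        return hidden                            # the thought vector

class Decoder(nn.Module):
    """Generates the answer one token at a time, conditioned on the thought vector."""
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, prev_token, hidden):       # prev_token: (batch, 1)
        embedded = self.embedding(prev_token)
        output, hidden = self.gru(embedded, hidden)
        return self.out(output), hidden          # logits over the output vocabulary

# Training pairs the encoder's final state with the target answer and
# minimizes a cross-entropy loss over the decoder's predicted tokens.
```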

Studies on seq2seq-based chatbots have shown that training a straightforward seq2seq model on a large conversational dataset is a simple way to create a chatbot that answers simple questions, extracts relevant information, and even performs some shallow reasoning. Seq2seq models can also learn many aspects beyond answer generation: for example, they can inherit the personality of the characters they learn from, and they can capture the tone of the speaker and answer in an appropriate way.

Let’s move to Reinforcement Learning

Reinforcement learning is not as recent as seq2seq learning. It is an older technique, first used to train robots to move, and the process is inspired by how humans learn. To start learning, children interact with their surrounding environment; after each interaction, they decide whether or not to repeat an action based on its consequences. To train a dialogue system with reinforcement learning, the chatbot interacts with end-users and observes the results of its actions. After each action, it receives a reward, which can be positive or negative. Over the course of many conversations, the chatbot becomes increasingly efficient.

Reinforcement learning cycle
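
To make this cycle concrete, here is a schematic training loop in Python. The `agent` and `env` objects are hypothetical placeholders rather than a real library API: the environment stands in for the user, returning the next utterance and a reward for the previous answer, and the update rule is deliberately left abstract.

```python
# Schematic reinforcement learning loop for a dialogue agent.
# `env` plays the role of the user: it returns the next utterance (the state)
# and a reward judging the agent's previous answer.

def train_dialogue_agent(agent, env, num_conversations=1000):
    for _ in range(num_conversations):
        state = env.reset()                              # opening user utterance
        done = False
        while not done:
            action = agent.choose_answer(state)          # chatbot picks a reply
            next_state, reward, done = env.step(action)  # user reacts and gives feedback
            agent.update(state, action, reward, next_state)  # reinforce good answers
            state = next_state
```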

Sequence-to-Sequence VS Reinforcement Learning: Who wins?

Both approaches have strengths and limits. Seq2seq models made a huge step forward on the generative side. They are also simple to train, since they are most of the time purely data-driven. Up to this point, you may be impressed by these networks; the bad news is that they also have many weaknesses. The first problem is the overly generic answers they tend to return. The second drawback concerns the grammatical correctness of the answers: while the seq2seq figures shown everywhere display perfect outputs, the reality is slightly different and the generated answers are not always well-formed, so more work is needed to correct the output. Some researchers have also highlighted that seq2seq learning predicts answers one at a time and fails to account for their influence on future utterances. The last problem is the need for a large amount of data, a problem we are currently working on (see our previous article on data augmentation).

Meanwhile, reinforcement learning leads to more natural chatbots, as they learn from human feedback and develop their own control system. Reinforcement learning enables the chatbot to handle long conversations and take the preceding turns into account. It is also convenient to use RL to make chatbots learn to act, not only to chat. For example, a flight booking agent will receive a reward if it properly books a flight according to the user's request and will be penalized if it makes an error (a toy reward of this kind is sketched below). However, RL raises two main issues. First, it requires a lot of time and a large number of interactions before the agent is trained, which can be restrictive if the training is done online. Second, RL is not the most suitable option for learning language generation.
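
As a toy illustration of the flight-booking example, a task-oriented reward might look like the sketch below. The field names and the numeric values are invented purely for the example.

```python
def booking_reward(booking, user_request):
    """Hypothetical reward for a flight-booking chatbot: positive when the
    booked flight matches the request, negative otherwise."""
    if booking is None:
        return -1.0                      # the agent failed to complete the task
    matches = (booking["destination"] == user_request["destination"]
               and booking["date"] == user_request["date"])
    return 1.0 if matches else -0.5      # penalize booking the wrong flight
```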

To end the suspense...

In my opinion, both are winners. Neither is perfect on its own, and the two approaches complement each other. So the question is: what should I use?

To get the best of each method, many studies have focused on combining these two approaches. The right combination depends on the end use. Li et al. (2016) proposed a model that leverages the seq2seq ability to represent semantics and combines it with a reinforcement learning policy that optimizes long-term rewards according to the final use. That being said, the right question should be: how to combine sequence-to-sequence learning with reinforcement learning?
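
In the spirit of Li et al. (2016), a common way to combine the two is to treat the seq2seq decoder as a policy and fine-tune it with a policy-gradient (REINFORCE-style) objective. The sketch below assumes a `seq2seq` model exposing a `sample` method that returns an answer together with the log-probabilities of its tokens, and a `reward_fn` scoring that answer; both interfaces are assumptions made for illustration, not the exact formulation of the paper.

```python
def reinforce_step(seq2seq, optimizer, user_utterance, reward_fn, baseline=0.0):
    """One policy-gradient update: sample an answer from the seq2seq policy,
    score it with a reward, and increase the log-probability of good answers."""
    answer_tokens, log_probs = seq2seq.sample(user_utterance)  # assumed interface
    reward = reward_fn(user_utterance, answer_tokens)          # e.g. coherence, ease of answering
    loss = -(reward - baseline) * log_probs.sum()              # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```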

If our state-of-the-art article helps with your work, please cite it as follows:

MNASRI, Maali. Recent advances in conversational NLP: Towards the standardization of Chatbot building. arXiv preprint arXiv:1903.09025, 2019.

LI, Jiwei, MONROE, Will, RITTER, Alan, et al. Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541, 2016.

SUTSKEVER, Ilya, VINYALS, Oriol, and LE, Quoc V. Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, 2014. p. 3104–3112.

Maâli Mnasri
Opla
Researcher @Opla in #AI #NLU #MachineLearning #ConversationalNLP. PhD in NLU: Automatic Text Summarization.