Understanding Chatbots — Part 2

Sumanth Prabhu
Sep 3, 2018 · 5 min read

In the previous post, we spoke about Retrieval Based Chatbots. In this post, we will look at Generative chatbots.

The disadvantage of retrieval-based bots is that they are limited to a repository of possible responses. Generation-based models have the ability to generate dynamic responses conditioned on the current utterance from the user and the dialogue history. Typically, you would use a Seq2Seq model comprising two components, an LSTM encoder and an LSTM decoder. The encoder encodes the input sequence (also called the source sequence) into a fixed-dimensional vector, the final hidden state of the encoder. This vector is fed to the decoder, which generates the output (also called the target sequence). The following figure explains how it works -
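
To make the architecture concrete, here is a minimal PyTorch sketch of the encoder-decoder pair described above. The class names, layer sizes, and the choice of PyTorch are mine for illustration, not the exact setup used in this post.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                    # src: (batch, src_len) of token ids
        embedded = self.embed(src)
        outputs, (h, c) = self.lstm(embedded)
        return outputs, (h, c)                 # (h, c) is the fixed "thought vector"

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, state):             # tgt: (batch, tgt_len), state from the encoder
        embedded = self.embed(tgt)
        outputs, state = self.lstm(embedded, state)
        return self.out(outputs), state        # logits over the target vocabulary
```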

Source https://ai.googleblog.com/2015/11/computer-respond-to-this-email.html

Formally, we compute the conditional probability of an output sequence ‘y’ given an input sequence ‘x’, conditioned on the previous time-step values of ‘y’ and the intermediate vector representation ‘v’ (equivalent to the ‘thought vector’ shown in the figure above). The encoder encodes the source sequence into the hidden state representation ‘v’, which the decoder then uses to generate the target sequence.
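
Written out (in the form popularized by Sutskever et al., 2014, listed in the references), the model factorizes this conditional probability one target token at a time:

```latex
% x_1..x_T is the source sequence, y_1..y_{T'} the target sequence, and
% v the encoder's final hidden state (the "thought vector").
p(y_1, \ldots, y_{T'} \mid x_1, \ldots, x_T)
    = \prod_{t=1}^{T'} p(y_t \mid v, y_1, \ldots, y_{t-1})
```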

A potential issue with this encoder–decoder approach is that the model needs to compress all the necessary information of a source sentence into a fixed-length vector, which makes it difficult to deal with long sentences. To address this, Bahdanau et al. introduced a model that uses a “soft search” mechanism to focus on a certain set of positions in the source sequence. These positions account for the parts of the sequence that contain the most relevant information for the generation process. The model then predicts a target word based on the context vectors associated with these positions in the source sequence and all the previously generated target words. Their paper on neural machine translation (listed in the references) is a very interesting read that explains the use of the attention mechanism in sequence-to-sequence models for language translation. The following figure shows how attention layers learn to focus on different parts of the input when generating the output -

Source https://github.com/tensorflow/nmt
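
As a rough illustration of the “soft search”, the sketch below computes additive (Bahdanau-style) attention weights over the encoder outputs at a single decoding step and returns the resulting context vector. The dimension names and the 128-unit scoring layer are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Score each encoder position against the current decoder state,
    then build a context vector from the softmax-normalized weights."""
    def __init__(self, enc_dim, dec_dim, attn_dim=128):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_outputs):
        # dec_state:   (batch, dec_dim)          current decoder hidden state
        # enc_outputs: (batch, src_len, enc_dim) all encoder hidden states
        scores = self.v(torch.tanh(
            self.W_enc(enc_outputs) + self.W_dec(dec_state).unsqueeze(1)
        )).squeeze(-1)                                    # (batch, src_len)
        weights = F.softmax(scores, dim=-1)               # attention weights
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        return context, weights                           # context: (batch, enc_dim)
```

The context vector is then concatenated with the decoder input (or state) at each step, so the decoder can "look back" at different source positions for different target words.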

For the fun of it, I trained a seq2seq model with attention on the Cornell movie dialogue corpus. The dataset comprises 220,579 conversational exchanges between 10,292 pairs of movie characters, involving 9,035 characters from 617 movies. The following is a screenshot of what the conversation looked like -

The bot seems to have picked up a mix of characters; its responses are fluent but absolutely incoherent.
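
For anyone curious about reproducing the experiment, here is a rough sketch of how the corpus can be turned into (prompt, response) training pairs. The file names and the “+++$+++” field separator follow the publicly released version of the corpus; treat the parsing details as an assumption rather than the exact preprocessing used for the screenshot above.

```python
import ast

SEP = " +++$+++ "

# Map line IDs to utterance text (fields: lineID, characterID, movieID, name, text).
lines = {}
with open("movie_lines.txt", encoding="iso-8859-1") as f:
    for row in f:
        parts = row.strip().split(SEP)
        if len(parts) == 5:
            lines[parts[0]] = parts[4]

# Build (prompt, response) pairs from consecutive utterances in each conversation.
pairs = []
with open("movie_conversations.txt", encoding="iso-8859-1") as f:
    for row in f:
        line_ids = ast.literal_eval(row.strip().split(SEP)[-1])
        for a, b in zip(line_ids, line_ids[1:]):
            if a in lines and b in lines:
                pairs.append((lines[a], lines[b]))
```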

Improvements

There has been a lot of research on improving on top of the simple encoder-decoder architecture discussed above. One approach, the Hierarchical Recurrent Encoder-Decoder (HRED), focuses not just on the current input query but on the entire dialogue history. The dialogue is modelled with a two-level hierarchy: the conversation as a sequence of utterances, and each utterance as a sequence of words. To do this, HRED uses two encoders: one for the utterance-level representation and one for the session-level representation. Each utterance is deterministically encoded into a vector, which is passed to the session-level encoder; the session-level encoder updates its internal hidden state to summarize the conversation up to that point in time. The output of the session-level encoder is passed to the decoder to generate the response. You can find more details in the paper by Sordoni et al. (see the references below).

Source https://arxiv.org/pdf/1507.02221.pdf
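
The following is a compact PyTorch sketch of that two-level encoding, with made-up layer sizes: a word-level GRU summarizes each utterance, a session-level GRU consumes those summaries, and its final state conditions the decoder.

```python
import torch
import torch.nn as nn

class HRED(nn.Module):
    """Two-level encoder: words -> utterance vectors -> session state -> decoder."""
    def __init__(self, vocab_size, emb_dim=256, utt_dim=512, sess_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.utterance_enc = nn.GRU(emb_dim, utt_dim, batch_first=True)
        self.session_enc = nn.GRU(utt_dim, sess_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, sess_dim, batch_first=True)
        self.out = nn.Linear(sess_dim, vocab_size)

    def forward(self, context, response):
        # context:  (batch, n_utterances, utt_len)  the dialogue history as token ids
        # response: (batch, resp_len)               the target reply
        b, n, t = context.shape
        words = self.embed(context.view(b * n, t))
        _, utt_h = self.utterance_enc(words)              # (1, b*n, utt_dim)
        utt_vectors = utt_h.squeeze(0).view(b, n, -1)      # one vector per utterance
        _, sess_h = self.session_enc(utt_vectors)          # summary of the conversation so far
        logits, _ = self.decoder(self.embed(response), sess_h)
        return self.out(logits)                            # (batch, resp_len, vocab_size)
```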

Hierarchical Recurrent Attention Network (HRAN)

HRED focuses on modeling the hierarchy of the context; it was further improved by introducing an attention mechanism. The Hierarchical Recurrent Attention Network (HRAN) for multi-turn response generation introduces attention over both the word sequences and the utterance sequence when generating a response. It has a word-level utterance encoder with attention weights for each word. The vectors generated from this encoder are passed to an utterance-level encoder, which constructs hidden representations of the context. As shown in the figure, the word-level attention mechanism in HRAN depends on both the decoder and the utterance-level encoder. There is also an utterance-level attention mechanism that identifies the important utterances in the utterance sequence. The decoder then takes the context vector and generates the response. You can find more details in the paper by Xing et al. (see the references below).

Source https://arxiv.org/pdf/1701.07149.pdf
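
To sketch the idea, the function below runs the two attention stages over precomputed word-level encoder states, simplified to dot-product scoring and with the utterance-level RNN omitted for brevity; in the actual model the word-level attention is also conditioned on the utterance-level encoder state, as noted above. All names and shapes here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def hran_context(word_states, dec_state, word_proj, utt_proj):
    """Simplified sketch of HRAN's two attention stages.

    word_states: (batch, n_utt, utt_len, dim)  word-level encoder states
    dec_state:   (batch, dim)                  current decoder hidden state
    word_proj, utt_proj: nn.Linear(dim, dim) scoring projections
    """
    query = dec_state.unsqueeze(1).unsqueeze(1)                          # (batch, 1, 1, dim)

    # 1. Word-level attention inside each utterance, conditioned on the decoder state.
    word_scores = (word_proj(word_states) * query).sum(-1)               # (batch, n_utt, utt_len)
    word_weights = F.softmax(word_scores, dim=-1)
    utt_vectors = (word_weights.unsqueeze(-1) * word_states).sum(2)      # (batch, n_utt, dim)

    # 2. Utterance-level attention picks out the important utterances.
    utt_scores = (utt_proj(utt_vectors) * dec_state.unsqueeze(1)).sum(-1)  # (batch, n_utt)
    utt_weights = F.softmax(utt_scores, dim=-1)
    context = (utt_weights.unsqueeze(-1) * utt_vectors).sum(1)           # (batch, dim)
    return context

# Hypothetical usage:
# word_states = torch.randn(2, 4, 10, 512); dec_state = torch.randn(2, 512)
# context = hran_context(word_states, dec_state, nn.Linear(512, 512), nn.Linear(512, 512))
```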

Conclusion

We have seen a few possible approaches for generative chatbots. Though their responses are fluent, generative chatbots still have a tough time maintaining coherence. In the next post we will look at hybrid systems that combine retrieval-based and generative chatbots.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. ICLR

Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. EMNLP.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. NIPS

Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, and Jian-Yun Nie. 2015. A Hierarchical Recurrent Encoder-Decoder for Generative Context-Aware Query Suggestion. CIKM.

Chen Xing, Wei Wu, Yu Wu, Ming Zhou, Yalou Huang, and Wei-Ying Ma. 2018. Hierarchical Recurrent Attention Network for Response Generation. AAAI.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
