Paper Review 4: Sequence to Sequence Learning with Neural Networks

Fatih Cagatay Akyon
Published in NLP Chatbot Survey
2 min read · Nov 2, 2018

In this post, the paper “Sequence to Sequence Learning with Neural Networks” is summarized.

Link to paper: https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, "Sequence to Sequence Learning with Neural Networks," in Advances in Neural Information Processing Systems (NIPS) 2014, pp. 3104–3112.

In this paper, the authors present a model for sequence-to-sequence learning using neural networks, specifically long short-term memory (LSTM) networks, a type of recurrent neural network. The paper is important as a pioneering work: the model it introduces is still widely used today, with small modifications, for sequence-to-sequence tasks such as machine translation and chatbots. These two tasks are good examples of sequence-to-sequence learning, since in both of them the trained network produces an output sequence for a given input sequence.

The model introduced in this paper is called the encoder-decoder model. The main idea is to first process the input sentence with the encoder part of the network to obtain a fixed-length vector representation. The decoder part of the network then starts from this fixed vector and produces an output at each time step. Generation stops when a special token marking the end of the sentence is produced.

Distributed representations (embeddings) of the words are used as inputs to the encoder network. On the output side, at each time step a word is chosen from a fixed vocabulary: a classifier layer on top of the decoder maps its hidden state at each step to a word. LSTM networks are used in this paper because of their ability to capture long-term dependencies. The paper also reports experiments on an English-to-French machine translation task, where the presented approach outperforms existing studies.
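The encoder-decoder loop described above can be sketched in a few lines of Python. This is a minimal illustration with random (untrained) weights and a plain RNN cell instead of an LSTM to keep it short; all names, sizes, and the toy vocabulary are my own assumptions, not from the paper.

```python
import numpy as np

np.random.seed(0)

# Toy setup (illustrative, not from the paper): tiny vocabulary with
# index 0 reserved for the end-of-sentence token.
vocab = ["<eos>", "je", "suis", "content"]
V, H, E = len(vocab), 8, 5  # vocab size, hidden size, embedding size

# Random parameters stand in for trained weights; the paper uses LSTMs,
# but a plain tanh RNN cell keeps the sketch compact.
emb = np.random.randn(V, E) * 0.1
W_xh = np.random.randn(E, H) * 0.1
W_hh = np.random.randn(H, H) * 0.1
W_hy = np.random.randn(H, V) * 0.1  # classifier over the vocabulary

def encode(token_ids):
    """Run the encoder over the input; the final hidden state is the
    fixed-length vector representation of the whole input sequence."""
    h = np.zeros(H)
    for t in token_ids:
        h = np.tanh(emb[t] @ W_xh + h @ W_hh)
    return h

def decode(h, max_len=10):
    """Greedy decoding: at each step a classifier picks one vocabulary
    word from the hidden state; stop when <eos> is produced."""
    out, tok = [], 0  # feed <eos> as the initial "go" symbol
    for _ in range(max_len):
        h = np.tanh(emb[tok] @ W_xh + h @ W_hh)
        tok = int(np.argmax(h @ W_hy))
        if tok == 0:  # end-of-sentence token terminates generation
            break
        out.append(vocab[tok])
    return out

fixed_vec = encode([1, 2, 3])  # toy input sequence
print(fixed_vec.shape)         # fixed-size vector, regardless of input length
print(decode(fixed_vec))       # untrained, so the output words are arbitrary
```

With trained weights, the same two functions implement translation: `encode` compresses the source sentence into one vector, and `decode` unrolls from it until the end-of-sentence token appears.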
