Paper Review 6: Neural Machine Translation by Jointly Learning to Align and Translate

Fatih Cagatay Akyon
Published in NLP Chatbot Survey · Nov 15, 2018

In this post, the paper “Neural Machine Translation by Jointly Learning to Align and Translate” is summarized.

Link to paper: https://arxiv.org/pdf/1409.0473.pdf

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, “Neural Machine Translation by Jointly Learning to Align and Translate,” in ICLR 2015

Example neural machine translation structure. (taken from http://opennmt.net/)

This paper proposes a novel approach to machine translation, called the attention model. Previous work used an encoder-decoder model: the encoder compresses the entire input source sentence into a single fixed-length vector representation, and the decoder then generates the target output from that vector alone. This style of sequence-to-sequence learning has a clear limitation: a sentence of length 5 and a sentence of length 50 must both be embedded into a vector of the same dimension, so the fixed-size representation becomes a bottleneck, especially for long sentences.
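
To make the bottleneck concrete, here is a toy sketch (not from the paper) of a plain RNN encoder: regardless of whether the source sentence has 5 or 50 words, the encoder's final state, and thus the only information passed to the decoder, has exactly the same fixed size. The dimensions and random inputs below are illustrative assumptions.

```python
import numpy as np

# Toy illustration of the fixed-vector bottleneck in a vanilla encoder.
rng = np.random.default_rng(0)
hidden, emb = 16, 16
Wx = rng.normal(size=(hidden, emb))   # input-to-hidden weights (assumed)
Wh = rng.normal(size=(hidden, hidden))  # hidden-to-hidden weights (assumed)

def encode(token_embeddings):
    h = np.zeros(hidden)
    for x in token_embeddings:
        h = np.tanh(Wx @ x + Wh @ h)  # simple RNN update
    return h                          # single fixed-size summary vector

short_sentence = rng.normal(size=(5, emb))   # 5 source words
long_sentence = rng.normal(size=(50, emb))   # 50 source words
print(encode(short_sentence).shape, encode(long_sentence).shape)  # both (16,)
```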

Attention Model

To address this, the paper proposes computing a different vector representation at each time step of the decoding process. Consider the machine translation task: instead of keeping only the single vector produced at the end of the encoder, the model builds a distinct context vector for every output word as a weighted combination of the encoder's per-word annotations. In other words, each source word has a different influence on each word produced at the decoder side. The combination (attention) weights are learned adaptively via back-propagation, jointly with the rest of the network. The figure above shows the encoder-decoder architecture with the attention model. The results reported in the paper show substantial improvements over earlier encoder-decoder approaches to machine translation, and this attention mechanism has since become a standard component of modern translation engines.
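
Below is a minimal NumPy sketch of this additive (Bahdanau-style) attention step, assuming toy dimensions; Wa, Ua, and va stand in for the learned parameters that would be trained jointly with the network via back-propagation.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 8                    # encoder/decoder hidden size (assumed)
src_len = 5                   # number of source words (assumed)

H = rng.normal(size=(src_len, hidden))  # encoder annotations h_1..h_T
s_prev = rng.normal(size=hidden)        # previous decoder state s_{i-1}

Wa = rng.normal(size=(hidden, hidden))  # learned parameters (placeholders)
Ua = rng.normal(size=(hidden, hidden))
va = rng.normal(size=hidden)

# Alignment scores: e_j = va^T tanh(Wa s_{i-1} + Ua h_j)
scores = np.tanh(s_prev @ Wa.T + H @ Ua.T) @ va

# Attention weights: softmax over source positions
alphas = np.exp(scores - scores.max())
alphas /= alphas.sum()

# Context vector: a different weighted combination of annotations per step
context = alphas @ H
print("attention weights:", np.round(alphas, 3))
print("context vector shape:", context.shape)
```

At each decoding step the same computation is repeated with the new decoder state, so the weights (and therefore the context vector) change from one output word to the next.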
