Paper Review 6: Neural Machine Translation by Jointly Learning to Align and Translate
In this post, the paper “Neural Machine Translation by Jointly Learning to Align and Translate” is summarized.
Link to paper: https://arxiv.org/pdf/1409.0473.pdf
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, “Neural Machine Translation by Jointly Learning to Align and Translate,” in ICLR 2015
In this paper, a novel approach for machine translation is proposed, called the attention model. Previous methods introduced an encoder-decoder model for machine translation: existing studies proposed obtaining a fixed vector representation of the input source sentence and then using this representation to generate the target output. However, this way of sequence-to-sequence learning has some limitations. For example, sentences of length 5 and of length 50 must both be embedded into a vector of the same dimensionality. Clearly, this is a drawback.
Therefore, this paper proposes a method for obtaining a different vector representation at each time step of the sequential processing. Consider the machine translation task: instead of producing only one vector representation at the end of the encoder, this study proposes producing a different representation for each output by taking a different weighted combination of the inputs in the sequence. In other words, for each output produced on the decoder side, the words on the source side have a different effect. The combination weights are learned adaptively via back-propagation. The figure above shows the encoder-decoder method with the attention model. The results reported in the paper show serious performance improvements over previous studies on machine translation tasks. Hence, this model became the state-of-the-art model used in translation engines.
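To make the weighted-combination idea concrete, here is a minimal NumPy sketch of the additive (Bahdanau-style) attention step: the previous decoder state is scored against every encoder hidden state, the scores are normalized with a softmax into attention weights, and the context vector is the weighted sum of the encoder states. The dimensions and the parameter names `W_a`, `U_a`, `v_a` are illustrative assumptions, not the exact configuration used in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def additive_attention(decoder_state, encoder_states, W_a, U_a, v_a):
    # Score each source position j: e_j = v_a^T tanh(W_a s + U_a h_j)
    scores = np.array([v_a @ np.tanh(W_a @ decoder_state + U_a @ h)
                       for h in encoder_states])
    alpha = softmax(scores)                    # attention weights, sum to 1
    context = alpha @ encoder_states           # weighted sum of encoder states
    return context, alpha

rng = np.random.default_rng(0)
d = 4                                          # toy hidden size
encoder_states = rng.normal(size=(5, d))       # 5 source positions
s = rng.normal(size=d)                         # previous decoder state
W_a = rng.normal(size=(d, d))
U_a = rng.normal(size=(d, d))
v_a = rng.normal(size=d)

context, alpha = additive_attention(s, encoder_states, W_a, U_a, v_a)
```

At each decoding step the weights `alpha` change, so every target word gets its own context vector rather than one fixed sentence embedding; in the full model these attention parameters are trained jointly with the encoder and decoder by back-propagation.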