A Paper A Day: #1 Convolutional Sequence to Sequence Learning
1 min readMay 24, 2017
Starting today, I’ll be posting a short summary of a research paper every day — hopefully! I hope this will be useful for people interested in machine learning, reinforcement learning, and natural language processing, but I also had other reasons for doing it. Mostly, I think it will motivate me to read and learn more, and to improve my writing and analytical skills.
I’ll start by discussing the recent paper by the Facebook AI Research (FAIR) team on convolutional sequence to sequence learning. Here are the main takeaways I got from reading this paper:
Main Results
- Designing strong fully convolutional neural networks for machine translation is possible.
- Compared to recurrent networks, convolutional neural networks are highly parallelizable, so the machine translation system can be much faster. The authors report a small bump in accuracy along with up to 9x faster translation.
- Multi-hop attention: instead of looking at the source sentence once, the network takes several glimpses of it to produce a better translation.
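To make the multi-hop idea concrete, here is a toy NumPy sketch (not the authors' implementation — the function name, shapes, and the residual update are my own simplifications): each "hop" computes a fresh attention distribution over the encoder states and folds the resulting context back into the decoder state, so the source sentence is consulted several times rather than once.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_hop_attention(decoder_state, encoder_states, num_hops=3):
    """Toy multi-hop attention (illustrative only).
    decoder_state: (d,); encoder_states: (src_len, d).
    Each hop attends over the encoder states again and refines
    the decoder state with the new context vector."""
    state = decoder_state
    for _ in range(num_hops):
        scores = encoder_states @ state      # (src_len,) dot-product scores
        weights = softmax(scores)            # attention distribution
        context = weights @ encoder_states   # (d,) weighted sum of sources
        state = state + context              # residual update per hop
    return state

rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))  # 5 source positions, dimension 8
dec = rng.normal(size=8)
out = multi_hop_attention(dec, enc)
print(out.shape)  # (8,)
```

In the actual model each hop lives in a separate decoder layer with its own parameters; the loop here just conveys the repeated-glimpse structure.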
Why CNNs rather than RNNs?
Speed and scalability!
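The parallelism argument is easy to see in code. Below is a hypothetical sketch (not the paper's code; the function and shapes are mine) of a causal 1D convolution with GLU gating, in the spirit of the paper's building block: every output position is computed in one matrix multiply, with no sequential loop over time steps as an RNN would require.

```python
import numpy as np

def glu_conv_block(x, W, b):
    """Toy causal 1D convolution with GLU gating (illustrative only).
    x: (seq_len, d_in); W: (k, d_in, 2*d_out); b: (2*d_out,).
    All time steps are produced in a single pass."""
    k, d_in, two_d_out = W.shape
    d_out = two_d_out // 2
    # left-pad so each position only sees itself and the past (causal)
    pad = np.zeros((k - 1, d_in))
    xp = np.concatenate([pad, x], axis=0)
    seq_len = x.shape[0]
    # gather the k-gram window for every position: (seq_len, k*d_in)
    windows = np.stack([xp[t:t + k].ravel() for t in range(seq_len)])
    out = windows @ W.reshape(k * d_in, two_d_out) + b
    a, g = out[:, :d_out], out[:, d_out:]
    return a * (1.0 / (1.0 + np.exp(-g)))  # GLU: a * sigmoid(g)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))          # sequence of 6 positions, dim 4
W = rng.normal(size=(3, 4, 10)) * 0.1  # kernel width 3, output dim 5
b = np.zeros(10)
y = glu_conv_block(x, W, b)
print(y.shape)  # (6, 5)
```

An RNN, by contrast, must compute position t before position t+1, which is exactly the dependency the convolutional encoder and decoder avoid at training time.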
Experiments
- State-of-the-art results on WMT’16 English-Romanian translation, outperforming the previous best result by 1.8 BLEU.
- On WMT’14 English-German, the results outperform the strong LSTM setup of Wu et al. (2016) by 0.5 BLEU, and on WMT’14 English-French they outperform the likelihood-trained system of Wu et al. (2016) by 1.5 BLEU.
- Furthermore, the authors report a speedup of up to 9x in translation speed.
The code
The source code is available on GitHub.