A Paper A Day: #1 Convolutional Sequence to Sequence Learning
1 min readMay 24, 2017
Starting today, I’ll be posting a short summary of a research paper every day — hopefully! I hope this will be useful for people interested in machine learning, reinforcement learning, and natural language processing, but I also had other reasons for doing it. Mostly, I think it will motivate me to read and learn more, and to improve my writing and analytical skills.
I’ll start by discussing the recent paper by the Facebook AI Research (FAIR) team on convolutional sequence to sequence learning. Here are the main takeaways I got from reading this paper:
Main Results
- Designing strong fully convolutional neural networks for machine translation is possible.
- Compared to recurrent networks, convolutional neural networks are highly parallelizable, so the machine translation system can be much faster. The authors report a small bump in accuracy along with up to 9x faster translation.
- Multi-hop attention: instead of looking at the source sentence once, the network takes several glimpses of it to produce a better translation.
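To make the multi-hop idea concrete, here is a toy NumPy sketch (not the authors' implementation — the function name, shapes, and the residual update are my own simplifications): each "hop" computes a fresh attention distribution over the encoder states and folds the resulting context back into the decoder state, so the source sentence is consulted several times rather than once.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_hop_attention(decoder_state, encoder_states, num_hops=3):
    """Toy multi-hop attention (illustrative only).
    decoder_state: (d,); encoder_states: (src_len, d).
    Each hop attends over the encoder states again and refines
    the decoder state with the new context vector."""
    state = decoder_state
    for _ in range(num_hops):
        scores = encoder_states @ state      # (src_len,) dot-product scores
        weights = softmax(scores)            # attention distribution
        context = weights @ encoder_states   # (d,) weighted sum of sources
        state = state + context              # residual update per hop
    return state

rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))  # 5 source positions, dimension 8
dec = rng.normal(size=8)
out = multi_hop_attention(dec, enc)
print(out.shape)  # (8,)
```

In the actual model each hop lives in a separate decoder layer with its own parameters; the loop here just conveys the repeated-glimpse structure.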
Why CNNs rather than RNNs?
Speed and scalability!
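The parallelism argument is easy to see in code. Below is a hypothetical sketch (not the paper's code; the function and shapes are mine) of a causal 1D convolution with GLU gating, in the spirit of the paper's building block: every output position is computed in one matrix multiply, with no sequential loop over time steps as an RNN would require.

```python
import numpy as np

def glu_conv_block(x, W, b):
    """Toy causal 1D convolution with GLU gating (illustrative only).
    x: (seq_len, d_in); W: (k, d_in, 2*d_out); b: (2*d_out,).
    All time steps are produced in a single pass."""
    k, d_in, two_d_out = W.shape
    d_out = two_d_out // 2
    # left-pad so each position only sees itself and the past (causal)
    pad = np.zeros((k - 1, d_in))
    xp = np.concatenate([pad, x], axis=0)
    seq_len = x.shape[0]
    # gather the k-gram window for every position: (seq_len, k*d_in)
    windows = np.stack([xp[t:t + k].ravel() for t in range(seq_len)])
    out = windows @ W.reshape(k * d_in, two_d_out) + b
    a, g = out[:, :d_out], out[:, d_out:]
    return a * (1.0 / (1.0 + np.exp(-g)))  # GLU: a * sigmoid(g)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))          # sequence of 6 positions, dim 4
W = rng.normal(size=(3, 4, 10)) * 0.1  # kernel width 3, output dim 5
b = np.zeros(10)
y = glu_conv_block(x, W, b)
print(y.shape)  # (6, 5)
```

An RNN, by contrast, must compute position t before position t+1, which is exactly the dependency the convolutional encoder and decoder avoid at training time.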
Experiments
- State-of-the-art results on WMT’16 English-Romanian translation, outperforming the previous best result by 1.8 BLEU.
- On WMT’14 English-German, the results outperform the strong LSTM setup of Wu et al. (2016) by 0.5 BLEU, and on WMT’14 English-French they outperform the likelihood-trained system of Wu et al. (2016) by 1.5 BLEU.
- Furthermore, the authors report a speedup of up to 9x in translation speed.
The code
The source code is available on GitHub.