“Sequence to Sequence Learning with Neural Networks” (2014) | one minute summary
Can you [encode-]decode this paper?
1 min read · May 1, 2021
This paper by Sutskever, Vinyals, and Le (Google) was one of the first to introduce the encoder-decoder architecture, which has subsequently been instrumental in the field of natural language processing (NLP).
Prerequisite knowledge: long short-term memory (LSTM)
- Why: Typical deep neural networks applied to sequence data (i.e. recurrent neural networks) require the input and output sequences to be the same length, which is limiting for tasks like language translation, where the source and target sentences can have different lengths
- What: This paper proposed a two-LSTM network for general sequence-to-sequence mappings, using the running example of taking in an English sentence (a sequence of words) and outputting its French translation (which may be a different length than the input)
- How: The first LSTM (a.k.a. the encoder) reads the input sequence of tokens (i.e. the words of a sentence) one by one and produces a fixed-dimension context vector (i.e. it summarizes the input in a vector of a set size); the second LSTM (a.k.a. the decoder) then takes the context vector and generates the output sequence one token at a time, stopping once it emits an end-of-sequence token
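The encode-then-decode flow above can be sketched in a few lines of toy code. This is a minimal illustration, not the paper's actual model: it uses tiny random weights in place of trained parameters and a plain tanh recurrence in place of real LSTM cells, but it shows the key idea that a variable-length input is compressed into one fixed-size context vector, from which an output of a possibly different length is generated.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HIDDEN, EOS = 10, 8, 0  # toy sizes; token 0 marks end of sequence

# Randomly initialised weights stand in for trained parameters.
E = rng.normal(0, 0.1, (VOCAB, HIDDEN))     # token embeddings
W_e = rng.normal(0, 0.1, (HIDDEN, HIDDEN))  # encoder recurrence
W_d = rng.normal(0, 0.1, (HIDDEN, HIDDEN))  # decoder recurrence
W_o = rng.normal(0, 0.1, (HIDDEN, VOCAB))   # decoder output projection

def encode(tokens):
    """Read the input one token at a time; the final hidden state is the
    fixed-dimension context vector, whatever len(tokens) happens to be."""
    h = np.zeros(HIDDEN)
    for t in tokens:
        h = np.tanh(E[t] + W_e @ h)
    return h

def decode(context, max_len=20):
    """Unroll from the context vector, greedily picking the most likely
    token at each step and feeding it back in, until EOS or max_len."""
    h, prev, out = context, EOS, []
    for _ in range(max_len):
        h = np.tanh(E[prev] + W_d @ h)   # previous output token fed back in
        prev = int(np.argmax(h @ W_o))
        if prev == EOS:
            break
        out.append(prev)
    return out

source = [4, 7, 2, 9, 1]      # a 5-token "English sentence"
context = encode(source)      # fixed-size summary of the whole input
target = decode(context)      # output length need not equal len(source)
```

Note the design point this makes concrete: the context vector's dimension (`HIDDEN`) is fixed by the architecture, not by the sentence, which is exactly what lets the decoder handle output lengths different from the input's.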