“Sequence to Sequence Learning with Neural Networks” (2014) | one minute summary

Can you [encode-]decode this paper?

Jeffrey Boschman
One Minute Machine Learning
1 min read · May 1, 2021


This paper by Sutskever, Vinyals, and Le (Google) was one of the first to introduce the encoder-decoder architecture, which has subsequently been instrumental in the field of natural language processing (NLP).

Prerequisite knowledge: long short-term memory (LSTM)

  1. Why: Typical deep neural networks used on sequence data (e.g., recurrent neural networks) require that the input and output sequences have the same length, but this constraint is limiting for tasks like language translation, where the source and target sentences often differ in length
  2. What: This paper proposed a network of two LSTMs for general sequence-to-sequence mapping, mainly using the example of taking in an English sentence (a sequence of words) and outputting its French translation (which can be a different length than the input)
  3. How: The first LSTM (a.k.a. the encoder) reads the input sequence of tokens (e.g., the words of a sentence) one by one and produces a fixed-dimensional context vector (i.e. it summarizes the input's information in a vector of a set size); then the second LSTM (a.k.a. the decoder) takes the context vector and generates the output sequence one token at a time (see the sketch below)
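
To make the encoder-decoder flow concrete, here is a minimal sketch in PyTorch. It is my own illustration, not the authors' implementation; the vocabulary sizes, dimensions, and the SOS token id are placeholder values, and decoding is done greedily rather than with a beam search to keep it short.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (not the paper's configuration)
SRC_VOCAB, TGT_VOCAB, EMB_DIM, HID_DIM = 1000, 1200, 64, 128
SOS = 1  # hypothetical start-of-sequence token id

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)

    def forward(self, src):                     # src: (batch, src_len) of word ids
        _, (h, c) = self.lstm(self.embed(src))  # keep only the final hidden/cell states
        return h, c                             # this pair is the fixed-size "context vector"

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM, TGT_VOCAB)

    def step(self, token, state):               # generate one token at a time
        output, state = self.lstm(self.embed(token), state)
        return self.out(output), state

def translate(encoder, decoder, src, max_len=20):
    """Greedy decoding: feed each predicted token back in as the next input."""
    state = encoder(src)                        # summarize the source sentence
    token = torch.full((src.size(0), 1), SOS)   # start from the SOS token
    result = []
    for _ in range(max_len):
        logits, state = decoder.step(token, state)
        token = logits.argmax(dim=-1)           # pick the most likely next word
        result.append(token)
    return torch.cat(result, dim=1)             # (batch, max_len) of output word ids

# Example usage with a dummy "sentence" of 7 word ids
src = torch.randint(3, SRC_VOCAB, (1, 7))
print(translate(Encoder(), Decoder(), src).shape)  # torch.Size([1, 20])
```

The key design point is that the only link between the two LSTMs is the encoder's final state: everything the decoder knows about the input sentence must fit into that fixed-size vector.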
