Seq2Seq is most often applied to solving language translation problems, and it is based on the RNN architecture.

The main idea of Seq2Seq is to take a sequence as input and produce a sequence as output; the model consists of an Encoder and a Decoder.
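To make the encoder/decoder split concrete, here is a minimal toy sketch of that flow. It uses plain tanh RNN steps instead of LSTM cells for brevity, and all dimensions, weights, and function names (`rnn_step`, `encode`, `decode`) are made up for illustration; it only shows how the encoder compresses the input into a context vector that the decoder then unrolls from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes and random weights, assumed purely for illustration.
vocab_size, hidden_size = 8, 4
E = rng.normal(size=(vocab_size, hidden_size))   # token embeddings
W = rng.normal(size=(hidden_size, hidden_size))  # input-to-hidden weights
U = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights
V = rng.normal(size=(hidden_size, vocab_size))   # hidden-to-vocab projection

def rnn_step(token_id, h):
    # One simple tanh RNN step (an LSTM cell would add gates here).
    return np.tanh(E[token_id] @ W + h @ U)

def encode(src_ids):
    # The encoder reads the whole input sequence and compresses it
    # into a single context vector: its final hidden state.
    h = np.zeros(hidden_size)
    for token_id in src_ids:
        h = rnn_step(token_id, h)
    return h

def decode(context, n_steps, go_id=3):
    # The decoder starts from the encoder's context vector and emits
    # one token per step (greedy argmax over the toy projection).
    h, tok, out = context, go_id, []
    for _ in range(n_steps):
        h = rnn_step(tok, h)
        tok = int(np.argmax(h @ V))
        out.append(tok)
    return out

context = encode([5, 2, 7])
print(decode(context, n_steps=4))  # four generated token ids
```

The untrained weights produce meaningless tokens, of course; the point is only the shape of the computation: input sequence in, context vector across, output sequence out.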


An LSTM cell is used in both the encoder and the decoder.

Train & Inference process

Notice that the inference decoder feeds the output of each time step as an input to the next.

The training decoder does not feed the output of each time step to the next. Rather, the inputs to the decoder time steps are the target sequence from the training dataset (the orange letters).
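The two decoding loops above can be sketched side by side. This is a toy sketch, not a real model: the hypothetical `step_fn(token, state)` stands in for one full decoder step (embedding, LSTM cell, and output projection), and the helper names are assumptions of mine.

```python
def decode_train(target_seq, step_fn, state):
    # Teacher forcing: each step's input is the ground-truth token
    # from the training data, regardless of what the decoder predicted.
    outputs = []
    for gold in ["<GO>"] + target_seq[:-1]:
        pred, state = step_fn(gold, state)
        outputs.append(pred)
    return outputs

def decode_infer(step_fn, state, max_len=10):
    # Inference: each step's input is the decoder's own previous output,
    # starting from <GO> and stopping at <EOS>.
    outputs, tok = [], "<GO>"
    while len(outputs) < max_len:
        tok, state = step_fn(tok, state)
        if tok == "<EOS>":
            break
        outputs.append(tok)
    return outputs
```

The only difference between the two loops is where the next input comes from, which is exactly the distinction the paragraphs above describe.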

Training process & inference process

Special tokens

There are four symbols, however, that we need our vocabulary to contain. Seq2seq vocabularies usually reserve the first four spots for these elements:

  • <PAD>: During training, we’ll need to feed our examples to the network in batches. The inputs in these batches all need to be the same width for the network to do its calculation. Our examples, however, are not of the same length. That’s why we’ll need to pad shorter inputs to bring them to the same width as the batch.
  • <EOS>: This is another necessity of batching, but more on the decoder side. It allows us to tell the decoder where a sentence ends, and it allows the decoder to indicate the same thing in its outputs as well.
  • <UNK>: If you’re training your model on real data, you’ll find you can vastly improve the resource efficiency of your model by ignoring words that don’t show up often enough in your vocabulary to warrant consideration. We replace those with <UNK>.
  • <GO>: This is the input to the first time step of the decoder to let the decoder know when to start generating output.

Note: Other tags can be used to represent these functions. For example, I’ve seen <s> and </s> used in place of <GO> and <EOS>. So make sure whatever you use is consistent throughout preprocessing and model training/inference.


