An approach to improve Seq2Seq modeling through self-attention and positional encoding

In the previous post, we discussed attention-based seq2seq models and the logic behind their inception. The plan was to write a PyTorch implementation story about them, but it turns out the PyTorch documentation already provides an excellent walkthrough here. So instead, I move on to the next item in my plan: the transformer, which is built on the principle of self-attention.

Let's do a two-line recap of the attention-based model. Its core idea was that it took an input sequence and all the hidden states associated with it, and at every step of the output it decided which parts of the input to focus on.
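That recap can be sketched as a few lines of code. Below is a minimal pure-Python sketch of dot-product attention over the encoder's hidden states; the function name, the dot-product scoring, and the toy vectors are illustrative assumptions, not the exact formulation from any particular paper.

```python
import math

def attention(decoder_state, encoder_states):
    """At one decoding step, decide how much to focus on each
    encoder hidden state and build a context vector from them."""
    # Score each encoder hidden state against the current decoder state
    # (dot-product scoring; other scoring functions are also common)
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    # Softmax over the scores to get attention weights that sum to 1
    exps = [math.exp(s - max(scores)) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Context vector: weighted sum of the encoder hidden states
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(len(decoder_state))]
    return weights, context
```

Each output step recomputes the weights, so the decoder can shift its focus across the input as the translation proceeds.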


Sequence-to-sequence (abbr. Seq2Seq) models are deep learning models that have achieved a lot of success in tasks like machine translation, text summarization, and image captioning. Google Translate started using such a model in production in late 2016. These models are explained in the two pioneering papers (Sutskever et al., 2014; Cho et al., 2014).

What’s a Seq2Seq Model?

A Seq2Seq model takes a sequence of items (words, letters, time-series values, etc.) and outputs another sequence of items.

Seq2Seq Model

In the case of Neural Machine Translation, the input is a series of words, and the output is the translated series of words.
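The overall flow can be sketched as an encode-then-decode loop: the encoder consumes the whole input sequence, and the decoder then emits output items one at a time until it produces an end-of-sequence marker. The sketch below is a hypothetical skeleton; the `encoder` and `decoder` callables, the `<sos>`/`<eos>` tokens, and `max_len` are illustrative assumptions standing in for real learned models.

```python
def seq2seq_translate(encoder, decoder, source_tokens,
                      sos="<sos>", eos="<eos>", max_len=20):
    """Generic Seq2Seq inference loop: encode the input once,
    then decode one output item at a time."""
    # Encode the whole input sequence into a hidden representation
    hidden = encoder(source_tokens)
    # Decode step by step until <eos> or a length limit
    output, token = [], sos
    while token != eos and len(output) < max_len:
        token, hidden = decoder(token, hidden)
        if token != eos:
            output.append(token)
    return output
```

For machine translation, `source_tokens` would be the words of the source sentence and the returned list the words of the translation; the same loop structure applies to summarization or captioning with different encoders and decoders.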

Pranay Dugar

Data Scientist, Walmart Labs
