
In the previous post, we discussed attention-based seq2seq models and the logic behind their inception. The plan was to write a PyTorch implementation post on the same model, but it turns out the PyTorch documentation already provides an excellent walkthrough here. So I'm moving on to the next item in my plan — the transformer — which works on the principle of self-attention.
Let's do a two-line recap of the attention-based model. Its core idea was that it took an input sequence along with all the hidden states associated with it, and at every output step it decided which part of the…
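The recap above can be sketched in a few lines: at each output step, the decoder scores every encoder hidden state, turns the scores into a distribution, and takes a weighted sum. This is a minimal dot-product attention sketch in NumPy, not any particular paper's exact formulation; all shapes and values here are made up for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(decoder_state, encoder_states):
    """Score each encoder hidden state against the current decoder
    state, then return a weighted sum (the context vector)."""
    scores = encoder_states @ decoder_state   # one score per input position
    weights = softmax(scores)                 # how much to focus on each position
    context = weights @ encoder_states        # weighted summary of the input
    return context, weights

# Toy example: 4 input positions, hidden size 3.
enc = np.random.rand(4, 3)
dec = np.random.rand(3)
context, weights = attend(dec, enc)
print(weights)  # a probability distribution over the 4 input positions
```

The weights always sum to 1, so the context vector is a convex combination of the encoder states — the "which part of the sequence to look at" decision made soft and differentiable.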
Sequence-to-sequence (abbreviated Seq2Seq) models are deep learning models that have achieved a lot of success in tasks like machine translation, text summarization, and image captioning. Google Translate started using such a model in production in late 2016. These models are explained in the two pioneering papers (Sutskever et al., 2014; Cho et al., 2014).
A Seq2Seq model takes a sequence of items (words, letters, time-series values, etc.) and outputs another sequence of items.

In the case of Neural Machine Translation, the input is a series of words, and the output is the translated series of words.
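The sequence-in, sequence-out shape described above can be made concrete with a toy encoder-decoder skeleton: the encoder folds the input tokens into a single state, and the decoder emits output tokens from that state one step at a time. This is a conceptual sketch with random (untrained) weights and no attention — the function names, sizes, and greedy decoding loop are all illustrative assumptions, not a real translation model.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8
VOCAB_IN, VOCAB_OUT = 10, 12

# Toy parameters; a real model would learn these by training.
W_in = rng.standard_normal((VOCAB_IN, HIDDEN))
W_rec = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1
W_out = rng.standard_normal((HIDDEN, VOCAB_OUT))

def encode(tokens):
    """Fold the input sequence into one hidden state (a minimal RNN encoder)."""
    h = np.zeros(HIDDEN)
    for t in tokens:
        h = np.tanh(W_in[t] + W_rec @ h)
    return h

def decode(h, steps):
    """Emit one output token id per step from the encoded state (greedy)."""
    out = []
    for _ in range(steps):
        logits = h @ W_out
        out.append(int(np.argmax(logits)))
        h = np.tanh(W_rec @ h)  # advance the decoder state
    return out

src = [1, 4, 2, 7]             # input "sentence" as token ids
print(decode(encode(src), 5))  # an output sequence of token ids
```

Note that input and output lengths need not match — the decoder runs for however many steps it takes, which is exactly what translation requires.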
