Positional Embeddings

CHEN TSU PEI
NLP-trend-and-review-en
2 min read · Nov 13, 2019

The Transformer, first introduced in “Attention Is All You Need”, has become one of the most common models in deep learning. Before it, the most common model for sequence modelling was the RNN. In a Transformer, however, the input sequence is not fed into the model one token at a time; instead, the whole sequence is fed in at once, and the attention mechanism takes care of which parts we need to focus on.

In this case, the information about “order” is not present in the embeddings. In an RNN, the data are fed in one step at a time, so order information is implicitly captured by the model; in a Transformer, this information is missing.

Positional embeddings are introduced to recover position information. The paper mentions two versions, learned positional embeddings and sinusoidal positional embeddings, and reports that both produce similar results.
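As a rough illustration of the learned variant, a trainable lookup table indexed by position can simply be added to the token embeddings. The sketch below uses PyTorch; the class name and hyperparameters (max_len, d_model) are placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """Learned positional embeddings: one trainable vector per position."""
    def __init__(self, max_len=512, d_model=512):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, d_model)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        # Add the position vector for each index to the token embeddings
        return token_embeddings + self.pos_emb(positions)
```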

Consider a concrete example:

> I do not like the story of the movie, but I do like the cast
> I do like the story of the movie, but I do not like the cast

Although the words used are exactly the same, the meanings are opposite; order information is required to distinguish the two.

PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

From: Attention Is All You Need

Sinusoidal positional embeddings generate the embeddings using sine and cosine functions. By using the equations shown above, the authors hypothesized the model would be able to learn relative positions:

> We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset k, PE_{pos+k} can be represented as a linear function of PE_{pos}.
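To make the formula concrete, here is a minimal NumPy sketch that builds the sinusoidal table: sine on even dimensions, cosine on odd dimensions. The shapes and default values (max_len, d_model) are illustrative choices, not taken from the post.

```python
import numpy as np

def sinusoidal_positional_embeddings(max_len=50, d_model=512):
    """Build the (max_len, d_model) table of sinusoidal positional embeddings."""
    positions = np.arange(max_len)[:, np.newaxis]            # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]                 # (1, d_model)
    # Each pair of dimensions (2i, 2i+1) shares the same frequency 1/10000^(2i/d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                         # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                    # even indices: sin
    pe[:, 1::2] = np.cos(angles[:, 1::2])                    # odd indices: cos
    return pe

pe = sinusoidal_positional_embeddings()
print(pe.shape)  # (50, 512)
```

The resulting rows can be added to the token embeddings in the same way as in the learned sketch above, with no parameters to train.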


This publication is no longer updated! Please head over to https://tsupei.github.io, where more NLP articles are continuously posted!