Positional Embeddings

CHEN TSU PEI
NLP-trend-and-review-en
2 min read · Nov 13, 2019

The Transformer, first introduced in “Attention Is All You Need”, has become one of the most common models in deep learning. Before it, the most common model for sequence modelling was the RNN. In a Transformer, however, the input sequence is not fed into the model one token at a time; instead, the whole sequence is fed in at once, and the attention mechanism takes care of which parts we need to focus on.

In this case, the information about “order” is not present in the embeddings. In an RNN, the data are fed in one step at a time, so order information is implicitly captured by the model; in a Transformer, this information is missing.

Positional embeddings are introduced to recover position information. The paper mentions two versions, learned positional embeddings and sinusoidal positional embeddings, and reports that both produce similar results.
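As a rough illustration of the learned variant, a trainable lookup table indexed by position can simply be added to the token embeddings. The sketch below uses PyTorch; the class name and hyperparameters (max_len, d_model) are placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """Learned positional embeddings: one trainable vector per position."""
    def __init__(self, max_len=512, d_model=512):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, d_model)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        # Add the position vector for each index to the token embeddings
        return token_embeddings + self.pos_emb(positions)
```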

Consider a concrete example:

> I do not like the story of the movie, but I do like the cast
> I do like the story of the movie, but I do not like the cast

Although the words used are exactly the same, the meanings are opposite; order information is required to distinguish the two.

PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

From: Attention Is All You Need

Sinusoidal positional embeddings generate the embeddings using sine and cosine functions. By using the equations shown above, the authors hypothesized the model would be able to learn relative positions:

> We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset k, PE_{pos+k} can be represented as a linear function of PE_{pos}.
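To make the formula concrete, here is a minimal NumPy sketch that builds the sinusoidal table: sine on even dimensions, cosine on odd dimensions. The shapes and default values (max_len, d_model) are illustrative choices, not taken from the post.

```python
import numpy as np

def sinusoidal_positional_embeddings(max_len=50, d_model=512):
    """Build the (max_len, d_model) table of sinusoidal positional embeddings."""
    positions = np.arange(max_len)[:, np.newaxis]            # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]                 # (1, d_model)
    # Each pair of dimensions (2i, 2i+1) shares the same frequency 1/10000^(2i/d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                         # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                    # even indices: sin
    pe[:, 1::2] = np.cos(angles[:, 1::2])                    # odd indices: cos
    return pe

pe = sinusoidal_positional_embeddings()
print(pe.shape)  # (50, 512)
```

The resulting rows can be added to the token embeddings in the same way as in the learned sketch above, with no parameters to train.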


This publication is no longer updated! Please head over to https://tsupei.github.io, where more NLP articles are continuously posted!