A Detailed Look At Positional Encoding

A Powerful Technique For Maintaining Data Order

Ryan Partridge
6 min read · Aug 25, 2023

This article is part of a series about the Transformer architecture. If you haven’t read the others, refer to the introductory article here.

Transformer-based models have been a hot topic for the past year, and it’s not hard to see why! They possess a remarkable talent for responding to practical questions with human-like charm. At the core of their brilliance lie Vector Embeddings. These bundles of numbers capture the essence of each unique item in a dataset. It’s like giving each data point its own personality!

Looking for more information on Vector Embeddings? Check out my blog post on the subject!

Embeddings excel at differentiating one item from another but lack one critical attribute: an inherent understanding of sequence order. Traditional models, such as Recurrent Neural Networks (RNNs), accept data sequentially, one sample at a time, so the position of each item is preserved automatically. They never had to worry about this problem! However, that came at a cost: longer training times. Transformers took a different route, replacing RNN recurrence with Multi-Headed Attention, which processes all the inputs in a sequence in parallel. Training times dropped dramatically, but the notion of sequence order was lost. So, how do they understand where each item sits in a sequence? Using Positional Encoding!
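To give you a taste before we dig in, here is a minimal sketch (my own illustration in NumPy, not code from any particular library) of the sinusoidal encoding proposed in the original Attention Is All You Need paper, where position pos and dimension pair i map to sin(pos / 10000^(2i/d_model)) and cos(pos / 10000^(2i/d_model)):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Build a (seq_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]       # (1, d_model / 2)
    angle_rates = 1.0 / np.power(10000, dims / d_model)  # one frequency per dimension pair
    angles = positions * angle_rates                     # (seq_len, d_model / 2)

    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles)  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles)  # odd dimensions use cosine
    return encoding

# The encoding matrix is simply added to the token embeddings
# before they enter the first attention layer.
pe = sinusoidal_positional_encoding(seq_len=50, d_model=512)
print(pe.shape)  # (50, 512)
```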

Why It Matters
