Positional Encoding

How Transformers represent the order of tokens.

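Self-attention has no inherent notion of sequence. On its own it is permutation-equivariant: reordering the input tokens just reorders the outputs, so the attention computation alone cannot tell word order. Transformers therefore add positional information to each token so the model can distinguish word order. Schemes range from the fixed sinusoidal encodings of the original Transformer to learned position embeddings and rotary variants (RoPE), which encode position by rotating query and key vectors.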
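To make the fixed schemes concrete, here is a minimal pure-Python sketch that assumes no numerical libraries. `sinusoidal_encoding` follows the sinusoidal formula PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), whose rows are added to token embeddings. `apply_rotary` rotates adjacent pairs of a query or key vector by an angle proportional to position, in the style of rotary embeddings. The function names and the interleaved-pair layout are illustrative assumptions. Some rotary implementations pair dimension i with i + d/2 instead.

```python
import math
from typing import List


def sinusoidal_encoding(seq_len: int, d_model: int, base: float = 10000.0) -> List[List[float]]:
    """Return a seq_len x d_model table of sinusoidal position encodings.

    Even dimensions use sin and odd dimensions use cos. Each sin/cos pair
    shares one frequency, and frequencies fall geometrically with dimension.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / base ** (i / d_model)
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings: List[List[float]]) -> List[List[float]]:
    """Add sinusoidal encodings element-wise to a list of token embeddings."""
    pe = sinusoidal_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, enc)] for emb, enc in zip(embeddings, pe)]


def apply_rotary(vec: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each adjacent pair (vec[i], vec[i+1]) by pos * base**(-i/d).

    Illustrative interleaved-pair rotary sketch, applied to queries and keys.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.append(x0 * c - x1 * s)  # standard 2-D rotation of the pair
        out.append(x0 * s + x1 * c)
    return out


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))
```

Because every rotation angle is proportional to position, the score between a rotated query at position m and a rotated key at position n depends only on the offset m - n. For example, the position pairs (5, 2) and (13, 10) yield the same dot product. Learned position embeddings, by contrast, are a trainable table with one vector per position index.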

Related papers