Architecture

Attention Is All You Need

Google · June 12, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, et al.

TL;DR

Introduces the Transformer, an architecture based entirely on attention that dispenses with recurrence and convolutions, training faster and achieving state-of-the-art translation results.
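
To make "based entirely on attention" concrete, here is a minimal sketch of scaled dot-product attention, the core operation the paper defines as softmax(QK^T / sqrt(d_k))V. It is an illustration only: the function names and toy vectors are assumptions made for this example, and it leaves out the multi-head projections, masking, and positional encodings that the full architecture uses.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output row is the attention-weighted average of the values.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return outputs

# Toy self-attention (hypothetical data): the same three token vectors
# serve as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each output row depends on every input position at once, with no step-by-step recurrence, which is why the computation parallelizes well across a sequence.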

Why it matters

Arguably the most influential paper in modern AI. Nearly every major language model, including GPT, Claude, Gemini, and Llama, builds on the Transformer architecture introduced here.

Related terms

Transformer
Attention