Architecture
Attention Is All You Need
Google·June 12, 2017
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit
View on arXivTL;DR
Introduces the Transformer, an architecture based entirely on attention that dispenses with recurrence and convolutions, training faster and achieving state-of-the-art translation results.
Why it matters
The single most influential paper in modern AI. Every major language model — GPT, Claude, Gemini, Llama — descends from the Transformer architecture introduced here.