Architecture · Training

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Google · October 11, 2018

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

TL;DR

BERT pretrains a deep bidirectional Transformer encoder on unlabeled text with masked language modeling and next-sentence prediction, then fine-tunes it with one additional output layer, setting a new state of the art on eleven NLP tasks.

The paper that made pretrain-then-fine-tune the default recipe for NLP and showed how much could be learned from unlabeled text.
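
The masked language modeling objective named in the TL;DR can be illustrated with a minimal sketch. It assumes the corruption scheme described in the paper: about 15% of token positions are selected for prediction, and of those, 80% are replaced with `[MASK]`, 10% with a random vocabulary token, and 10% are left unchanged. The function name `mask_tokens`, its parameters, and the toy vocabulary are hypothetical and chosen for illustration. The sketch also simplifies by selecting each position independently and by not exempting special tokens such as `[CLS]` and `[SEP]`.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token sequence for masked language modeling.

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model must predict, and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Simplification: each position is selected independently.
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return corrupted, labels


# Usage with a toy vocabulary (illustrative only).
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
sentence = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, labels = mask_tokens(sentence, vocab, rng=random.Random(0))
print(corrupted)
print(labels)
```

During pretraining, the loss is computed only at positions where `labels` is not `None`. Because the model sees context on both sides of each such position, the objective can train a bidirectional encoder without letting a token trivially predict itself.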