RoBERTa: A Robustly Optimized BERT Pretraining Approach

Meta · July 26, 2019

Yinhan Liu, Myle Ott, Naman Goyal, et al.

TL;DR

Shows that BERT was significantly undertrained: training longer on more data, with larger batches and carefully tuned hyperparameters, yields substantially better results.

Why it matters

A lasting lesson that the training recipe often matters more than architectural novelty.
