Training
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Meta · July 26, 2019
Yinhan Liu, Myle Ott, Naman Goyal, et al.
TL;DR
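Shows that BERT was significantly undertrained. Training longer with bigger batches on more data, dropping the next-sentence prediction objective, and using dynamic masking yields substantially better results.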
Why it matters
A lasting lesson: the training recipe often matters more than architectural novelty.