Masked Language Modeling

Pretraining by predicting hidden (masked) tokens.

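In masked language modeling, a random subset of the input tokens is hidden, typically by replacing each one with a special mask token, and the model learns to predict the original tokens from both left and right context. This bidirectional objective, introduced as a pretraining objective by BERT, produces contextual representations that transfer well to language understanding tasks such as classification and question answering.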
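
The masking step described above can be sketched in a few lines of Python. This is a minimal illustration and not BERT's actual implementation. The function name `mask_tokens`, the `MASK` constant, and the toy vocabulary are hypothetical. The 15% selection rate and the 80/10/10 split (mask token, random token, unchanged) follow the recipe reported in the BERT paper and are assumptions relative to the text here.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol; real tokenizers define their own


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Corrupt a token sequence for masked language modeling.

    Returns (corrupted, labels). labels[i] holds the original token at
    positions selected for prediction and None everywhere else, so a
    training loss would be computed only where labels[i] is not None.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected; left untouched, no loss here
        labels[i] = tok  # the model must recover this token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK  # 80%: replace with the mask symbol
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return corrupted, labels


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    toy_vocab = sorted(set(sentence))
    corrupted, labels = mask_tokens(sentence, toy_vocab, mask_prob=0.5, seed=0)
    print(corrupted)
    print(labels)
```

Because the model sees the full corrupted sequence, each prediction can draw on tokens to the left and to the right of the hidden position, which is what makes the objective bidirectional.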

Related papers