Data Contamination
When benchmark or test data leaks into a model’s training set, inflating its scores.
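If evaluation questions (or their answers) appear in pretraining data, a model can score well by recalling them rather than solving them, so benchmark results overstate its true ability. Detecting and avoiding contamination is a major concern when interpreting leaderboard numbers. Common safeguards are canary strings embedded in benchmark files, decontamination filters that remove overlapping text from training corpora, and fresh or held-out test sets.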
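As a rough illustration of how a decontamination filter and a canary check might work, the sketch below flags a test item if it shares any word-level n-gram with a training document, and separately looks for a marker string in a document. The function names, the default n-gram length of 8, and the canary value are all assumptions made for this example; real pipelines differ in tokenization, thresholds, and scale.

```python
from __future__ import annotations

import re

# Hypothetical canary marker, for illustration only. Real benchmarks publish their own.
EXAMPLE_CANARY = "EXAMPLE-BENCHMARK-CANARY 1234-abcd"


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word-level n-grams in text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(test_item: str, training_docs: list[str], n: int = 8) -> bool:
    """Flag a test item that shares at least one n-gram with any training document."""
    item_grams = ngrams(test_item, n)
    if not item_grams:
        # Item is shorter than n tokens, so this check cannot say anything about it.
        return False
    return any(item_grams & ngrams(doc, n) for doc in training_docs)


def contains_canary(doc: str, canary: str = EXAMPLE_CANARY) -> bool:
    """Report whether a document carries the canary marker verbatim."""
    return canary in doc


if __name__ == "__main__":
    corpus = [
        "Forum post: what is the capital of France and which river runs through it? "
        "Answer: Paris, the Seine.",
        "An unrelated page about gardening tools and soil preparation in early spring.",
    ]
    leaked = "What is the capital of France and which river runs through it?"
    fresh = "Which planet has the shortest orbital period around the Sun?"
    print(is_contaminated(leaked, corpus))
    print(is_contaminated(fresh, corpus))
```

Exact n-gram matching of this kind misses paraphrased or translated copies of a test item, which is one reason fresh or held-out test sets are used alongside filtering.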