RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi + 5 more
TLDR
RoBERTa revisits BERT pretraining with optimized hyperparameters and more data, achieving state-of-the-art NLP performance and revealing that BERT was originally undertrained.
Key contributions
- Conducted a thorough replication study of BERT pretraining to isolate effects of hyperparameters and data size.
- Demonstrated that straightforward design changes substantially improve performance over the original BERT: training longer with larger batches over more data, removing the next-sentence-prediction objective, training on longer sequences, and dynamically changing the masking pattern applied to the training data (see the sketch after this list).
- Achieved state-of-the-art results on GLUE, RACE, and SQuAD, matching or exceeding the performance of every model published after BERT.
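To make the design changes above concrete, here is a minimal sketch of RoBERTa-style masked-language-model pretraining with dynamic masking. It uses the Hugging Face `transformers` library rather than the authors' original fairseq code, and the example sentences, model size, and optimizer settings are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of RoBERTa-style MLM pretraining with dynamic masking.
# Uses Hugging Face `transformers`, not the authors' fairseq implementation;
# texts and hyperparameters below are illustrative, not the paper's setup.
import torch
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    DataCollatorForLanguageModeling,
)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# A small, randomly initialized model stands in for full-scale pretraining.
config = RobertaConfig(
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    intermediate_size=1024,
)
model = RobertaForMaskedLM(config)

# Dynamic masking: the collator samples a fresh mask pattern every time a
# batch is built, instead of fixing the masks once during preprocessing as
# in the original BERT data pipeline.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

texts = [
    "Language model pretraining has led to significant performance gains.",
    "Careful comparison between different approaches is challenging.",
]
examples = [
    {"input_ids": ids}
    for ids in tokenizer(texts, truncation=True, max_length=128)["input_ids"]
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

model.train()
for step in range(2):
    batch = collator(examples)   # new random masks on every call
    outputs = model(**batch)     # MLM labels are set by the collator
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: mlm loss = {outputs.loss.item():.3f}")
```

At full scale the paper trains with much larger batches (up to 8K sequences), full-length 512-token sequences, and over 160GB of text; this toy loop only illustrates the dynamic-masking mechanic, not that scale.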
Why it matters
This paper challenges the notion that newer models inherently outperform BERT, showing that careful optimization and more extensive training can yield superior results. It underscores the critical role of training design choices and gives the community a stronger, openly available baseline to build on.
Original Abstract
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.