Scaling Laws for Neural Language Models

arXiv: 2001.08361

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess + 5 more

cs.LG, stat.ML

TLDR

This paper identifies power-law relationships between a language model's cross-entropy loss and its model size, dataset size, and training compute, and uses them to derive how a fixed compute budget should be allocated during training.
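
Each of these relationships takes the same simple power-law form. As a rough sketch, with N the number of non-embedding parameters, D the dataset size in tokens, and C the training compute when the budget is allocated optimally, the exponents below are the approximate fitted values reported in the paper:

    L(N) = (N_c / N)^{α_N},   α_N ≈ 0.076   (model size)
    L(D) = (D_c / D)^{α_D},   α_D ≈ 0.095   (dataset size)
    L(C) = (C_c / C)^{α_C},   α_C ≈ 0.050   (compute, optimally allocated)

Here N_c, D_c, and C_c are fitted constants specific to the paper's training setup.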

Key contributions

  • Demonstrates that cross-entropy loss scales as a power law with model size, dataset size, and training compute, with some trends spanning more than seven orders of magnitude.
  • Finds that architectural details such as network width and depth have minimal effect within a wide range.
  • Provides simple equations that predict overfitting from model and dataset size, and training speed from model size, guiding optimal allocation of a fixed compute budget (see the sketch below).
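
The combined equation referenced in the last bullet, as reported in the paper, folds the model-size and dataset-size laws into a single prediction for the loss of a model of size N trained with early stopping on a dataset of D tokens:

    L(N, D) = [ (N_c / N)^{α_N / α_D} + D_c / D ]^{α_D}

As D grows large this reduces to the model-size law L(N), so the gap between L(N, D) and L(N, ∞) quantifies how much a given model/dataset pair overfits.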

Why it matters

Understanding how language model performance scales with size, data, and compute is crucial for designing and training large models efficiently. The paper's empirical scaling laws let practitioners optimize resource use by training very large models on a relatively modest amount of data and stopping training well before convergence, which can significantly reduce cost and improve sample efficiency in real-world applications.

Original Abstract

We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on model/dataset size and the dependence of training speed on model size. These relationships allow us to determine the optimal allocation of a fixed compute budget. Larger models are significantly more sample-efficient, such that optimally compute-efficient training involves training very large models on a relatively modest amount of data and stopping significantly before convergence.
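
For concreteness, here is a minimal Python sketch (not the authors' code) that evaluates the combined L(N, D) law shown under Key contributions, using the approximate fitted constants reported in the paper. The constants are fits for the paper's specific dataset and tokenization, so treat the outputs as rough guidance rather than exact predictions.

    # Illustrative sketch only: evaluate the combined scaling law
    # L(N, D) = [(N_c / N)^(α_N / α_D) + D_c / D]^α_D
    # with the approximate constants reported in the paper.

    ALPHA_N = 0.076   # exponent for model size (non-embedding parameters)
    ALPHA_D = 0.095   # exponent for dataset size (tokens)
    N_C = 8.8e13      # fitted scale constant for model size
    D_C = 5.4e13      # fitted scale constant for dataset size

    def predicted_loss(n_params: float, n_tokens: float) -> float:
        """Predicted cross-entropy loss (nats per token) for a model with
        n_params non-embedding parameters trained with early stopping
        on n_tokens tokens."""
        return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D

    if __name__ == "__main__":
        # Example: how much does 10x more data help a 1B-parameter model?
        for tokens in (2e10, 2e11):
            print(f"N=1e9, D={tokens:.0e}: predicted loss ≈ {predicted_loss(1e9, tokens):.3f}")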
