CONCEPT
Scaling Laws
The empirical relationships that predict how a language model's loss decreases with training compute, parameters, and data: the most reliable quantitative instrument the AI field has, and the reason investors have been willing to fund ten-figure training runs.
Scaling laws are empirical power-law relationships between a transformer language model's training loss and the three inputs most under practitioner control: training compute, parameter count, and training tokens. They were first reported by Hestness et al. (2017) and formalized for language models by Kaplan et al. (2020) and Hoffmann et al. (2022, the "Chinchilla" paper). The relationships hold across at least five orders of magnitude of scale and have been the most reliable forecasting instrument in the field for the past five years. They predict that doubling compute reduces loss by a predictable fraction, that the compute-optimal ratio of parameters to training tokens scales predictably, and, most consequentially, that continued investment in scale will keep producing capability gains until something structural breaks.
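The two quantitative claims above can be made concrete with a minimal sketch. It assumes a pure power law, loss proportional to C^(-alpha), and uses two commonly quoted approximations that are not stated in this entry: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and a Chinchilla-style rule of thumb of roughly 20 tokens per parameter. The function names and the example exponent are illustrative, not taken from either paper's code.

```python
import math


def loss_ratio_per_doubling(alpha: float) -> float:
    """Factor by which loss is multiplied when compute doubles,
    assuming loss is proportional to compute ** (-alpha)."""
    return 2.0 ** (-alpha)


def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a compute budget into (parameters, tokens).

    Assumptions (approximations, not exact results):
      - compute_flops ~= 6 * N * D
      - compute-optimal training uses D ~= tokens_per_param * N
    Solving the two equations gives N = sqrt(C / (6 * tokens_per_param)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Illustrative exponent only; real exponents must be fitted to measured runs.
ratio = loss_ratio_per_doubling(0.05)

# A budget chosen so the arithmetic is easy to check by hand:
# 6 * 20 * (1e9)**2 = 1.2e20 FLOPs.
n_params, n_tokens = compute_optimal_split(1.2e20)
```

With an exponent of 0.05, each doubling of compute multiplies the loss by about 0.966, which shows why large loss reductions require many doublings. Under the stated assumptions, the 1.2e20 FLOP budget splits into about one billion parameters and twenty billion tokens.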
In The You On AI Field Guide
The headline finding of Kaplan et al. (2020) was that language-model loss decreases as a power law in compute, parameters, and dataset size, and that the exponents of these power laws are stable across model sizes and depend only weakly on architectural details such as depth and width.
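A power law is a straight line in log-log space, so its exponent can be estimated with ordinary least squares on the logarithms. The sketch below assumes the simplest form, loss = a · N^(-alpha), with no irreducible-loss term. The data points are synthetic, generated from a known exponent purely to show that the fit recovers it; they are not measurements from any paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x ** (-alpha) by least squares on (log x, log y).

    Returns (a, alpha). Assumes all xs and ys are positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the best-fit line in log-log space equals -alpha.
    cov = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    var = sum((u - mean_x) ** 2 for u in log_x)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic example: hypothetical model sizes and losses generated from
# a = 5.0 and alpha = 0.08 (made-up values for illustration only).
params = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.08 for n in params]

a_hat, alpha_hat = fit_power_law(params, losses)
```

On real training runs the points scatter around the line, and a constant irreducible-loss term may need to be subtracted before the log-log fit is meaningful.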