CONCEPT

AI Scaling Laws

The empirical power-law relationships — Kaplan (2020), Chinchilla (2022), and subsequent refinements — between model size, training data volume, and computational budget that now function as the AI industry's version of Moore's Law: trend lines acquiring the force of self-fulfilling prophecy.
The AI scaling laws are, in their structural character, exactly what Moore described in 1965: observations fitted to data, stated plainly, and acquiring economic force as an entire industry organizes itself around them. The Kaplan scaling laws, published by OpenAI researchers in 2020, established that language-model performance improves predictably with scale across model parameters, training data, and compute. The Chinchilla laws from DeepMind refined the relationship in 2022, showing that optimal performance requires scaling parameters and training data together rather than pouring compute into parameters alone. The empirical observation that training compute required for a given capability has been halving every eight months — sometimes called 'Moore's Law squared' — drives hundreds of billions of dollars in infrastructure investment on the assumption that the next doubling will arrive on schedule.
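
In equation form, the Kaplan relationship for model size alone is a single power law. Here is a minimal sketch in Python, using the constant and exponent commonly quoted from the 2020 paper for the data-unconstrained case (treat N_c ≈ 8.8e13 and α_N ≈ 0.076 as illustrative values worth double-checking before reuse):

    def kaplan_loss(n_params, n_c=8.8e13, alpha_n=0.076):
        # Kaplan-style size-only law: L(N) = (N_c / N) ** alpha_N, with N the
        # non-embedding parameter count. N_c and alpha_N are fitted constants,
        # not derived quantities: a trend line, not physics.
        return (n_c / n_params) ** alpha_n

    # Ten times the parameters buys a modest, predictable drop in loss:
    print(kaplan_loss(1e9), kaplan_loss(1e10))  # ~2.38, ~1.99

The same functional form, with different fitted constants, governs the data and compute axes in the Kaplan paper.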

In The You On AI Encyclopedia

Moore's framework illuminates what the scaling laws are and are not. Like Moore's original observation, they are trend lines — patterns in data sets, not equations derived from first principles. The Kaplan and Chinchilla relationships describe what has happened in training runs. They do not explain why, which means they cannot predict with confidence when the relationship will break. This distinguishes them from genuine physical laws and aligns them with Moore's original status: empirically grounded, economically consequential, and dependent on continued engineering effort to sustain.

The unit of measurement marks a fundamental shift from Moore's semiconductor framework. Moore measured transistors — physical objects that could be counted, photographed, and fabricated. AI scaling laws measure tokens: statistical artifacts manipulated through matrix multiplications. Tokens do not occupy space on a die. They have no independent existence outside the computational infrastructure that sustains them. The relationship between the unit and its infrastructure is not manufacturing but metabolism — continuous consumption of energy, hardware, and cooling for as long as the token is in use.

Moore's Law

The scaling laws are encountering walls analogous to those Moore's Law faced. The data wall — the finite supply of high-quality training text — is the nearest: current frontier models already consume a significant fraction of the estimated ten to twenty trillion tokens of high-quality English text available. The energy wall is already visible in the International Energy Agency's flagging of AI data centers as a growing fraction of global electricity demand. The economic wall — whether revenue scales fast enough to justify escalating training costs — is the one Moore's framework identifies as ultimately decisive.
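
The arithmetic behind that saturation claim fits in a few lines. A back-of-envelope sketch, assuming the Chinchilla rule of thumb of roughly twenty training tokens per parameter and taking the midpoint of the ten-to-twenty-trillion-token estimate above:

    # Chinchilla-style token demand (D ~= 20 * N) against an assumed stock
    # of ~15T high-quality tokens (midpoint of the 10-20T estimate above).
    STOCK = 15e12
    for n_params in (7e10, 4e11, 1e12):  # 70B, 400B, 1T parameters
        demand = 20 * n_params
        print(f"{n_params:.0e} params wants {demand:.1e} tokens "
              f"({demand / STOCK:.0%} of the stock)")

On these assumptions, a compute-optimal trillion-parameter model already wants more unique text than the estimated stock contains.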

The scaling laws inherit Moore's warning about one-dimensional measurement. In 2008, Moore observed that treating intelligence as 'a one-dimensional, quantifiable characteristic of humans or computers' was naïve. The benchmarks that measure AI capability — accuracy on tests, performance on coding challenges, scores on reasoning tasks — are one-dimensional measures of a phenomenon that resists one-dimensional characterization. The scaling laws capture the average relationship; the shadows — the dimensions the benchmarks do not measure — operate at the margin, and it is the margin that determines when the wall arrives.

Origin

The Kaplan scaling laws emerged from a 2020 paper by Jared Kaplan and colleagues at OpenAI, establishing empirical power-law relationships between cross-entropy loss and model parameters, training dataset size, and compute across seven orders of magnitude. The Chinchilla refinement came in 2022 from DeepMind researchers led by Jordan Hoffmann, who demonstrated that Kaplan's recommendations had systematically undertrained large models and that optimal training requires scaling parameters and data in roughly equal proportion.
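
The Chinchilla prescription reduces to a short calculation. A hedged sketch, using the standard C ≈ 6ND approximation for training FLOPs (a common approximation, not part of the paper's fits) and the roughly twenty-tokens-per-parameter ratio implied by equal-proportion scaling:

    def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
        # With C ~= 6 * N * D and D = tokens_per_param * N, the compute-optimal
        # allocation is N = sqrt(C / (6 * tokens_per_param)) and D = 20 * N.
        n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
        return n_params, tokens_per_param * n_params

    # Chinchilla's own budget (~5.9e23 FLOPs) recovers roughly its published
    # shape: about 70B parameters trained on about 1.4T tokens.
    print(chinchilla_optimal(5.9e23))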

The observation that AI capability costs halve approximately every eight months — sometimes attributed to Naveen Rao as 'Mosaic's Law' and popularized by various industry analysts — emerged empirically from tracking the compute requirements of models achieving equivalent benchmark performance over time. Like Moore's Law, the observation is descriptive, not mechanistic, and its persistence depends on continued engineering effort.
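
Stated as a formula, the trend is exponential decay in the compute needed for fixed capability. A sketch under that assumption, with the eight-month half-life taken from the observation above rather than from any mechanism:

    def relative_compute_for_capability(months_elapsed, halving_months=8.0):
        # Descriptive trend, not a law: compute needed to hit a fixed
        # benchmark level halves roughly every `halving_months` months.
        return 2.0 ** (-months_elapsed / halving_months)

    # After three years, the same capability costs about 4% of the original:
    print(relative_compute_for_capability(36))  # ~0.044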

Key Ideas

Transistors to Tokens

Trend lines, not physics. The scaling laws are curves fit to data, not equations derived from first principles, making them empirically robust but theoretically incomplete (a minimal fitting sketch follows this list).

Tokens replace transistors. The unit of AI scaling is statistical, not physical — with profound consequences for infrastructure, economics, and the nature of the walls that will constrain growth.

Walls are coming. Data saturation, energy constraints, and economic sustainability each represent potential binding constraints, and the industry's rotation onto new dimensions when saturation arrives will shape the next decade.

One-dimensional measurement. Like all scaling laws, the AI version measures a single dimension of a multidimensional phenomenon — a structural limitation Moore identified as 'naïve' when applied to intelligence.

Data Wall

Self-fulfilling prophecy. Companies invest hundreds of billions on the assumption that the next doubling arrives on schedule; the investment itself helps ensure that it does.
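
The 'curves fit to data' point is literal: a power law is a straight line in log-log coordinates, so the fit is ordinary least squares. A minimal sketch with synthetic numbers standing in for real training runs:

    import numpy as np

    # Synthetic (model size, loss) pairs generated from a known power law.
    n = np.array([1e7, 1e8, 1e9, 1e10])
    loss = 2.0 * (1e9 / n) ** 0.08

    # log L = log c - alpha * log N, so a degree-1 fit recovers the exponent.
    slope, _ = np.polyfit(np.log(n), np.log(loss), 1)
    print(f"fitted exponent: {-slope:.3f}")  # 0.080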

Debates & Critiques

Whether the scaling laws will continue to hold as data saturates and energy constraints bind is the central empirical question in contemporary AI. Optimists argue that synthetic data generation, multimodal training, and algorithmic efficiency improvements will sustain the curve. Skeptics — including researchers who have examined the Chinchilla relationships in detail — argue that the returns on scale are already diminishing and that the next doubling will require qualitatively different approaches rather than more of the same. Moore's framework suggests that the rotation will happen but that it is not automatic, and the terms of the rotation will determine who benefits.

Further Reading

  1. Kaplan, McCandlish, et al., Scaling Laws for Neural Language Models (OpenAI, 2020)
  2. Hoffmann, Borgeaud, et al., Training Compute-Optimal Large Language Models (DeepMind/Chinchilla, 2022)
  3. Epoch AI, Compute Trends Across Three Eras of Machine Learning (2022)
  4. Stanford AI Index reports (annual, 2021–2026)

Three Positions on AI Scaling Laws

From Chapter 15 — how the Boulder, the Believer, and the Beaver each read this concept
Boulder · Refusal
Han's diagnosis
The Boulder sees in AI Scaling Laws evidence of the pathology and concludes that refusal, not adaptation, is the correct posture: the garden, the analog life, the smartphone that is not bought.
Believer · Flow
Riding the current
The Believer sees AI Scaling Laws as the river's direction — lean in. Trust that the technium, as Kevin Kelly argues, wants what life wants. Resistance is fear, not wisdom.
Beaver · Stewardship
Building dams
The Beaver sees AI Scaling Laws as an opportunity for construction. Neither refuse nor surrender — build the institutional, attentional, and craft governors that shape the river around the things worth preserving.

