Moore's framework illuminates what the scaling laws are and are not. Like Moore's original observation, they are trend lines — patterns in data sets, not equations derived from first principles. The Kaplan and Chinchilla relationships describe what has happened in training runs. They do not explain why, which means they cannot predict with confidence when the relationships will break. This distinguishes them from genuine physical laws and aligns them with Moore's original status: empirically grounded, economically consequential, and dependent on continued engineering effort to sustain.
The unit of measurement marks a fundamental shift from Moore's semiconductor framework. Moore measured transistors — physical objects that could be counted, photographed, and fabricated. AI scaling laws measure tokens: statistical artifacts manipulated through matrix multiplications. Tokens do not occupy space on a die. They have no independent existence outside the computational infrastructure that sustains them. The relationship between the unit and its infrastructure is not manufacturing but metabolism — continuous consumption of energy, hardware, and cooling for as long as the token is in use.
The scaling laws are encountering walls analogous to those Moore's Law faced. The data wall — the finite supply of high-quality training text — is approaching, with current frontier models already consuming a significant fraction of the estimated ten to twenty trillion tokens of high-quality English text available. The energy wall is already visible: the International Energy Agency has flagged AI data centers as a growing share of global electricity demand. The economic wall — whether revenue scales fast enough to justify escalating training costs — is the one Moore's framework identifies as ultimately decisive.
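To make the data-wall arithmetic concrete, here is a toy projection in Python. The ten-to-twenty-trillion-token stock comes from the estimate above; the starting consumption figure and the per-generation growth multiplier are hypothetical values chosen for illustration, not measurements, so only the shape of the conclusion matters.

```python
# Toy projection of the data wall. The stock bounds come from the estimate in
# the text; the starting consumption and the per-generation growth multiplier
# are hypothetical assumptions for illustration only.

STOCK_LOW, STOCK_HIGH = 10e12, 20e12  # estimated high-quality English tokens
consumed = 5e12                       # hypothetical current frontier run
growth_per_generation = 2.5           # hypothetical token-count multiplier

generation = 0
while consumed < STOCK_HIGH:
    generation += 1
    consumed *= growth_per_generation
    print(f"gen +{generation}: {consumed:.1e} tokens "
          f"({consumed / STOCK_HIGH:.0%}-{consumed / STOCK_LOW:.0%} of stock)")
# Under these assumptions, even the optimistic bound is exhausted within a
# couple of model generations, which is the sense in which the wall approaches.
```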
The scaling laws inherit Moore's warning about one-dimensional measurement. In 2008, Moore observed that treating intelligence as 'a one-dimensional, quantifiable characteristic of humans or computers' was naïve. The benchmarks that measure AI capability — accuracy on tests, performance on coding challenges, scores on reasoning tasks — are one-dimensional measures of a phenomenon that resists one-dimensional characterization. The scaling laws capture the average relationship; the shadows operate at the margin, and it is the margin that determines when the wall arrives.
The Kaplan scaling laws emerged from a 2020 paper by Jared Kaplan and colleagues at OpenAI, establishing empirical power-law relationships between cross-entropy loss and model parameters, training dataset size, and compute across seven orders of magnitude. The Chinchilla refinement came in 2022 from DeepMind researchers led by Jordan Hoffmann, who demonstrated that models built to Kaplan's recommendations had been systematically undertrained and that compute-optimal training requires scaling parameters and data in roughly equal proportion.
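For reference, the Chinchilla result can be sketched numerically. The snippet below encodes the paper's published parametric loss fit and the standard approximation that training cost is about 6ND FLOPs; the twenty-tokens-per-parameter ratio is the rule of thumb commonly derived from the fit. All constants are approximate point estimates, not exact values.

```python
# Sketch of the Chinchilla compute-optimal allocation (Hoffmann et al., 2022).
# Uses the common approximation C = 6*N*D training FLOPs and the paper's
# finding that parameters and tokens scale in roughly equal proportion,
# i.e. N and D each grow as C**0.5. Constants are published point estimates.

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Parametric fit: L(N, D) = E + A / N**alpha + B / D**beta."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

def compute_optimal(compute_flops: float) -> tuple[float, float]:
    """Split a FLOP budget between parameters and tokens.

    Assumes the ~20-tokens-per-parameter rule of thumb; solving
    C = 6 * N * (20 * N) for N gives N = sqrt(C / 120).
    """
    tokens_per_param = 20.0
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

if __name__ == "__main__":
    C = 5.88e23  # roughly Chinchilla's budget: 6 * 70e9 params * 1.4e12 tokens
    n, d = compute_optimal(C)
    print(f"params = {n:.2e}, tokens = {d:.2e}, "
          f"predicted loss = {chinchilla_loss(n, d):.2f}")
```

Run against Chinchilla's own budget, the sketch recovers the familiar 70-billion-parameter, 1.4-trillion-token configuration: the "roughly equal proportion" claim in numerical form.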
The observation that AI capability costs halve approximately every eight months — sometimes attributed to Naveen Rao as 'Mosaic's Law' and popularized by various industry analysts — emerged empirically from tracking the compute requirements of models achieving equivalent benchmark performance over time. Like Moore's Law, the observation is descriptive, not mechanistic, and its persistence depends on continued engineering effort.
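The arithmetic behind that observation is simple compounding: if the halving period holds, the cost of reaching a fixed capability falls by a factor of 2^(t/8) after t months. A minimal sketch, assuming the eight-month figure:

```python
# Implied cost decline under an eight-month halving period. The halving
# period is the empirical claim being illustrated, not a derived constant.

def relative_cost(months: float, halving_months: float = 8.0) -> float:
    """Cost of reaching a fixed capability level, relative to month zero."""
    return 0.5 ** (months / halving_months)

for years in (1, 2, 3):
    print(f"after {years} yr: {relative_cost(12 * years):.1%} of today's cost")
# after 1 yr: 35.4%, after 2 yr: 12.5%, after 3 yr: 4.4%
```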
Trend lines, not physics. The scaling laws are curves fit to data, not equations derived from first principles, making them empirically robust but theoretically incomplete.
Tokens replace transistors. The unit of AI scaling is statistical, not physical — with profound consequences for infrastructure, economics, and the nature of the walls that will constrain growth.
Walls are coming. Data saturation, energy constraints, and economic sustainability each represent potential binding constraints, and the industry's rotation onto new dimensions when saturation arrives will shape the next decade.
One-dimensional measurement. Like all scaling laws, the AI version measures a single dimension of a multidimensional phenomenon — a structural limitation Moore identified as 'naïve' when applied to intelligence.
Self-fulfilling prophecy. Companies invest hundreds of billions on the assumption that the next doubling arrives on schedule; the investment itself helps ensure that it does.
Whether the scaling laws will continue to hold as data saturates and energy constraints bind is the central empirical question in contemporary AI. Optimists argue that synthetic data generation, multimodal training, and algorithmic efficiency improvements will sustain the curve. Skeptics — including researchers who have examined the Chinchilla relationships in detail — argue that the returns on scale are already diminishing and that the next doubling will require qualitatively different approaches rather than more of the same. Moore's framework suggests that the rotation will happen, that it is not automatic, and that the terms of the rotation will determine who benefits.