The Bureau of Labor Statistics employs several hundred people whose job is to decide whether a new car is better than last year's car, and if so, by how much. The question sounds simple. The answer is among the hardest problems in applied economics. When a manufacturer adds a backup camera, improves fuel efficiency, and raises the sticker price, the statistician must decompose the price increase into genuine inflation and quality improvement. Get the decomposition wrong in one direction and inflation is overstated; get it wrong in the other and real output growth is understated. The entire edifice of real GDP rests on these quality adjustments, performed through hedonic pricing models. For cars the methodology works tolerably. For software it strains. For AI-augmented cognitive output it breaks entirely.
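The decomposition can be sketched as a hedonic regression: regress log price on observed characteristics across two model years, and the coefficient on a year dummy is the quality-adjusted ("pure") price change, with the characteristic coefficients absorbing the quality improvement. A minimal illustration on synthetic data (all numbers and characteristics here are hypothetical, not BLS methodology):

```python
import numpy as np

# Synthetic cars from two model years (all numbers hypothetical).
# Characteristic columns: fuel efficiency (mpg), backup camera (0/1),
# and a year dummy (0 = last year, 1 = this year).
X_chars = np.array([
    [30.0, 0, 0],
    [32.0, 0, 0],
    [28.0, 1, 0],
    [31.0, 1, 1],
    [33.0, 1, 1],
    [29.0, 0, 1],
])
prices = np.array([20000, 21000, 21500, 24000, 25500, 21800], dtype=float)

# Hedonic regression: log price on an intercept plus characteristics.
# The year-dummy coefficient is the price change left over after the
# characteristic coefficients have priced the quality improvements.
A = np.column_stack([np.ones(len(prices)), X_chars])
coef, *_ = np.linalg.lstsq(A, np.log(prices), rcond=None)
intercept, b_mpg, b_camera, b_year = coef

pure_inflation = np.expm1(b_year)  # year effect as a percentage change
print(f"quality-adjusted price change: {pure_inflation:.1%}")
```

The essay's point is that this works when characteristics like mileage and cameras are observable and priced across many transactions; for cognitive output there is no agreed characteristic list to put in the design matrix.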
Coyle has identified quality adjustment as one of the chronic weaknesses of the national accounting system — a weakness that compounds silently because the errors are invisible in the headline figures. The errors do not announce themselves. They accumulate in price deflators that convert nominal GDP into real GDP, and they propagate through every subsequent calculation that uses real GDP as input.
Consider the example running throughout The Orange Pill: an engineer who previously shipped one feature per sprint now ships ten. The productivity statistic registers a tenfold improvement. But what has happened to the quality of each feature? The honest answer is: nobody knows, and the measurement system has no way to find out. The features may be individually excellent — AI handling the mechanical implementation so effectively that the engineer devotes full attention to design, architecture, and user experience. Or the features may be individually adequate but shallow — competent implementations that pass functional testing but lack the architectural depth, edge-case resilience, and design sensitivity that would have been present if a human being had struggled through each implementation manually, building understanding through friction. Both scenarios produce the same productivity number.
When AI makes it cheap to produce competent work across a wide range of domains, the average quality of output in the economy may decline even as total quantity increases. The aggregate statistics will show growth. The lived experience will be of a world saturated with adequate output and starved of excellent output. The measurement system that tracks only quantity will celebrate the saturation. The quality dimension — the distinction between adequate and excellent that Segal frames as the aesthetics of the smooth versus depth — will be invisible.
Coyle's framework suggests that measurement of AI's economic value should be embedded within an endogenous growth framework — a model in which the quality of knowledge inputs, not merely their quantity, determines the trajectory of output over time. Quality-adjusted output measurement for cognitive work is a research programme that will take years to mature. But the alternative is no quality measurement at all, and the policy consequences are already visible: a system that rewards quantity over quality, that celebrates volume over depth.
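The endogenous-growth claim, that quality of knowledge inputs shapes the trajectory rather than the level, can be illustrated with a toy compounding model (all parameters hypothetical, a sketch rather than any model Coyle specifies): two economies add the same quantity of knowledge each period, but new knowledge compounds at a rate scaled by its average quality, so identical quantity statistics conceal diverging paths.

```python
# Toy sketch: knowledge stock k compounds at a rate scaled by the
# average quality q of new knowledge. Same quantity of output per
# period, different quality, diverging long-run trajectories.
def trajectory(q, periods=30, growth_scale=0.05, k0=1.0):
    k = k0
    path = [k]
    for _ in range(periods):
        k *= 1 + growth_scale * q  # quality-weighted compounding
        path.append(k)
    return path

deep = trajectory(q=1.0)     # excellent output
shallow = trajectory(q=0.5)  # merely adequate output

# A quantity-only statistic counts both economies as identical each
# period; the quality-sensitive stock tells a different story.
print(f"stock ratio after 30 periods: {deep[-1] / shallow[-1]:.2f}")
```

The design choice, putting quality inside the growth rate rather than the level, is what makes the measurement error compound rather than merely bias a single year's figure.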
The hedonic pricing methodology was developed by Andrew Court in 1939 and systematized by Zvi Griliches in the 1960s. Coyle's engagement with quality adjustment problems in the digital economy runs through her work on intangible capital with Jonathan Haskel and Stian Westlake, and her Stanford Digital Economy Lab white paper on measuring AI (2024).
Hedonic limits. The methodology that works for cars fails for products whose quality dimensions are not easily priced.
Quantity dominance. Current metrics count features shipped without assessing whether each reflects equivalent architectural depth.
Saturation without excellence. AI enables proliferation of adequate output, potentially compressing the market for depth.
Endogenous growth framework. Measuring AI requires treating knowledge quality, not quantity, as the input that matters for long-term trajectory.