Diane Coyle — On AI
Contents
Cover
Foreword
About
Chapter 1: What the Dashboard Measures
Chapter 2: The Productivity Paradox Returns
Chapter 3: The Invisible Surplus
Chapter 4: Household Production and the Missing Account
Chapter 5: The Quality Adjustment Problem
Chapter 6: When Free Is Not Free
Chapter 7: The Wellbeing Gap
Chapter 8: Counting What Matters
Chapter 9: New Metrics for New Work
Chapter 10: Beyond GDP in the Age of AI
Epilogue
Back Cover
Cover

Diane Coyle

On AI
A Simulation of Thought by Opus 4.6 · Part of the Orange Pill Cycle
A Note to the Reader: This text was not written or endorsed by Diane Coyle. It is an attempt by Opus 4.6 to simulate Diane Coyle's pattern of thought in order to reflect on the transformation that AI represents for human creativity, work, and meaning.

Foreword

By Edo Segal

The number I trust least is the one I built my career on.

Productivity. Output divided by input. The metric that justified every hire, every sprint, every late night, every decision I made in Trivandrum when twenty engineers crossed a threshold that rewired my understanding of what a small team could do. Twenty-fold multiplier. I said it on stages. I said it to investors. I believed it completely.

Then I spent months inside the work of Diane Coyle, and I realized I had no idea what I was actually measuring.

Coyle is an economist, but not the kind who confirms what builders want to hear. She is the kind who asks what the confirmation is made of. Her life's work has been pulling apart the measurement systems that governments and companies treat as bedrock — GDP, productivity statistics, national accounts — and showing that the bedrock is an artifact. Something built by specific people in a specific emergency, carrying the assumptions of that emergency forward into a world those people never imagined.

GDP was invented to win a war. It counts what factories produce. It does not count what households sustain, what attention costs, what quality means, or whether the people behind the numbers are flourishing or quietly depleting themselves. And every policy conversation on earth still treats it as the definitive verdict on whether things are getting better.

This matters right now because the AI transition is producing numbers that look spectacular on every dashboard I know how to read. Output is surging. Features ship in days instead of months. The imagination-to-artifact ratio is collapsing toward zero. The dashboard is green.

But Coyle taught me to ask what the dashboard cannot show. The cognitive intensity that masquerades as efficiency. The household production evaporating behind the screen. The quality dimension that no metric currently tracks. The full cost of the transition that the hundred-dollar subscription price conceals.

In *The Orange Pill*, I argue that AI is an amplifier. Coyle's work forced me to confront an uncomfortable corollary: the metrics we use to evaluate the amplification are themselves amplifying a partial truth. They show the gains in high resolution and the costs in no resolution at all. A policy apparatus built on those metrics will celebrate what it can see and neglect what it cannot. And the things it cannot see — the sustainability of the working pattern, the wellbeing of the workers, the displacement happening inside households — may be where the real story lives.

What you measure shapes what you value. What you cannot measure disappears from the conversation. Coyle made me see the dashboard differently. She might do the same for you.

Edo Segal · Opus 4.6

About Diane Coyle

b. 1961

Diane Coyle (b. 1961) is a British economist, author, and public policy adviser whose work has reshaped how governments and institutions think about economic measurement in the digital age. Born in Bury, Lancashire, she studied at Harvard and University College London before pursuing a career that has spanned journalism, academia, and policy advisory roles. She served on the BBC Trust, the Migration Advisory Committee, and the UK Competition Commission, and co-chaired the independent Bean Review of UK economic statistics. She holds the Bennett Chair of Public Policy at the University of Cambridge, where she co-directs the Bennett Institute for Public Policy. Her books include *The Soulful Science* (2007), *The Economics of Enough* (2011), *GDP: A Brief but Affectionate History* (2014), *Markets, State, and People* (2020), and *Cogs and Monsters: What Economics Is, and What It Should Be* (2021). Her central intellectual contribution has been demonstrating that national accounting systems — particularly GDP — carry inherited assumptions from the industrial era that systematically misrepresent the digital and knowledge economies, rendering critical dimensions of economic activity invisible to policymakers. Her ongoing work on measuring AI's economic impact, the economics of data, and the institutional infrastructure required for effective governance of technological transitions has made her one of the most influential voices arguing that measurement reform is not a technical matter but a prerequisite for democratic governance.

Chapter 1: What the Dashboard Measures

Every nation on earth evaluates its economic health by consulting a number that was invented to win a war.

Gross domestic product — the sum of all final goods and services produced within a country's borders in a given period — was not designed to measure prosperity. It was not designed to measure happiness, capability, social cohesion, or the quality of a civilization's thinking. It was designed to tell Franklin Roosevelt how much military output the American economy could sustain without civilian consumption collapsing below subsistence. Simon Kuznets delivered the first national income accounts to the U.S. Senate in 1934, and by 1944, the Bretton Woods conference had adopted GDP as the international standard for comparing economies. The metric that emerged from the specific emergency of industrial mobilization became, within a decade, the universal standard by which nations judged their success.

Kuznets himself understood the distortion. He warned Congress, explicitly, that the welfare of a nation could scarcely be inferred from a measurement of national income. He distinguished between the quantity of growth and its quality. He understood that a dollar spent cleaning up a chemical spill and a dollar spent educating a child appeared identical in the national accounts, even though one represented repair of damage and the other represented investment in capability. The warning was heard, noted, and ignored. GDP became the dashboard. And the dashboard, once installed, acquired an authority that no subsequent argument could dislodge — because measurement systems are not merely technical instruments. They are institutional infrastructure. They are embedded in the quarterly reporting cycles of governments, the calibration of central bank models, the conditionality of international lending, and the career incentives of every economist who has ever written a policy brief. Changing what a society measures requires changing the institutions that collect, process, and act upon the measurements. That is a project measured in decades, not papers.

Diane Coyle has spent those decades building the intellectual case for reform. Her work — from *GDP: A Brief but Affectionate History* through *Cogs and Monsters* to *The Measure of Progress* — constitutes the most sustained, technically rigorous, and institutionally grounded critique of GDP measurement currently available in the English language. The critique is not that GDP is useless. Coyle has repeatedly insisted that GDP does what it does remarkably well: it measures market production. The critique is that GDP has been asked to do something it was never designed to do — to serve as a proxy for national welfare — and that the gap between what it measures and what it is used to evaluate has consequences. What the metric cannot see, the policy conversation cannot discuss. What the policy conversation cannot discuss, the political system cannot address.

The AI transition blows this gap wide open.

Consider the scene that Edo Segal describes in *The Orange Pill*: twenty engineers in Trivandrum, each achieving a twenty-fold productivity multiplier using Claude Code at one hundred dollars per person per month. The productivity metric captures this with enthusiasm. Output per worker-hour surges. If these engineers are producing software sold to customers, the revenue appears in GDP. The efficiency improvement appears in the productivity statistics. The dashboard lights up.

But what has the dashboard actually registered? It has registered the market value of the output. It has not registered the composition of the input — whether the productivity came from a more efficient process or from a more intense deployment of human cognitive capacity. It has not registered the meals skipped, the relationships strained, the attention reserves depleted. It has not registered the domestic production that evaporated when the engineers' absorption in the tool displaced the household labor they would otherwise have performed. It has not registered the quality dimension of the output — whether the twenty features shipped in the time previously required for one represent twenty genuine improvements or twenty competent-but-shallow artifacts that no metric can distinguish from excellence.

The dashboard shows a number going up. The number is accurate. The inference drawn from the number — that things are getting better — is the part that requires scrutiny.

Coyle's framework identifies three systematic blind spots in the GDP measurement system, each of which the AI transition transforms from a chronic limitation into an acute crisis.

The first is the boundary between market and non-market production. GDP counts goods and services transacted through markets. When a family hires a cleaner, the cleaner's wages contribute to GDP. When a parent cleans the same house, the labor is invisible. The work is identical. The value is equivalent. The metric sees one and not the other. This omission has been the subject of feminist economic critique since Marilyn Waring's *If Women Counted* documented how national accounting systems systematically rendered women's unpaid labor invisible. Time-use surveys across OECD countries consistently show that household production — cooking, cleaning, childcare, eldercare, emotional maintenance — represents between twenty and forty percent of total economic activity, depending on methodology and boundary definitions. None of it appears in GDP.

The AI transition supercharges this omission. When Segal describes a marketing manager building a custom tracking tool in an evening, she is performing an act of production that will never appear in any national account. She did not buy the tool. She built it for herself. The value is real, potentially enormous in aggregate across millions of users building personal tools, and completely invisible to the measurement system. The language interface has turned every knowledge worker into a potential producer of software, and the production that results from this transformation — personal tools, custom workflows, AI-assisted analyses built for individual use — constitutes a new category of household production that the national accounts have no mechanism to capture.

The second blind spot is the distinction between quantity and quality. GDP measures the market value of output, which should in principle reflect quality through the mechanism of price — consumers pay more for better products. In practice, distinguishing genuine quality improvement from mere quantity increase is among the hardest problems in economic measurement. Statistical offices employ hedonic pricing methods to adjust for quality changes in products like computers, where processing speed doubles every few years and the statistician must determine how much of a price decline represents genuine cheapening and how much represents quality improvement at constant price. These adjustments are imperfect for computers. They are entirely absent for the kinds of output the AI transition produces in abundance.

When AI enables a developer to ship ten features in the time previously required for one, the productivity statistic registers a tenfold improvement. But productivity statistics have no mechanism for assessing whether each of those ten features reflects the same depth of architectural thinking, the same quality of user experience design, the same durability under edge cases, as the single feature that would have been produced without AI. The output quantity is measurable. The output quality is not — at least not by any metric that currently feeds into the national accounts. A policy apparatus that sees only the quantity will celebrate the tenfold improvement. A policy apparatus that could also see the quality might ask harder questions about what the improvement actually contains.

The third blind spot is the gap between economic output and human wellbeing. The evidence that GDP growth does not reliably produce proportional increases in life satisfaction, health, or happiness above a certain income threshold is now substantial enough that even GDP's defenders concede the point. The Easterlin paradox — richer people within a country are happier than poorer people, but countries that grow richer over time do not reliably become happier — has survived fifty years of empirical challenge and counter-challenge. The specific mechanisms remain debated. The broad conclusion does not: GDP measures production, not flourishing, and the two are not the same.

The AI transition threatens to drive the largest wedge yet between production and flourishing. Segal describes the experience of working with Claude Code as simultaneously exhilarating and distressing — sometimes in the same minute. The dashboard that shows his productivity would register nothing but improvement. The human being behind the dashboard is reporting a compound experience that the metric cannot decompose. The productive hours are also depleting hours. The creative breakthroughs are also episodes of compulsive engagement that resist the off switch. The output is extraordinary. Whether the person producing it is flourishing is a separate question that the output metric is constitutionally incapable of answering.

Coyle has argued, with the patience of someone who has been making the same point for twenty years to audiences that nod in agreement and then return to their GDP-calibrated models, that the solution is not to abolish GDP but to supplement it. GDP does what it does well. The error is in asking it to do what it cannot do. A speedometer measures speed. It does not measure whether the driver is heading somewhere worth going, whether the fuel consumption is sustainable, or whether the passengers are comfortable. These are important questions. They require different instruments. And the absence of those instruments does not make the questions disappear. It makes them invisible — which is worse, because invisible problems accumulate without correction until they become crises.

The AI transition demands new instruments with an urgency that no previous technological shift has matched. The digital economy has been straining the measurement infrastructure for twenty years — free digital services, platform labor, the gig economy, intangible capital all challenge the national accounts in ways that statistical offices have been slow to address. But those challenges operated at the margins of a still-recognizable economy. The AI transition operates at the core. When the fundamental unit of economic production — human cognitive labor — is being amplified, redirected, intensified, and transformed by a tool that costs less than a mobile phone plan, the measurement system that was designed to count factory output during a war is not merely incomplete. It is describing a different economy than the one that actually exists.

Coyle's most recent work has addressed this directly. Her October 2025 essay "Measuring AI's Economic Impact" argued that we still lack convincing evidence that AI is transforming the global economy, not because the transformation is not happening, but because the metrics cannot detect it. The productivity statistics show no clear AI effect because the measurement infrastructure was designed for an economy in which output was physical, labor was hourly, and quality was approximated by price. In an economy where output is increasingly cognitive, labor is measured in attention rather than hours, and quality is multidimensional in ways that price cannot capture, the measurement system is looking for the wrong signal in the wrong place.

The dashboard is still working. It is still accurate. It is still measuring exactly what it was designed to measure. The problem is that what it was designed to measure is no longer what matters most.

Building better instruments is not an academic exercise. It is a prerequisite for governance. The policy decisions that will determine whether the AI transition produces broad-based flourishing or concentrated extraction are being made now, and they are being made on the basis of metrics that cannot distinguish between the two outcomes. A policy apparatus that sees only productivity growth will respond with policies that maximize productivity — deregulation, acceleration, the removal of institutional friction. A policy apparatus that could also see human capital depletion, quality erosion, and wellbeing decline might respond differently. It might build the structures that Segal calls dams — not to stop the river, but to channel it toward life.

The first step toward building those structures is understanding what the current dashboard can and cannot show. The current dashboard shows production. It does not show the sustainability of the systems that produce it. And the gap between those two measurements is where the most consequential effects of the AI transition live.

Chapter 2: The Productivity Paradox Returns

In 1987, the economist Robert Solow looked at the American economy and made an observation that would haunt his profession for a decade: "You can see the computer age everywhere but in the productivity statistics."

The observation was empirically precise and theoretically alarming. American businesses had spent hundreds of billions of dollars on information technology since the 1970s. Computers sat on every desk. Software had transformed accounting, inventory management, communications, design. The economy had clearly changed. But the aggregate productivity data — the measure that should have captured the efficiency gains all this investment was supposed to produce — showed nothing. Productivity growth had been declining since the early 1970s and showed no sign of reversing.

The paradox consumed a generation of economists. The explanations multiplied: measurement lags, implementation delays, the need for complementary organizational investments, the concentration of gains in IT-producing sectors rather than IT-consuming sectors. The resolution, when it finally came in the mid-1990s, was itself instructive. Productivity growth did accelerate, but only after a generation of organizational learning — the slow, expensive, often painful process of restructuring businesses around the capabilities that the technology made possible. The technology was necessary but not sufficient. The organizational transformation was what converted the investment into productivity. And the organizational transformation took time that no technology could compress.

Diane Coyle's analysis of the Solow paradox, developed across multiple works, emphasizes a dimension that most discussions overlook: the paradox was partly a measurement artifact. The productivity statistics were designed to measure the output of factories — physical goods produced per hour of labor. When the economy shifted toward services, toward information, toward the intangible, the statistics kept measuring what they had always measured. The gains from information technology appeared not in the form of more widgets per hour but in the form of better decisions, faster information flows, more complex coordination — outputs that the measurement system could not see because it was looking for physical throughput and finding, correctly, that physical throughput had not increased.

The AI transition presents a new productivity paradox, but the direction has reversed. In 1987, the paradox was that productivity gains failed to appear despite massive investment. In 2026, productivity gains are appearing with spectacular speed — the twenty-fold multiplier, the solo developer building revenue-generating products, the engineering teams shipping in days what previously required months. The new paradox is that the gains may not be measuring what they appear to measure.

The traditional productivity metric divides output by labor input. When output doubles and hours remain constant, productivity appears to double. The metric is simple, tractable, and almost universally employed. It is also blind to a distinction that the AI transition makes critical: the distinction between efficiency and intensity.

An efficiency gain means producing the same output with less effort. The process improves. The cognitive load per unit of output decreases. The worker goes home at the same time, having accomplished more. This kind of gain is sustainable. It can be maintained indefinitely because it does not deplete the human capital it depends on.

An intensity gain means producing more output with the same hours but more effort per hour. The cognitive load increases. The concentration deepens. The decision density rises. Every minute contains more thinking, more evaluation, more creative expenditure than the minute before AI arrived. This kind of gain is not sustainable — at least not without the recovery structures that Segal calls dams and that the Berkeley researchers documented as largely absent from AI-augmented workplaces.

The productivity metric cannot distinguish between these two sources of improvement. A worker who produces twice as much because the process became more efficient and a worker who produces twice as much because she is thinking twice as hard look identical in the statistics. The metric sees only the ratio. The mechanism is invisible.
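The blindness described above can be made concrete with a minimal sketch. The workers, figures, and the `cognitive_load` field are all hypothetical illustrations, not data from Coyle's work: the point is only that the standard ratio carries no trace of the mechanism behind it.

```python
# Illustrative sketch: the standard labor-productivity ratio cannot
# distinguish an efficiency gain from an intensity gain.

def labor_productivity(output_units: float, hours: float) -> float:
    """Output per hour -- the only signal the standard metric carries."""
    return output_units / hours

# Two hypothetical workers after AI adoption. 'cognitive_load' is an
# assumed, unmeasured quantity: it feeds no statistic anywhere.
efficient_worker = {"output": 20, "hours": 40, "cognitive_load": 0.5}  # better process
intense_worker = {"output": 20, "hours": 40, "cognitive_load": 2.0}    # harder thinking

p1 = labor_productivity(efficient_worker["output"], efficient_worker["hours"])
p2 = labor_productivity(intense_worker["output"], intense_worker["hours"])

# Identical in the statistics; the mechanism is invisible.
assert p1 == p2 == 0.5
```

Any metric built solely from the `output` and `hours` fields will, by construction, rank these two workers as interchangeable; distinguishing them requires measuring an input the current system does not collect.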

This blindness has always been present. It has rarely mattered as much as it matters now. In an industrial setting, the intensity of labor was partially self-limiting. A factory worker operating at unsustainable intensity made errors, damaged equipment, suffered visible physical exhaustion. The limits were embodied. In a knowledge economy augmented by AI, the limits are cognitive and therefore invisible — invisible to the manager, invisible to the metric, often invisible to the worker herself until the depletion manifests as burnout, reduced judgment quality, or the flat affect that the Berkeley researchers documented in AI-augmented workers after sustained high-intensity periods.

The Berkeley study that Segal discusses in *The Orange Pill* — Xingqi Maggie Ye and Aruna Ranganathan's eight-month embedded observation at a two-hundred-person technology company — provides the most granular empirical evidence available for the intensity mechanism. The researchers found that AI tools did not reduce work. They intensified it. Workers who adopted AI took on more tasks, expanded into adjacent roles, and filled previously protected cognitive pauses with additional AI-assisted activity. The phenomenon the researchers called "task seepage" — the tendency for AI-accelerated work to colonize lunch breaks, elevator rides, and the micro-pauses that had informally served as cognitive recovery time — is a direct observation of intensity masquerading as efficiency.

The productivity metric would register the Berkeley findings as an unambiguous positive. More output per worker. More tasks completed. More domains covered. The metric would show a productive workforce becoming more productive. What it could not show was that the productivity was being purchased with cognitive reserves that were not being replenished — that the fuel gauge was dropping while the speedometer climbed.

Coyle's framework for understanding this pattern draws on the "productivity J-curve" — the observation that transformative technologies typically produce a period of apparent productivity decline before the gains materialize, because the organizational restructuring the technology requires temporarily reduces measured output even as the capabilities expand. The J-curve explanation was part of the resolution of the original Solow paradox: productivity appeared to stall because firms were investing in reorganization whose benefits had not yet materialized.

The AI transition may be producing a different distortion: a productivity peak that conceals an unsustainable intensity. The gains are real but partially borrowed from the future — drawn from cognitive reserves, health reserves, relational reserves that will need to be repaid. The productivity J-curve described a temporary dip followed by sustainable gains. The AI intensity pattern may describe a temporary surge followed by a correction — a burnout wave, a quality decline, a withdrawal of the human engagement that the intensity depends upon.

Coyle's most recent empirical work, co-authored with Jörden and Poquiz and published as a 2025 working paper, investigated the determinants of firms' decisions to adopt AI. The findings reinforce the intensity hypothesis from a different angle. The binding constraint on AI adoption, the researchers found, was not the technology itself but the cost of organizational restructuring — the workflows that needed redesigning, the roles that needed redefining, the management practices that needed updating. Firms that adopted AI without restructuring their organizations produced the intensity pattern: more output from the same workers at higher cognitive cost. Firms that invested in organizational change — slower to show gains, costlier in the short term — were more likely to produce the efficiency pattern: genuine process improvements that sustained themselves over time.

The policy implication is substantial and uncomfortable. A productivity metric that cannot distinguish efficiency from intensity will reward the wrong pattern. It will celebrate firms that extract more from their workers and overlook firms that invest in sustainable transformation. It will present the burnout factory and the well-managed organization as equally productive, because the statistic that separates them — the sustainability of the working pattern — does not exist in any reporting framework that feeds into national accounts.

Coyle has argued, in her Stanford Digital Economy Lab white paper on measuring AI, that the next step is to treat AI-enabled information as an input to a knowledge production function within an endogenous growth framework. The technical language conceals a radical proposal: that we need to measure not just what AI produces but what it consumes — including the human cognitive inputs whose depletion is invisible to every metric currently in use.

This is not a call for perfect measurement. Perfect measurement of cognitive intensity is no more achievable than perfect measurement of environmental externalities. But approximate measurement — time-use surveys adapted to capture cognitive load, workplace wellbeing indicators integrated into productivity reporting, longitudinal studies that track the sustainability of AI-augmented working patterns over years rather than quarters — would provide information that the current system does not provide at all. The distance between no measurement and approximate measurement is larger, and more consequential, than the distance between approximate measurement and perfect measurement.

The original productivity paradox was resolved by waiting. The gains eventually materialized in the statistics because the organizational transformation eventually occurred. The new productivity paradox cannot be resolved by waiting, because the question is not whether the gains will appear — they have already appeared — but whether the gains are sustainable. And sustainability is a property that can only be assessed over time, which means that by the time the measurement system detects the problem, the damage may already be compounding.

The Solow paradox taught economists that measurement systems designed for one economy could fail to capture the dynamics of another. The AI productivity paradox is teaching the same lesson again, but with higher stakes and less time to learn it.

Chapter 3: The Invisible Surplus

In standard economic theory, consumer surplus is the difference between what a person is willing to pay for something and what they actually pay. When a commuter values a train ride at ten pounds but the fare is three, the seven-pound difference is consumer surplus — real value, genuinely experienced, entirely invisible to GDP. The national accounts record the three pounds that changed hands. The seven pounds of value that the commuter received but did not pay for does not exist in the statistics.

Consumer surplus has always been large. It has always been unmeasured. And for most of economic history, the omission has been tolerable, because the relationship between market transactions and total value was roughly stable. If consumer surplus ran at some relatively constant multiple of market value, then tracking market value alone provided a reasonable proxy for the direction, if not the magnitude, of welfare changes.

The digital economy shattered this stability. When a service is free — when the price is zero — consumer surplus is not merely large relative to the price. It is the entirety of the value. The price captures nothing. A person who uses Google Search fifty times a day, deriving substantial informational value from each query, generates zero GDP contribution through that usage. The advertising revenue Google earns appears in the accounts, but the consumer's experience — the value of having questions answered, routes calculated, translations performed, information retrieved — is invisible. Erik Brynjolfsson and his collaborators have estimated that the consumer surplus from free digital goods in the United States alone may amount to hundreds of billions of dollars annually. None of it appears in GDP.
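The zero-price mechanism can be stated in two lines of arithmetic. The per-query value below is an assumed figure for illustration only; the commuter numbers come from the text.

```python
# Illustrative sketch: consumer surplus is willingness to pay minus price.
# At a price of zero, the surplus equals the entire value, and the
# accounts record nothing.

def consumer_surplus(willingness_to_pay: float, price: float) -> float:
    return willingness_to_pay - price

def gdp_contribution(price: float, quantity: int) -> float:
    """GDP sees only the market transaction: price times quantity."""
    return price * quantity

# The commuter from the text: values the ride at 10, pays 3, keeps 7.
assert consumer_surplus(10.0, 3.0) == 7.0

# A free search service used 50 times a day, at an assumed value of
# 0.10 per query. The value to the user is real; the measured
# contribution of that usage is exactly zero.
daily_value = consumer_surplus(0.10, 0.0) * 50
assert daily_value == 5.0
assert gdp_contribution(0.0, 50) == 0.0
```

The design point is that `gdp_contribution` is not a flawed approximation of `consumer_surplus`; at a zero price the two functions are measuring disjoint quantities.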

Diane Coyle has engaged with this problem more persistently than perhaps any other economist of her generation. Her work on the measurement of the digital economy — running through *GDP: A Brief but Affectionate History*, through her contributions to the Bean Review of UK economic statistics, through her *Cogs and Monsters* analysis of the digital economy's measurement challenges — returns again and again to the same structural problem: the national accounting framework was built on the assumption that value is revealed through price, and the digital economy produces an increasing share of its value at a price of zero.

The AI transition does not merely extend this problem. It transforms its scale and character in ways that demand entirely new analytical tools.

Consider what happens when the language interface makes every knowledge worker a potential software developer. Segal describes this capability expansion throughout *The Orange Pill* — engineers building across disciplinary boundaries, non-technical founders prototyping products, a marketing manager constructing a custom tracking tool in an evening. Each of these acts is an act of production. Real tools are being built. Real value is being created. Real problems are being solved. And virtually none of this production enters the measured economy.

The marketing manager did not purchase her tracking tool from a software vendor. She built it herself, using a tool that costs a hundred dollars a month. The value she created — better decisions, saved time, improved outcomes — accrues to her and to her organization, but it never generates the market transaction that the national accounts require in order to see it. If she had purchased equivalent software from a SaaS provider at a thousand dollars per month, GDP would register twelve thousand dollars of annual economic activity. Because she built it herself, GDP registers approximately twelve hundred dollars — the cost of the AI subscription — and misses the remaining value entirely.
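The gap can be stated as a line of arithmetic. A minimal sketch, using only the hypothetical prices from the example above (none of these figures are measured data):

```python
# Illustrative arithmetic only: the dollar amounts are the chapter's
# hypothetical example, not statistics.

SAAS_PRICE_PER_MONTH = 1_000      # equivalent software bought from a vendor
AI_SUBSCRIPTION_PER_MONTH = 100   # what the marketing manager actually pays

# What GDP would register if the tool were purchased on the market
measured_if_purchased = SAAS_PRICE_PER_MONTH * 12        # 12,000 per year

# What GDP registers when she builds it herself with an AI subscription
measured_if_self_built = AI_SUBSCRIPTION_PER_MONTH * 12  # 1,200 per year

# The difference: real production that generates no market transaction
invisible_surplus = measured_if_purchased - measured_if_self_built
print(invisible_surplus)  # 10800
```

Ten thousand eight hundred dollars of annual value, per tool, per worker, that the accounts cannot see — before multiplying across the millions of workers the next paragraph describes.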

Now multiply this by every knowledge worker who uses AI to build personal tools, automate workflows, generate analyses, draft documents, create presentations, design interfaces, or solve problems that they would previously have hired someone else to solve or, more likely, simply left unsolved. The aggregate value is potentially enormous. It is also structurally invisible, because it occurs within what economists call household production — the category of economic activity that takes place outside markets and therefore outside the measurement system.

Coyle has framed the broader digital surplus problem by asking a deceptively simple question: is data more like oil, air, fish, or wine? Each analogy implies a different economics. Oil is rival and depletable — my use diminishes yours. Air is non-rival and non-excludable — a public good that markets systematically underprovide. Fish are rival but renewable — requiring management to prevent depletion. Wine improves with age but requires investment in production. The answer matters because it determines the appropriate regulatory and measurement framework. Her conclusion, developed in her Daedalus essay on socializing data, is that data — and by extension, the AI systems trained on it — exhibits characteristics of all four, which means that no single existing framework is adequate.

The AI surplus problem inherits this complexity and adds a new dimension: the surplus is generated not merely by consuming a digital service but by producing with it. The consumer surplus from Google Search is passive — the user receives value by querying a system that already exists. The AI production surplus is active — the user creates value by directing a system that amplifies their capability. The active surplus is larger per interaction, more heterogeneous in form, and even harder to measure, because each act of AI-assisted production generates a unique output whose value depends on context, intent, and use.

The measurement challenge is compounded by the phenomenon of AI-enabled capability expansion. Before AI, a backend engineer who wanted to build a user interface needed to acquire frontend skills — a process that took months or years of learning, itself a form of investment that the national accounts did not capture. With AI, the same engineer can build an interface in days by describing what she wants in natural language and iterating on the result. The capability expansion is real: she can now do things she could not do before. But the value of the expansion has no price, generates no transaction, and produces no statistical shadow.

Segal frames this as the imagination-to-artifact ratio — the distance between a human idea and its realization — collapsing toward zero. From a measurement perspective, this collapse is the most significant event since the invention of free digital services, because it means that the production boundary of the economy — the line that separates what is counted from what is not — is being redrawn in real time, by millions of individual decisions, without anyone updating the statistical framework to reflect the change.

The conceptual precedent is the imputation that statistical offices already perform for owner-occupied housing. A homeowner who lives in her own house is consuming a housing service that she also provides to herself. The national accounts estimate the rental value of this service and include it in GDP — an imputation that acknowledges the production even though no transaction occurs. The logic is sound: the housing service is real regardless of whether it passes through a market. The same logic applies to AI-assisted personal production. The custom tool the marketing manager builds is a real tool regardless of whether she bought it or made it. But no statistical office currently imputes the value of AI-assisted personal production, because the category barely existed before 2025 and the methodologies for estimating its value have not been developed.

Coyle's February 2026 essay "AI Will Transform Business, Not Just Jobs" points toward the institutional dimension of this measurement gap. She argues that AI is fundamentally an information technology that affects decision-making processes, and that its impact will manifest through corporate reorganization rather than simple task automation. The implication for measurement is significant: if AI's primary value is in improving decisions rather than increasing output, then the national accounts — which measure output — will systematically undercount the value. Better decisions produce better outcomes, but the quality improvement in decisions is even harder to measure than the quality improvement in products. No statistical framework currently attempts it.

The invisible surplus also complicates the distributional analysis that Coyle considers essential. When AI tools are available at low cost to anyone with an internet connection, the consumer surplus they generate is potentially progressive — a developer in Lagos receives the same capability amplification as an engineer in San Francisco, at the same price, which means the surplus relative to what the tool replaces is far larger for the developer in Lagos. This is the democratization of capability that Segal celebrates, and the celebration is warranted. But the distributional benefit is invisible to every metric that policymakers currently use to evaluate inequality, because those metrics measure income and wealth (market outcomes), not capability and surplus (the full value of what people can do and be).

A measurement system that could see the invisible surplus would tell a fundamentally different story about the AI transition than the one currently available. It would show that the transition is generating enormous value, much of it outside markets, much of it accruing to individuals and small organizations that the current statistics cannot see. It would show that the distributional effects may be more progressive than income data suggests, because the capability expansion reaches people who were previously excluded from the production process entirely. And it would show that the measured economy — the one that appears in GDP, in productivity statistics, in the quarterly reports that drive policy — is an increasingly incomplete representation of the actual economy that people experience.

The invisible surplus is not a rounding error. It is potentially the largest single economic effect of the AI transition, and it is entirely outside the frame of the dashboard that every government consults. Building instruments that can detect it — even approximately, even imperfectly — is not an academic luxury. It is a prerequisite for understanding what is actually happening.

Chapter 4: Household Production and the Missing Account

In 1988, the New Zealand politician and scholar Marilyn Waring published If Women Counted, a book that posed a question so fundamental that it embarrassed an entire discipline: Why does the system that measures economic activity systematically exclude the work that sustains human life?

The answer was structural, not conspiratorial. National income accounting was designed by men, in the 1930s and 1940s, to measure the market economy — specifically, the industrial market economy that produced the goods and services governments needed to track during depression and war. Household production — the cooking, cleaning, childcare, eldercare, emotional labor, and domestic management that kept families functioning and workers capable of showing up at the factory — was excluded not because anyone argued it was valueless but because it was not transacted through markets and therefore had no price that the accounting system could use to value it. The omission was technical in origin. Its consequences were political in effect. What the accounts did not measure, governments did not discuss. What governments did not discuss, budgets did not fund. And the work that budgets did not fund was, disproportionately, work performed by women.

Waring's critique launched a generation of feminist economic scholarship and catalyzed a series of institutional responses — satellite accounts for household production, time-use surveys, imputation methodologies — that have improved the situation without resolving it. Household production remains outside the headline GDP figures in every country. The satellite accounts that estimate its value are produced irregularly, receive minimal policy attention, and are treated as supplementary information rather than core economic intelligence. The structural omission persists because the institutional infrastructure persists. GDP remains the dashboard. The dashboard remains blind to household production. And the blindness has consequences.

Diane Coyle's engagement with this blindness has been sustained and technically precise. Her work on measurement reform consistently identifies the household production gap as one of the most consequential omissions in the national accounts — not because other economists disagree but because agreement has not produced change. The case for measuring household production has been won intellectually and lost institutionally, which is a pattern Coyle recognizes as characteristic of measurement reform more broadly: the arguments for better metrics are strong, but the infrastructure that keeps the old metrics in place is stronger.

The AI transition transforms the household production gap from a chronic measurement failure into an acute crisis.

The mechanism is straightforward, and Segal identifies it without naming it in economic terms. When AI tools make productive work supernormally stimulating — when the flow state that Segal describes is available on demand, when the imagination-to-artifact ratio collapses, when the conversation with the machine is more engaging than any conversation available at the dinner table — the opportunity cost of household production increases. Every hour spent cooking, cleaning, attending to a child's homework, or maintaining the emotional infrastructure of a family is an hour not spent in the exhilarating, productive, measurable flow of AI-augmented work.

The viral Substack post — "Help! My Husband is Addicted to Claude Code" — is a domestic production crisis expressed in the language of a relationship complaint. When the husband withdraws from household responsibilities to build with Claude, two simultaneous transactions occur in the economy of the household. His market-visible productivity increases: he is shipping more, building more, creating more economic value as conventionally measured. His household production decreases: meals not cooked, children not attended to, emotional labor not performed, the invisible maintenance of shared life not maintained.

The national accounts register the first transaction and ignore the second. GDP sees a more productive worker. It does not see a depleted household. And because the depletion is invisible to the metric, it is invisible to the policy apparatus — which means that the policies designed to support the AI transition will systematically favor market production over household production, will reward the worker who sacrifices domestic engagement for productivity, and will provide no institutional support for the partner who absorbs the costs.

Time-use surveys provide the empirical foundation for quantifying this shift. The OECD's harmonized time-use data shows that across developed economies, adults spend between three and five hours per day on unpaid household work, with women consistently performing approximately twice as much as men. The aggregate value of this production, estimated using replacement cost methodologies — what would it cost to hire someone to perform the same tasks? — ranges from twenty to forty percent of GDP depending on the country and the methodology.
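The replacement-cost logic is simple enough to sketch. A minimal calculation, with illustrative figures chosen to fall inside the ranges the surveys report — these are assumptions for exposition, not OECD data:

```python
# Replacement-cost valuation of household production: what would it cost
# to hire someone to perform the same tasks? All inputs are assumed.

hours_per_day = 3.5          # unpaid household work (within the 3-5 hour range)
replacement_wage = 12.0      # hourly cost of hiring the tasks out (assumed)
adults = 40_000_000          # adult population of a hypothetical economy
gdp = 2.0e12                 # annual GDP of that economy (assumed)

annual_value = hours_per_day * 365 * replacement_wage * adults
share_of_gdp = annual_value / gdp

print(f"household production ≈ {share_of_gdp:.0%} of GDP")
```

Even with deliberately conservative inputs, the share lands around thirty percent — squarely in the twenty-to-forty-percent range the methodologies report, and entirely outside the headline accounts.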

These are not marginal figures. They represent a substantial share of the total economic activity that sustains a society. And the AI transition is exerting a systematic pressure on this production — not by replacing it with machines (household robots remain primitive) but by increasing the attractiveness of the alternative. The pull is cognitive, not economic. The husband in the Substack post is not choosing market work over household work because the market work pays more. He is choosing it because it is more engaging. The flow state that AI-augmented work produces is, for many users, more psychologically rewarding than the friction-rich, repetition-heavy, often invisible work of domestic maintenance.

The Berkeley study documents the mechanism through which this displacement operates. The researchers observed "task seepage" — AI-augmented work colonizing lunch breaks, commutes, elevator rides, and the micro-pauses that had informally served as transition time between professional and domestic modes. When the boundary between work and not-work erodes, the household production that occupied the not-work time does not simply relocate. It competes for attention with a tool that is specifically designed to be more engaging than whatever you were doing before you opened it.

Coyle's framework suggests that the displacement follows a predictable economic logic. When the returns to market production increase — as they do when AI amplifies capability — and the returns to household production remain constant — as they do when household technology has not changed — rational actors will reallocate time from household to market production. The reallocation is individually rational. It may be socially destructive. And the metrics that guide social policy cannot see it happening because they were designed to measure only one side of the ledger.

The destruction is not hypothetical. The research on the relationship between household production and child development is extensive and largely unambiguous: the quality and quantity of parental engagement in the first years of life is among the strongest predictors of long-term cognitive and emotional outcomes. The household production that AI displacement erodes is not merely inconvenient housework. It includes the sustained, patient, friction-rich engagement with children that developmental psychology identifies as foundational. When Segal asks — in what is perhaps the most emotionally charged passage in The Orange Pill — "What am I for?" and imagines a twelve-year-old asking the same question, the answer from developmental science is precise: you are for the sustained presence that no machine can provide and that no metric currently measures.

The irony is structural. The AI transition that enables unprecedented productive capability simultaneously undermines the domestic conditions that produce the human capital on which productive capability depends. The children whose parents are too absorbed in AI-augmented work to provide sustained attention will be less capable, less emotionally regulated, less cognitively resilient — and therefore less able to exercise the judgment and creativity that the AI economy prizes above all other skills. The metric that celebrates the parents' productivity today is blind to the human capital deficit it is creating for tomorrow.

Coyle has advocated for integrating household production into the core national accounts rather than relegating it to satellite accounts that no policymaker reads. The technical challenges are real — household production has no market price, and the imputation methodologies are contested — but the policy consequences of the omission are larger than the measurement difficulties. A national accounting system that included household production would show the AI transition differently. It would show the trade-off between market productivity and domestic production. It would make visible the displacement that the current metrics conceal. And it would provide the informational foundation for policies that support domestic production rather than merely celebrating market production.

Such policies are not exotic. Paid parental leave, subsidized childcare, flexible working arrangements, and the cultural normalization of domestic engagement as economically valuable work all represent institutional responses to the household production problem. Several countries — notably the Nordic nations — have implemented versions of these policies with measurable positive effects on both child development and women's labor force participation. But the policies are under constant political pressure, because the GDP metric that dominates the policy conversation does not count the production these policies protect. A government that invests in paid parental leave produces no visible GDP gain from the household production that the leave enables. The investment looks like a cost. The benefit is invisible.

The AI transition makes this policy architecture more urgent and more fragile simultaneously. More urgent because the displacement pressure is intensifying. More fragile because the fiscal pressure to invest in AI infrastructure — digital networks, training programs, R&D subsidies — competes with the fiscal resources available for domestic production support. A government guided by the GDP dashboard will prioritize the investments that produce visible GDP growth. Domestic production support will lose, because its benefits are invisible to the metric that dominates the decision.

Coyle's argument is not that GDP should be abandoned. It is that the policy conversation cannot be conducted with one eye closed. The AI transition produces gains that the dashboard can see and costs that it cannot. The gains will be celebrated. The costs will be ignored. And the gap between what is celebrated and what is ignored will be borne, as it has always been borne, by the people whose work the metrics were never designed to see.

Chapter 5: The Quality Adjustment Problem

The Bureau of Labor Statistics employs several hundred people whose job is to decide whether a new car is better than last year's car, and if so, by how much.

The question sounds simple. The answer is among the hardest problems in applied economics. When a manufacturer adds a backup camera, improves fuel efficiency by three percent, and raises the sticker price by two thousand dollars, the statistician must decompose the price increase into two components: the portion that represents genuine inflation — paying more for the same thing — and the portion that represents quality improvement — paying more for a better thing. Get the decomposition wrong in one direction, and inflation is overstated. Get it wrong in the other, and real output growth is understated. The entire edifice of real GDP measurement rests on the accuracy of these quality adjustments, and the adjustments are performed through hedonic pricing models that estimate the implicit value of each product characteristic from observed market prices.
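The decomposition the statistician performs can be illustrated with a toy hedonic model. The sketch below assumes a linear specification and entirely synthetic data; it captures the general idea of estimating implicit characteristic prices from observed market variation, not actual BLS methodology:

```python
# A toy hedonic decomposition on made-up car data. The characteristics,
# prices, and coefficients are all invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 200
horsepower = rng.uniform(100, 300, n)
mpg = rng.uniform(20, 50, n)
camera = rng.integers(0, 2, n).astype(float)  # 1 if backup camera fitted

# "True" implicit prices used to generate the synthetic market data
price = 5_000 + 60 * horsepower + 120 * mpg + 800 * camera \
        + rng.normal(0, 300, n)

# Estimate the implicit price of each characteristic by least squares
X = np.column_stack([np.ones(n), horsepower, mpg, camera])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

# New model year: adds a camera, gains 1 mpg, sticker price rises $2,000.
quality_component = coef[3] * 1 + coef[2] * 1  # value of the added features
pure_inflation = 2_000 - quality_component     # residual price increase
print(round(quality_component), round(pure_inflation))
```

The estimated coefficient on each characteristic is its implicit price; whatever portion of the sticker-price increase the characteristics cannot explain is counted as pure inflation. Get the coefficients wrong, and the error flows silently into the deflators.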

For cars, the methodology works tolerably well. Cars have measurable characteristics: horsepower, fuel economy, safety ratings, cargo space. The hedonic model can estimate how much consumers value each characteristic by observing how prices vary across models that differ in specific, identifiable ways. The estimates are imperfect — they assume stable preferences, linear valuation, and adequate market data — but they produce results that most economists consider reasonable approximations.

For software, the methodology strains. For AI-augmented cognitive output, it breaks entirely.

Diane Coyle has identified quality adjustment as one of the chronic weaknesses of the national accounting system, a weakness that compounds silently because the errors are invisible in the headline figures. When the Bureau of Labor Statistics underestimates the quality improvement in a new smartphone — counting as inflation what is actually a better product at a higher price — real GDP growth is understated. When it overestimates the quality improvement — counting as real output growth what is actually a price increase for a product that is only marginally better — real GDP growth is overstated. The errors do not announce themselves. They accumulate in the price deflators that convert nominal GDP into real GDP, and they propagate through every subsequent calculation that uses real GDP as an input: productivity growth, real wage growth, international competitiveness comparisons, the entire apparatus of macroeconomic assessment.

The AI transition supercharges the quality adjustment problem because it produces an economy in which the dominant form of output — cognitive work product — resists quality measurement at a fundamental level.

Consider the example that runs throughout The Orange Pill: an engineer who previously shipped one feature per sprint now ships ten. The productivity statistic registers a tenfold improvement. But what has actually happened to the quality of each feature? The honest answer is: nobody knows, and the measurement system has no way to find out. The features may be individually excellent — the AI may have handled the mechanical implementation so effectively that the engineer could devote her full attention to design, architecture, and user experience, producing ten features that are each as thoughtful as the single feature she would have produced without AI. Or the features may be individually adequate but shallow — competent implementations that pass functional testing but lack the architectural depth, the edge-case resilience, the design sensitivity that would have been present if a human being had struggled through each implementation manually, building understanding through friction.

Both scenarios produce the same productivity number. The metric cannot distinguish them. And the distinction matters enormously for the sustainability and value of the output.

Coyle's framework for understanding this problem draws on her analysis of intangible capital — the category of economic assets that includes software, design, organizational knowledge, brand equity, and human skill. Jonathan Haskel and Stian Westlake's Capitalism Without Capital documented the rising share of intangible investment in advanced economies and the systematic failure of the national accounts to measure it adequately. Intangible assets are difficult to measure because they are non-rival (my use does not diminish yours), because they generate spillovers (their benefits extend beyond the firm that created them), and because their value is context-dependent (a patent is worth millions in one market and nothing in another). These characteristics make market valuation unreliable and accounting valuation essentially arbitrary.

The AI transition intensifies every dimension of the intangible measurement problem. When Segal argues that the value has migrated from code to judgment — from the tangible artifact of software to the intangible capacity to decide what software should exist — he is describing, in a builder's language, the shift from tangible to intangible capital that Coyle, Haskel, and Westlake have been documenting for a decade. Code is measurable: lines written, functions deployed, tests passed. Judgment is not measurable by any metric that currently exists. The architectural intuition that tells a senior engineer when a system will fail under load, the product sense that tells a designer when an interface is almost right but not quite, the strategic vision that tells a leader which of ten possible products deserves to exist — these are the qualities that the AI economy values most, and they are precisely the qualities that the measurement system cannot see.

The quality adjustment problem extends beyond individual products to the composition of economic output as a whole. When AI makes it cheap to produce competent work across a wide range of domains, the average quality of output in the economy may decline even as the total quantity increases. The aggregate statistics will show growth. The lived experience will be of a world saturated with adequate output and starved of excellent output. The measurement system that tracks only quantity will celebrate the saturation. The quality dimension — the distinction between adequate and excellent that Segal frames as the difference between breadth and depth — will be invisible.

This is not a theoretical concern. Coyle has documented analogous quality problems in the digital economy. The proliferation of free digital content — news articles, blog posts, video, music — has been accompanied by widespread concern about quality degradation. More content is being produced than at any point in human history. Whether the average quality has improved, declined, or remained stable is an empirical question that the measurement system cannot answer, because the measurement system does not track content quality. It tracks content quantity (through digital activity metrics) and content revenue (through advertising and subscription data). A million mediocre articles and a thousand brilliant ones produce the same aggregate activity statistics. The metric cannot distinguish them.

The AI economy will replicate this pattern at every level of economic activity. More code will be written, but will the average quality of code improve? More legal briefs will be drafted, but will the average quality of legal reasoning deepen? More medical diagnoses will be generated, but will the average quality of diagnostic thinking improve? More strategic plans will be produced, but will the average quality of strategic judgment sharpen? The optimistic scenario is that AI handles the mechanical dimension of each activity — the syntax of the code, the formatting of the brief, the pattern matching of the diagnosis — and frees the human to focus on the quality dimension: the architecture, the reasoning, the clinical judgment, the strategic vision. The pessimistic scenario is that AI produces adequate output so easily that the human stops investing in the quality dimension, because adequate is good enough and the market cannot tell the difference.

Which scenario prevails is an empirical question. The measurement system cannot currently answer it. And without an answer, the policy conversation proceeds as though the question does not exist.

Coyle has proposed that the measurement of AI's economic value should be embedded within an endogenous growth framework — a model in which the quality of knowledge inputs, not merely their quantity, determines the trajectory of output over time. In her Stanford Digital Economy Lab white paper on measuring AI, she argues that AI-enabled information should be treated as an input to a knowledge production function, which means that the quality of the information — the quality of the decisions, the quality of the judgment, the quality of the creative output — matters as much as the quantity. A measurement system designed around this framework would not merely count features shipped or code generated. It would attempt to assess the decision quality embedded in the output — a vastly harder measurement problem, but one whose difficulty does not make it less necessary.

The quality adjustment problem is not merely a technical challenge for statistical offices. It is a political problem with distributional consequences. When the measurement system cannot distinguish quality from quantity, it systematically favors producers of quantity over producers of quality. The firm that ships a hundred adequate features receives the same productivity credit as the firm that ships ten excellent ones. The policy apparatus that sees only the productivity numbers will reward the first firm and ignore the second. Over time, the economic incentives will shift toward quantity production, because quantity is what the metrics measure and what the metrics measure is what the system rewards.

This is the dynamic that Segal gestures toward when he describes the aesthetics of the smooth — the cultural tendency to mistake frictionless competence for genuine excellence. The quality adjustment problem is the measurement-system version of the same cultural tendency. When the metric cannot tell the difference between smooth and deep, it will reward smooth — because smooth is countable, scalable, and optimizable, and deep is none of these things.

Coyle's most uncomfortable conclusion, implied throughout her measurement work but rarely stated this bluntly, is that measurement systems are not neutral recording devices. They are incentive structures. What they measure, they reward. What they cannot measure, they penalize by neglect. A measurement system that counts output without assessing quality does not merely fail to capture quality. It actively discourages quality, because quality requires the kind of investment — time, attention, patience, struggle — that the output metric counts as cost rather than as value.

The AI transition, measured by the current system, will look like an unambiguous productivity triumph. Measured by a system that could also assess quality, it might look like something more complicated: an expansion of quantity accompanied by a compression of quality that the current metrics cannot detect, that policymakers therefore cannot discuss, and that the economy will not correct until the consequences — in the form of system failures, design shortcomings, or the slow erosion of the deep expertise that quality depends upon — become too visible to ignore.

By which point, the correction will be more expensive than the measurement would have been.


Chapter 6: When Free Is Not Free

The AI subscription costs one hundred dollars per month. The organizational transformation costs everything else.

This distinction — between the price of the tool and the cost of the transition — is the measurement error that most distorts the policy conversation about AI. The error is natural, almost inevitable, because the price of the tool is visible, quantifiable, and strikingly low. One hundred dollars per month for a twenty-fold productivity multiplier is, by any conventional cost-benefit calculation, the most attractive investment in the history of enterprise technology. The price anchors the conversation. The cost disappears behind it.

Diane Coyle's work on digital economics has persistently identified this conflation as one of the central analytical failures of the platform age. When a digital service is free to the user — as most social media platforms, search engines, and communication tools are — the price signals that conventional economics relies upon to infer value and allocate resources simply vanish. A service with a price of zero generates zero revenue per user in the national accounts. The value the user derives, which may be substantial, has no statistical representation. Policy constructed on the basis of these statistics therefore systematically undervalues the services that people use most and underestimates the welfare effects of changes to those services.

AI tools are not free in the way that social media is free. They carry a subscription price. But the price is so low relative to the value they produce — and more importantly, so low relative to the total cost of deploying them effectively — that it functions, in the policy conversation, as though the tools were essentially costless. A hundred dollars per month sounds like a rounding error in an enterprise budget. It invites the inference that the AI transition is cheap. The inference is wrong.

The cost of the transition includes at least four categories of expenditure that the subscription price does not capture, each of which is substantial, and none of which appears in the standard cost-benefit analyses that organizations and governments use to evaluate AI adoption.

The first is organizational restructuring. Coyle's 2025 working paper with Jörden and Poquiz identified reorganization costs as the primary barrier to AI adoption — not the technology cost, not the training cost, but the cost of redesigning workflows, redefining roles, restructuring teams, and overcoming the managerial inertia that resists changes to established processes. When Segal describes spending weeks in Trivandrum training his team to work with Claude Code, he is investing in organizational restructuring. The investment is real: it consumes leadership time, disrupts existing workflows, creates temporary productivity losses, and requires sustained attention over months, not a one-time deployment. But it appears nowhere in the cost calculation that the subscription price anchors. The hundred dollars per month is visible. The weeks of leadership time are invisible.

These findings are consistent with the broader empirical literature on technology adoption. Coyle has frequently cited the historical parallel of electrification: when factories first adopted electric power, the firms that simply replaced steam engines with electric motors in the same factory layout saw modest productivity gains. The firms that redesigned their factories around the distributed power that electricity made possible — single-story layouts, assembly lines, decentralized workstations — saw transformative gains, but only after years of costly restructuring. The technology was cheap relative to the total investment. The organizational transformation was where the real cost, and the real value, resided.

The second category is human capital adaptation. When AI changes what workers do — shifting the valuable work from execution to judgment, from narrow specialization to integrative thinking — every affected worker must adapt. The adaptation requires learning new skills, developing new habits of mind, and, most costly of all, revising a professional identity that may have been decades in the making. The senior developer who built her career on deep expertise in a specific programming language discovers that the language itself is now handled by the tool. Her value has migrated from what she can write to what she can decide. That migration is psychologically expensive. It requires confronting the obsolescence of skills that once defined professional worth, developing new capabilities that feel uncertain and unfamiliar, and sustaining motivation through a transition period in which the old identity no longer fits and the new one has not yet solidified.

None of this appears as a cost in any organizational accounting system. It is experienced by the individual as anxiety, disorientation, and the particular grief of watching hard-won expertise lose market value. It is experienced by the organization as increased turnover, reduced engagement, and the quiet departure of experienced people who decide that the cost of adaptation exceeds the benefit of staying. And it is experienced by the economy as a transition cost that no metric aggregates and no policy addresses.

The third category is the educational system redesign that the transition requires. When AI shifts the valuable skills from execution to judgment, the educational systems that trained people for execution become misaligned with the economy they are supposed to serve. Segal argues in The Orange Pill that the emphasis must shift from teaching students to produce toward teaching them to evaluate, to question, to exercise the kind of integrative thinking that AI cannot replicate. The argument is sound. The cost of implementing it is enormous and almost entirely unaccounted for.

Redesigning curricula, retraining teachers, restructuring assessment systems, and — most fundamentally — changing the educational culture from one that rewards correct answers to one that rewards good questions: this is a multi-decade, multi-billion-dollar project in any country that takes it seriously. It does not appear in any cost-benefit analysis of AI adoption because it is treated as an externality — a cost imposed on society by a transition that the adopting firms did not create and do not bear. The firms capture the productivity gain. The educational system absorbs the adjustment cost. And the metric that the policy conversation relies on shows only the gain.

The fourth category is social safety net expansion. Technological transitions displace workers. This is historically established, empirically documented, and predictable in its broad contours even when the specific jobs affected are difficult to identify in advance. The AI transition will displace some workers, augment others, and create new roles that do not yet exist. The net employment effect is debated. The transitional cost — the income loss, retraining expense, and psychological burden borne by displaced workers during the adjustment period — is not debated. It is simply unaccounted for.

Coyle's argument, developed across her policy advisory work and her published scholarship, is that these four cost categories — organizational restructuring, human capital adaptation, educational redesign, and social safety net expansion — collectively dwarf the subscription cost of the AI tools themselves. The subscription cost is the visible tip of an iceberg whose submerged mass the policy conversation has not begun to assess.

This matters because policy is shaped by what is counted. A government that evaluates the AI transition by comparing tool costs to productivity gains will conclude that the transition is overwhelmingly positive — a small investment producing enormous returns. A government that includes the full cost of the transition — the organizational restructuring, the human capital adaptation, the educational redesign, the safety net expansion — would reach a more qualified conclusion: the transition is potentially positive, but only on condition of investments that are themselves costly, that take years to implement, and that compete for fiscal resources with the AI infrastructure investments that produce the visible productivity gains.

The measurement error is not academic. It produces a specific policy distortion: underinvestment in the human and institutional infrastructure that the transition requires. Governments that see only the tool cost will invest in digital infrastructure — broadband, data centers, AI research funding — because these investments produce visible, measurable returns in the metrics that dominate the policy conversation. They will underinvest in the complementary infrastructure — education reform, workforce transition support, organizational capability building — because the returns to these investments are invisible to the metrics, slower to materialize, and harder to attribute to any specific policy intervention.

Coyle has repeatedly called this the measurement trap: the tendency of policymakers to optimize for what they can measure at the expense of what they cannot. The trap is not the result of incompetence or malice. It is the structural consequence of conducting the policy conversation in a language that can express some costs and not others. The tool is cheap. The transition is expensive. But the language of the policy conversation can express the first fact and not the second, which means the second fact does not enter the deliberation, which means the policies that emerge systematically underestimate the investment required and overestimate the ease of the adjustment.

The price of the tool is one hundred dollars a month. The cost of the transition is a question that no metric currently in use can answer — and that no policy currently in force is designed to address.

---

Chapter 7: The Wellbeing Gap

The economist Richard Easterlin published a paper in 1974 that contained a finding so counterintuitive that economists have been arguing about it for fifty years: countries that grow richer over time do not, on average, become happier.

The finding was not that money does not matter. Within any given country at any given moment, people with higher incomes report higher life satisfaction than people with lower incomes. The finding was that the relationship between income and happiness operated differently across time than across individuals. When an entire society grew richer together, the average reported happiness did not rise proportionally. Sometimes it did not rise at all. The mechanism, Easterlin proposed, was adaptation and social comparison: people habituate to higher income levels and evaluate their wellbeing relative to their peers rather than in absolute terms. A rising tide lifts all boats, but if satisfaction depends on the height of your boat relative to your neighbor's, universal rising produces no net satisfaction gain.

The Easterlin paradox has been debated, refined, challenged, partially confirmed, and partially complicated by subsequent research. The broad conclusion — that GDP growth is a poor proxy for wellbeing improvement beyond a moderate income threshold — has largely survived fifty years of empirical scrutiny. More recent work by Stevenson and Wolfers has found a log-linear relationship between income and subjective wellbeing that holds at higher income levels, suggesting that Easterlin overstated the flattening effect. But even the revisionists concede that the relationship between income and wellbeing is weaker, more mediated, and more contingent than the equation GDP growth equals progress implies. The metric that dominates economic policy captures one input to human flourishing and misses most of the others.
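The log-linear claim can be made concrete with a toy calculation. The sketch below is illustrative only: the functional form follows the Stevenson–Wolfers specification, but the coefficients and income figures are invented for the example, not empirical estimates.

```python
import math

def satisfaction(income, a=0.0, b=1.0):
    """Illustrative log-linear wellbeing model: S = a + b * ln(income).
    The coefficients a and b are placeholders, not empirical estimates."""
    return a + b * math.log(income)

# Under log-linearity, a doubling of income raises satisfaction by the
# same increment (b * ln 2) at every income level...
gain_low = satisfaction(20_000) - satisfaction(10_000)
gain_high = satisfaction(200_000) - satisfaction(100_000)

# ...but a fixed absolute raise of $10,000 buys far less satisfaction
# at a high income than at a low one: diminishing returns to money.
raise_low = satisfaction(20_000) - satisfaction(10_000)
raise_high = satisfaction(110_000) - satisfaction(100_000)
```

This is why the revisionist finding and the original paradox can coexist: proportional growth keeps lifting reported wellbeing, while any given absolute gain matters less and less as incomes rise.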

Diane Coyle's engagement with the wellbeing measurement literature has been characteristically pragmatic. She has not argued that wellbeing should replace GDP. She has argued that wellbeing metrics should stand beside GDP as complementary indicators, providing the policy conversation with information that GDP alone cannot supply. The distinction is institutional rather than philosophical. Coyle is not a GDP abolitionist. She is a measurement pluralist: she believes that the economy is multidimensional and that a single metric, however well-constructed, cannot capture its full dimensionality. A dashboard that shows only productivity is like a medical chart that shows only heart rate — accurate as far as it goes, dangerously incomplete as a basis for treatment.

The AI transition threatens to widen the wellbeing gap — the distance between measured economic performance and experienced human flourishing — to a degree that previous technological transitions did not approach.

The mechanism is specific and empirically observable. AI tools produce what Segal describes as a compound experience: exhilaration and distress simultaneously. The builder in flow, constructing something extraordinary with Claude Code, experiences genuine creative satisfaction. The same builder, unable to stop, skipping meals, neglecting relationships, losing sleep, experiences genuine depletion. The two experiences are not sequential — first satisfaction, then depletion. They are concurrent. The same hour that produces the deepest creative engagement also produces the deepest cognitive drain.

The productivity metric registers the output of that hour. It does not and cannot register the psychological composition. It cannot distinguish an hour of sustainable flow — the state that Csikszentmihalyi documented, in which challenge and skill are matched, attention is absorbed, and the experience is intrinsically rewarding — from an hour of compulsive intensity, in which the engagement is driven not by satisfaction but by the inability to disengage. The observable behavior is identical. The experiential quality is opposite. And the experiential quality is what determines whether the working pattern is sustainable, which is what determines whether the productivity gain is real or borrowed.

The Berkeley study provides the empirical bridge between the individual experience and the aggregate measurement failure. The researchers documented not just increased intensity but decreased wellbeing: lower job satisfaction, reduced sense of autonomy, diminished empathy for colleagues, and what they described as a cognitive flattening — a reduction in the range and depth of emotional engagement with work that accompanied the increase in output. The workers were producing more and feeling worse. The productivity statistic captured the first. No statistic captured the second.

Coyle's framework for understanding this gap draws on Amartya Sen's capability approach, which she has incorporated into her measurement reform proposals. Sen's insight was that economic development should be evaluated not by what people produce or consume but by what they are able to do and be — their capabilities in the broadest sense. A person who produces enormous economic output but has no time for relationships, no capacity for leisure, no autonomy over the pace of their work, and no ability to disengage from production without psychological distress is not flourishing, regardless of what the output metric says. The capability approach evaluates an economy by asking whether its members have the freedom to live lives they have reason to value. The output metric evaluates an economy by asking whether its members are producing.

These are different questions. They produce different answers. And the gap between the answers is where the wellbeing deficit of the AI transition resides.

The wellbeing gap has a temporal dimension that the static metrics miss entirely. Coyle has noted that sustainability is a property that can only be assessed over time. A working pattern that produces extraordinary output for six months may be depleting the human capital that the output depends upon, producing what appears as productivity growth in the quarterly statistics and what manifests as burnout, turnover, and capability erosion in the annual data. By the time the depletion appears in the metrics — if it appears at all, which is not guaranteed given that the metrics do not track wellbeing — the damage is already compounding.

The temporal dimension also affects the relationship between AI-augmented work and the broader determinants of wellbeing. The psychological literature identifies several consistent predictors of life satisfaction: the quality of close relationships, a sense of autonomy and competence, engagement in meaningful activity, and adequate time for rest and recovery. AI-augmented work can enhance some of these — the sense of competence that comes from building something extraordinary, the engagement of creative flow — while eroding others, particularly relationship quality and rest. The net effect on wellbeing is not determinable from the output statistic. It requires information that no metric in the current policy toolkit provides.

Coyle's 2024 interview with Project Syndicate addressed this directly in the context of time-use data. She argued that time-use data must shape technological development — that understanding how people actually spend their time, and how technology changes that allocation, is a prerequisite to evaluating whether the technology is producing genuine welfare gains or merely shifting time from unmeasured valuable activities to measured ones. The argument applies with particular force to the AI transition, where the reallocation of time from household production, rest, and relationship maintenance to AI-augmented market work represents a welfare transfer that the productivity statistics register as a pure gain.

The policy response that Coyle's framework implies is not the suppression of AI tools or the imposition of mandatory work-hour limits, though both have their advocates. It is the construction of a measurement infrastructure that provides policymakers with wellbeing information alongside productivity information — not as a replacement for the economic dashboard but as a complement that captures what the economic dashboard misses. Several governments have begun building such systems. The UK's Measuring National Well-being programme, New Zealand's Living Standards Framework, and Bhutan's Gross National Happiness Index represent early attempts. None of them has yet achieved the institutional weight of GDP — the integration into quarterly reporting cycles, central bank models, and political accountability structures that would make wellbeing data a genuine input to economic governance rather than a supplementary curiosity.

The AI transition may be the crisis that forces the institutional integration. When the dashboard shows a twenty-fold productivity improvement and the workforce reports simultaneous exhilaration and burnout, the gap between the metric and the experience becomes impossible to dismiss as a measurement technicality. It becomes a governance failure — a failure to provide decision-makers with the information they need to manage a transition whose most important effects are the effects the dashboard cannot show.

---

Chapter 8: Counting What Matters

No measurement system can capture everything. The question is not whether to omit. It is what to omit — and whether the omissions are deliberate, understood, and periodically reviewed, or whether they are inherited, unexamined, and allowed to compound until the metric describes a world that no longer corresponds to the one people actually inhabit.

Diane Coyle's career has been organized around a single institutional observation: the omissions in the national accounts are inherited, not chosen. Nobody decided, after careful deliberation, that household production should be excluded from GDP. Nobody evaluated the trade-offs and concluded that the measurement of wellbeing was less important than the measurement of output. The omissions were the product of historical circumstance — the specific emergency that called the national accounts into existence, the specific institutional contexts in which they were developed, the specific limitations of mid-twentieth-century data collection — and they have persisted not because anyone defends them on their merits but because the institutional infrastructure built around the existing metrics resists modification.

The resistance is structural, not ideological. Statistical offices have budgets, mandates, and workflows organized around the production of GDP and its component statistics. Changing what they measure requires changing their mandates, which requires legislative or executive action. It requires changing their budgets, which requires fiscal reallocation. It requires changing their methodologies, which requires years of development, testing, and international harmonization. It requires changing their workforce, which requires hiring people with skills that the current staff may not possess. And it requires changing the expectations of the policymakers, central bankers, and international organizations that consume their output — expectations that are calibrated to the existing metrics and that resist recalibration because recalibration introduces uncertainty into processes that prize certainty above almost everything else.

Coyle understands this resistance from the inside. Her work on the Bean Review — the independent review of UK economic statistics commissioned in 2015 — gave her direct experience of the institutional dynamics that govern measurement reform. The review recommended substantial changes to how the UK statistical system measured the digital economy, including better measurement of free digital services, platform-mediated transactions, and the quality of public services. The recommendations were technically sound, institutionally informed, and modestly ambitious by the standards of the academic literature. Their implementation has been slow, partial, and in some areas, stalled — not because anyone disagrees with the analysis but because the institutional machinery moves at a pace that technical arguments cannot accelerate.

The AI transition demands institutional acceleration. The gap between what the metrics measure and what the economy produces is widening at a pace that makes previous measurement failures — the undercounting of digital services, the mismeasurement of intangible capital — look like rounding errors. The capabilities that the AI economy values most — judgment, integrative thinking, the capacity to ask the right question — are invisible to the measurement system. The costs that the AI transition imposes most heavily — cognitive depletion, household production displacement, quality erosion, wellbeing decline — are invisible too. A policy conversation conducted in the language of the existing metrics is a conversation about a different economy than the one people are actually experiencing.

What would a better measurement system look like? Coyle's work, synthesized across her published scholarship and her institutional advisory roles, implies several specific reforms.

The first is the integration of time-use data into the core national accounting framework. Time-use surveys — which ask representative samples of the population to record how they spend every minute of every day — provide the empirical foundation for measuring household production, leisure, and the allocation of cognitive effort across activities. Several countries conduct time-use surveys periodically. None integrates the results into GDP reporting as a core element rather than a supplementary curiosity. In the AI economy, where the reallocation of time from unmeasured to measured activities is one of the most consequential effects of the transition, time-use data becomes essential — not supplementary — to understanding what is actually happening.

The second is the development of cognitive intensity metrics. The efficiency-versus-intensity distinction that Chapter 2 identified as the central unmeasured variable in the AI economy requires measurement tools that do not currently exist in any national statistical framework. Approximations are feasible: time-use surveys could be augmented with questions about cognitive load, stress, and engagement quality. Workplace surveys could track the sustainability indicators that the Berkeley researchers identified — task seepage, boundary erosion, recovery time. Longitudinal studies could follow cohorts of AI-augmented workers to assess whether productivity gains are sustained or whether they are followed by burnout and capability decline. None of these measures would be perfect. All of them would be better than the current situation, which is no measurement at all.

The third is the construction of quality-adjusted output measures for cognitive work. This is the hardest of the measurement reforms and the most important. As Chapter 5 argued, the productivity statistics that dominate the policy conversation count output quantity without assessing output quality, which means they cannot distinguish between genuine productivity improvement and mere quantity expansion at the expense of quality. Developing quality measures for cognitive output — for code, for legal analysis, for medical diagnosis, for strategic planning — is a research programme that will take years to mature. But the alternative to imperfect quality measurement is no quality measurement, and the policy consequences of no quality measurement are already visible: a system that rewards quantity over quality, that celebrates volume over depth, and that cannot detect the quality erosion that sustained AI-augmented production may be producing.

The fourth is the measurement of the invisible surplus. Chapter 3 argued that AI-assisted personal production generates an enormous consumer surplus that the national accounts cannot see. Measuring this surplus requires methodologies that economists have been developing for digital goods but have not yet applied to AI-augmented personal production. Willingness-to-pay surveys, conjoint analyses, and the GDP-B framework developed by Brynjolfsson and his collaborators provide starting points. Extending these methodologies to the AI context — where the surplus is generated through active production rather than passive consumption — is a measurement innovation that the statistical community has not yet undertaken.
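The arithmetic at the heart of the willingness-to-pay approach is simple enough to sketch. The figures below are hypothetical survey responses invented for illustration; GDP-B itself uses more sophisticated elicitation, but the underlying surplus logic is this:

```python
# A minimal sketch of the consumer-surplus logic behind willingness-to-pay
# methods such as the GDP-B framework. The survey figures are invented for
# illustration, not drawn from any actual study.

def consumer_surplus(wtp_responses, price):
    """Sum, over users who value the service at or above its price,
    of what they would pay minus what they actually pay."""
    return sum(wtp - price for wtp in wtp_responses if wtp >= price)

# Hypothetical monthly willingness-to-pay for an AI tool priced at $100.
wtp = [500, 250, 100, 80, 1200]
surplus = consumer_surplus(wtp, price=100)  # → 1650
```

The national accounts record five subscriptions at $100 each — $500 of revenue. The $1,650 of surplus, the value users receive above what they pay, has no statistical representation at all.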

The fifth, and most institutionally ambitious, is the integration of wellbeing metrics into the economic governance framework with the same institutional weight that GDP currently enjoys. This means not merely publishing wellbeing statistics — several countries already do this — but embedding them in the quarterly reporting cycles that drive political accountability, in the central bank models that drive monetary policy, and in the international comparison frameworks that drive development policy. A wellbeing metric that is published annually in a supplementary report and read by no one who makes decisions is a measurement without a governance function. A wellbeing metric that appears alongside GDP in the quarterly economic briefing, that central bank governors must address in their press conferences, that finance ministers must account for in their budget speeches — that is a metric with institutional power.

Coyle has been realistic about the pace of institutional reform. Her published work consistently acknowledges that measurement systems change slowly, that the institutional infrastructure resists modification, and that the gap between what academics propose and what statistical offices implement is measured in decades. But her work also consistently argues that the pace of reform is a choice, not a natural law. Governments that chose to build GDP reporting systems during the Second World War did so in years, not decades, because the urgency was clear and the political will was present. The AI transition presents an urgency that, if not equivalent to wartime mobilization, is sufficient to justify accelerated reform of the measurement infrastructure.

The structures that Segal describes — the practices and institutions that channel the transition's power without destroying the ecosystem it flows through — require information. Sound structures cannot be designed, implemented, or maintained without accurate information about the system they are meant to govern. The measurement infrastructure is itself a structure — one that determines what information flows to decision-makers and therefore what aspects of the transition are managed with awareness and which are managed blind. A measurement system that shows productivity growth while concealing human capital depletion, that shows output quantity while concealing quality erosion, that shows market production while concealing household displacement, is a structure that channels information toward celebration and away from caution.

Building better measurement systems is not a technical project that can be delegated to statistical offices and forgotten. It is a governance imperative that determines whether the AI transition is understood clearly enough to be managed wisely. The current metrics describe an economy that is booming. The unmeasured dimensions describe a transition that is far more complex, far more costly, and far more consequential than the boom alone would suggest. The gap between what is measured and what is real is where the policy failures of the next decade will originate — unless the measurement systems are reformed quickly enough to close it.

The dashboard needs new instruments. The instruments need institutional support. And the institutional support needs political will — the recognition, at the highest levels of governance, that what we measure determines what we manage, and that what we are currently measuring is no longer adequate to the economy we actually have.

---

Chapter 9: New Metrics for New Work

Hours worked is the oldest continuous measurement in the history of labor statistics. The factory clock, installed above the floor at the dawn of the industrial age, inaugurated a regime of temporal accounting that has outlasted every other feature of the industrial economy. The factories are gone, or transformed beyond recognition. The clock remains. And the metric it spawned — hours of labor as the denominator of productivity — still governs how governments, firms, and individuals evaluate economic performance.

The metric made sense when the relationship between time and output was relatively stable. A weaver who worked ten hours produced roughly twice as much cloth as one who worked five. A machinist's output scaled with hours at a rate that, while not perfectly linear, was close enough for the statistics to capture the essential dynamics. Hours was a reasonable proxy for effort, and effort was a reasonable proxy for cognitive and physical expenditure, and the ratio of output to expenditure was what productivity measured.

In AI-augmented knowledge work, every link in that chain has broken.

An hour of work with Claude Code contains a variable and often extreme range of cognitive activity. A developer in flow — iterating rapidly with the tool, making architectural decisions every few minutes, evaluating output, redirecting the conversation, holding the full context of a complex system in working memory while simultaneously making aesthetic judgments about user experience — is performing cognitive work at a density that has no precedent in the history of the occupation. The same developer, thirty minutes later, may be reviewing generated code with diminished attention, accepting output without evaluation, operating on cognitive fumes. Each segment registers as an hour of labor in the productivity denominator. The metric treats them as equivalent. They are not.

Diane Coyle's argument for measurement reform has always been grounded in institutional pragmatism rather than theoretical aspiration. She does not propose ideal metrics that no statistical office could implement. She proposes feasible improvements that existing institutions could adopt within realistic timeframes. The distinction is critical, because the history of measurement reform is littered with proposals that were intellectually unimpeachable and institutionally impossible — comprehensive wellbeing indices, full environmental accounting, complete measurement of intangible capital — that languished in working papers while the GDP dashboard continued to dominate policy without challenge.

The feasibility constraint shapes what Coyle's framework implies for measuring AI-augmented work. The proposals that follow are not utopian. They are extensions of existing methodologies to a new context — harder than current measurement, but not categorically different.

The first feasible reform is the adaptation of time-use surveys to capture cognitive intensity. Time-use surveys already exist in most OECD countries. They ask respondents to record their activities in fine-grained intervals — typically ten or fifteen minutes — across representative days. The surveys capture what people do with their time. They do not currently capture how intensely they do it. Adding intensity measures — self-reported cognitive load, stress, engagement quality, and perceived sustainability — to existing time-use survey instruments would provide, for the first time, a national-level dataset on the cognitive composition of working time. The data would not be perfect. Self-reported intensity is subject to recall bias, social desirability effects, and the fundamental difficulty of introspection about cognitive states. But approximate data on a critical variable is infinitely more useful than no data at all, and the marginal cost of adding intensity questions to existing surveys is modest relative to the value of the information.
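A hedged sketch of how such intensity questions might be aggregated: the diary entries, the 1–5 load scale, and the midpoint weighting below are all illustrative assumptions, not features of any existing survey instrument.

```python
# Illustrative aggregation of intensity questions appended to a time-use
# diary. Entries, scale, and weighting are assumptions for this sketch.

diary = [
    # (minutes, activity, self-reported cognitive load on a 1-5 scale)
    (60, "AI-assisted coding, in flow", 5),
    (30, "reviewing generated output, depleted", 2),
    (30, "email and scheduling", 1),
]

raw_hours = sum(minutes for minutes, _, _ in diary) / 60

# Weight each interval by reported load relative to the scale midpoint (3),
# so an hour at moderate load counts as one intensity-adjusted hour.
adjusted_hours = sum(minutes * load / 3 for minutes, _, load in diary) / 60
# Two raw hours resolve here to roughly 2.17 intensity-adjusted hours:
# the flow hour carries most of the cognitive expenditure.
```

The point of the exercise is not the particular weighting, which is arbitrary, but that the raw-hours figure and the adjusted figure diverge — and only the first is currently measured.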

The second is the development of decision-quality metrics for AI-augmented output. This is harder, but not impossible. In several domains — medical diagnosis, financial forecasting, engineering design — the quality of decisions can be evaluated retrospectively by comparing the decision to subsequent outcomes. A diagnostic algorithm can be assessed by tracking whether the diagnoses it informed proved correct. A financial model can be assessed by comparing its projections to actual results. An engineering design can be assessed by measuring the frequency and severity of failures in the deployed product. These retrospective quality assessments are already performed within individual organizations. What does not exist is an aggregate framework that compiles domain-specific quality assessments into a national indicator of decision quality in AI-augmented work. Building such a framework would require collaboration between statistical offices and domain-specific regulatory bodies — medical boards, engineering societies, financial regulators — that already collect quality data for their own purposes. The aggregation is the missing step, not the underlying measurement.
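The missing aggregation step can be sketched in a few lines. The scores and weights below are hypothetical; in practice each domain's quality score would come from the regulatory body that already collects it, and the weights from that domain's share of AI-augmented output.

```python
def national_decision_quality(domains: list[tuple[float, float]]) -> float:
    """domains: (quality_score, volume_weight) pairs, one per
    regulatory domain (medicine, engineering, finance, ...), with
    quality scores normalised to the 0-1 range. Returns the
    volume-weighted composite: the aggregation the chapter
    identifies as the missing step. All inputs are hypothetical."""
    total_weight = sum(w for _, w in domains)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(q * w for q, w in domains) / total_weight
```

A composite like this inherits every weakness of its inputs, but it converts quality data that already exists inside silos into a single trackable national trend.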

The third is the longitudinal tracking of cognitive sustainability. The Berkeley study that Segal discusses provided an eight-month window into the dynamics of AI-augmented work at a single firm. What the field lacks is longitudinal data — studies that follow cohorts of AI-augmented workers over years, tracking not just their output but their cognitive health, their career trajectories, their relationship quality, and their capacity to sustain the working patterns that the productivity metrics celebrate. The precedent exists in occupational health research, where longitudinal cohort studies have tracked the health effects of shift work, repetitive motion, and chemical exposure over decades. AI-augmented cognitive work is a new occupational exposure with potentially significant health effects. It merits the same longitudinal scrutiny.

Coyle's most recent empirical work points toward a fourth reform that is institutional rather than methodological. Her 2026 essay "AI Will Transform Business, Not Just Jobs" argued that AI's primary impact is on organizational structure — on how firms make decisions, allocate resources, and coordinate activity. If this is correct, then the appropriate unit of measurement is not the individual worker but the organization. Organizational productivity — the efficiency with which a firm converts inputs into valued outputs — is already measured, imperfectly, through firm-level surveys and financial data. What is not measured is organizational capability: the firm's capacity to make good decisions, to adapt to changing circumstances, to sustain its workforce, and to produce output whose quality justifies its existence.

Measuring organizational capability would require new survey instruments — questions about decision-making processes, workforce sustainability practices, quality management systems, and the balance between AI-augmented and human-directed work. The UK's Management and Expectations Survey, which Coyle has cited as a model for measuring organizational practices, provides a starting point. Extending it to capture AI-specific organizational capabilities — the structures that channel AI's productivity gains into sustainable performance rather than unsustainable extraction — would give policymakers information about the quality of the AI transition, not just its speed.

None of these reforms would produce perfect measurement. All of them would produce better information than currently exists. And the distance between current information — which is essentially zero on cognitive intensity, decision quality, cognitive sustainability, and organizational capability — and approximate information is where the largest policy gains reside.

Coyle has argued, with the quiet insistence of someone who has been making the same point for two decades, that the perfect should not be the enemy of the feasible. The national accounts were never perfect. The GDP figures that governments treat as definitive are revised multiple times, sometimes substantially, and the underlying data contains measurement errors, sampling biases, and conceptual ambiguities that statistical offices acknowledge in technical documentation that no policymaker reads. The standard for new metrics is not perfection. It is usefulness — the provision of information that improves decisions relative to a baseline of no information at all.

The baseline, for the variables that matter most in the AI transition, is precisely zero. No government currently measures the cognitive intensity of its workforce. No government measures the decision quality of its AI-augmented output. No government tracks the longitudinal cognitive health of workers whose occupational exposure has changed more dramatically in two years than in the previous fifty. And no government measures the organizational capability that determines whether AI's productivity gains are sustainable or extractive.

The metrics proposed here would not resolve the measurement gap that this book has documented. They would narrow it. And narrowing the gap — providing policymakers with even approximate information about the variables that determine whether the AI transition produces flourishing or depletion — is worth more, by orders of magnitude, than the precision-focused refinement of metrics that already capture what they were designed to capture while remaining blind to what they were not.

The measurement infrastructure is itself a form of governance infrastructure. What it can see, the political system can address. What it cannot see, the political system will neglect — not from malice, but from the structural impossibility of responding to information that does not exist. Building the instruments that make the invisible visible is not a technical project for statisticians. It is a precondition for democratic governance of the most consequential economic transformation since electrification.

---

Chapter 10: Beyond GDP in the Age of AI

The measurement system that Simon Kuznets built in 1934 was a response to a crisis. The United States government, confronting the worst economic collapse in modern history, discovered that it did not possess the basic informational infrastructure required to understand what was happening to the economy it was supposed to manage. The national income accounts were built, rapidly and under pressure, to fill that informational void. They succeeded. They succeeded so thoroughly that the framework Kuznets designed, refined by Richard Stone and standardized through the United Nations System of National Accounts, became the universal infrastructure of economic governance — adopted by virtually every country on earth, embedded in the decision-making processes of every major international institution, and so deeply woven into the fabric of economic policy that most practitioners have stopped noticing it, the way a fish stops noticing water.

Every major revision to the national accounting framework has been driven by crisis. The inclusion of government expenditure was driven by the Second World War and the need to measure military production. The inclusion of the service sector was driven by the shift from manufacturing to services that the old framework could not capture. The 2008 revision that reclassified research and development as investment rather than expenditure was driven by the growing recognition that the knowledge economy was undermeasured. In each case, the crisis revealed a gap between what the metrics showed and what the economy actually contained, and the gap forced institutional reform.

The AI transition is the next crisis. And its measurement gap is wider than any that preceded it.

Diane Coyle's work provides the analytical framework for understanding why. The gap is not a single omission but a compound failure — multiple measurement blindnesses interacting and reinforcing one another to produce a picture of the AI economy that is not merely incomplete but actively misleading. The productivity statistics overstate the sustainability of the gains by conflating efficiency with intensity. The output measures overstate the value of the gains by conflating quantity with quality. The national accounts understate the total production of the economy by excluding the enormous volume of AI-assisted personal production that occurs outside markets. The cost-benefit analyses understate the cost of the transition by capturing the price of the tools while missing the organizational, human capital, educational, and social safety net expenditures that the transition requires. And the wellbeing metrics — to the extent that they exist at all — are disconnected from the economic governance framework, which means that the human costs of the transition do not enter the deliberation that shapes the policy response.

Each of these failures has been documented in the preceding chapters. Their interaction is what makes the compound failure greater than the sum of its parts. A policymaker who receives a productivity report showing a twenty-fold improvement, an output report showing a surge in software production, a cost analysis showing a trivially low tool cost, and no wellbeing data at all will reach a conclusion that is internally consistent and externally wrong: the AI transition is cheap, productive, and beneficial, and the appropriate policy response is to accelerate adoption while minimizing regulatory friction.

The conclusion is wrong not because any individual metric is lying but because the ensemble of metrics is systematically biased toward the visible, the quantifiable, and the market-transacted, and against the invisible, the qualitative, and the human. The bias is structural, not intentional. But structural bias, compounded over time and applied to decisions that affect millions of lives, produces outcomes that are indistinguishable from intentional neglect.

Coyle has argued, most ambitiously in The Measure of Progress, that what is needed is not a single replacement for GDP but a dashboard — a small, carefully selected set of complementary metrics that together capture the multidimensional reality of an economy that GDP alone cannot describe. The dashboard metaphor is deliberate. A car dashboard does not show a single number. It shows speed, fuel level, engine temperature, oil pressure. No driver would accept a dashboard that showed only speed. The economic dashboard that shows only GDP is equally inadequate, and the fact that it has been tolerated for eighty years is a measure of institutional inertia, not analytical sufficiency.

What would the AI-era economic dashboard show? Coyle's framework, synthesized across her published work and applied to the specific challenges of the AI transition, implies at least six indicators.

The first is GDP itself — reformed to include satellite accounts for AI-assisted personal production, but retained as a measure of market output because market output remains important even if it is not the only thing that is important.

The second is a sustainability-adjusted productivity measure — one that distinguishes efficiency gains from intensity gains by incorporating data on cognitive load, working-pattern sustainability, and the longitudinal trajectory of workforce capability. A productivity number qualified by a sustainability indicator would tell policymakers not just how much the economy is producing but whether the production rate can be maintained without depleting the human capital it depends upon.
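As a worked illustration of the arithmetic, assuming a single sustainability index in (0, 1] distilled from the cognitive-load and working-pattern data, the qualification could enter the headline number like this. The multiplicative form is one possible choice, used here only for illustration.

```python
def sustainability_adjusted_productivity(output: float, hours: float,
                                         sustainability_index: float) -> float:
    """Raw labour productivity (output per hour) scaled by a
    sustainability index in (0, 1], where 1.0 means the observed
    working pattern could be maintained indefinitely. The
    multiplicative adjustment is an illustrative assumption, not
    an established methodology."""
    if not 0.0 < sustainability_index <= 1.0:
        raise ValueError("sustainability_index must be in (0, 1]")
    return (output / hours) * sustainability_index
```

On this form, a team reporting a twenty-fold raw gain with a sustainability index of 0.5 would show an adjusted gain of ten-fold, with the other half flagged explicitly as borrowed rather than earned.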

The third is a quality-adjusted output index — an imperfect but informative attempt to track whether the expanding quantity of AI-augmented output is accompanied by stable, improving, or declining quality. The index would necessarily be domain-specific — quality in software development is measured differently than quality in legal services or medical diagnosis — but the aggregate trend, even approximately measured, would provide information that currently does not exist.
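The domain-specific structure the index requires can be sketched directly. Everything here, from the multiplicative form to the example numbers, is an illustrative assumption rather than an official methodology.

```python
def quality_adjusted_output_index(
        domains: dict[str, tuple[float, float, float]]) -> float:
    """domains maps a domain name to (quantity_ratio, quality_ratio,
    weight), each ratio relative to a base period normalised to 1.0.
    Returns the weighted aggregate of quantity * quality per domain:
    an unadjusted boom in one domain is discounted by any measured
    decline in its quality."""
    total_weight = sum(w for _, _, w in domains.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(qty * qual * w
               for qty, qual, w in domains.values()) / total_weight
```

So a software sector whose output triples while measured quality falls to 80 percent of baseline contributes a 2.4-fold adjusted gain, not a 3-fold one; the aggregate trend, however approximate, is information that currently does not exist.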

The fourth is a distributional indicator — not merely of income, which existing metrics track, but of capability. Who has access to AI tools? Who is being augmented, and who is being displaced? Is the capability expansion reaching the populations that most need it, or is it concentrating among those who were already advantaged? Coyle's concern about AI monopolization — her call for a CERN for generative AI, her advocacy for interoperability principles borrowed from telecoms regulation — reflects the conviction that the distributional dimension of the transition is as important as the aggregate productivity dimension, and that the aggregate metrics conceal the distributional reality.

The fifth is a comprehensive time-use indicator — one that tracks how the population allocates its time across market work, household production, care, leisure, and rest, and how that allocation is changing in response to the AI transition. The indicator would make visible the household production displacement that Chapter 4 documented, the leisure compression that the Berkeley study identified, and the rest deficit that the wellbeing literature associates with sustained cognitive intensity.

The sixth is a wellbeing composite — drawing on life satisfaction, autonomy, relationship quality, sense of purpose, and the other dimensions that the psychological literature identifies as constitutive of human flourishing. The composite would stand alongside GDP in the quarterly reporting cycle, providing policymakers with information about whether the economy's human inhabitants are thriving or merely producing.

Six indicators. None perfect. None capturing the full complexity of an economy in transformation. But together, providing a picture that is categorically richer than the single number that currently dominates economic governance.

The institutional obstacles to building this dashboard are real. Statistical offices lack the mandates, the budgets, and in some cases the methodological tools to produce the new indicators. Central banks are calibrated to GDP data and resistant to incorporating unfamiliar metrics into their models. Political accountability cycles are synchronized to quarterly GDP reporting, and no politician has yet been voted out of office for neglecting a wellbeing composite. The institutional inertia is formidable.

But Coyle has noted that institutional inertia yields to crisis, and that the gap between what the metrics show and what people experience is itself a source of political crisis. When the dashboard shows economic success and the population reports economic anxiety, the gap undermines trust in institutions — trust in the statistical offices that produce the numbers, trust in the governments that cite the numbers, trust in the economic system that the numbers are supposed to describe. The erosion of institutional trust is itself measurable, and it has been accelerating across democratic societies for decades. Part of the erosion is attributable to the measurement gap: people know, from their own experience, that the economy described by the metrics is not the economy they inhabit. The metrics say things are getting better. The experience says things are getting more complicated. The divergence feeds the suspicion that the metrics are not merely incomplete but dishonest — that the numbers are designed to serve the interests of those who produce them rather than the interests of those they claim to describe.

Coyle has spent her career arguing that this suspicion is misdirected — that the metrics are not dishonest but limited, and that the limitation is addressable through institutional reform rather than institutional demolition. The argument is harder to sustain when the limitation is widening. The AI transition is producing an economy that diverges more dramatically from its statistical description than any previous economy. The divergence will erode institutional trust further unless the statistical description is reformed to capture what the economy actually contains.

The measurement system is governance infrastructure. It determines what information flows to decision-makers and therefore what aspects of the transition the governance system can see and respond to. A measurement system designed for the industrial economy of 1944, applied without fundamental reform to the AI economy of 2026, is governance infrastructure that has outlived the economy it was designed to govern. The infrastructure needs rebuilding. The rebuilding needs to happen now, not because perfect metrics are achievable but because approximate metrics are infinitely preferable to the systematic blindness that the current system produces.

Kuznets built the national accounts in response to a crisis that demanded better information. The AI transition is a crisis that demands the same. Whether the institutional response will match the urgency of the moment — whether the measurement infrastructure will be reformed quickly enough to provide the information that the governance of the transition requires — is not an economic question. It is a political one. And the answer will determine whether the most consequential economic transformation since electrification is managed with eyes open or eyes closed.

---

Epilogue

The number I cannot account for is my own.

Twenty engineers in Trivandrum, twenty-fold multiplier, one hundred dollars a month. I can cite those figures in my sleep. I have cited them on stages and in boardrooms and in conversations with investors who wanted the arithmetic to make a decision for them. The numbers are real. I measured them myself. And now, after spending months inside Diane Coyle's work, I realize that the measurement itself was the simplest part of what happened in that room — and the part that told me the least about what it actually meant.

Coyle's central insight is devastating in its plainness: what you count shapes what you value, and what you cannot count disappears from the conversation. I have lived inside that sentence for months, and I still find new rooms in it. When I reported the twenty-fold number, I was telling the truth. When the dashboard showed productivity surging, it was telling the truth. But the truth it told was the truth the dashboard was built to tell — output divided by hours — and everything that made the experience human was in the remainder that the division discarded.

The meals I skipped. The intensity that felt like flow until it didn't. The engineer who oscillated between excitement and terror for two days before finding his footing. The wife who wrote a Substack post because the man she married had vanished into a tool and the only language she had for the loss was a cry for help disguised as humor. None of that entered the denominator. None of it entered the numerator. None of it existed, statistically speaking. And because it did not exist statistically, it could not enter the policy conversation, could not inform the decisions that governments and companies and families were making about how to navigate what was happening.

Coyle forced me to see the dashboard I had been staring at as an artifact — something built by specific people in a specific emergency, carrying the assumptions of that emergency into a world it was never designed to describe. GDP was built to win a war. Productivity statistics were built to manage factories. Time-use surveys were designed before the concept of "task seepage" existed, before AI could colonize a lunch break, before the boundary between work and everything else dissolved into the width of a text message. The instruments are not wrong. They are measuring the economy that existed when they were calibrated. The economy has moved. The instruments have not.

What haunts me is the compound failure. Not any single blind spot but the way they reinforce each other. The productivity metric that cannot tell efficiency from intensity. The output measure that cannot tell quantity from quality. The national accounts that cannot see the household production evaporating behind the screen. The cost-benefit analysis that prices the tool at a hundred dollars and calls the transition cheap. Each blind spot is manageable in isolation. Together, they produce a picture of the AI transition that is systematically, structurally optimistic — a picture that shows the gains in high resolution and the costs in no resolution at all.

I am a builder. My instinct is to build. After reading Coyle, my instinct is still to build — but with a different awareness of what I am building on. The dashboard I consult every morning is a partial truth presented as the whole. The decisions I make on the basis of that dashboard are informed by what the dashboard can see and uninformed by what it cannot. And the people affected by those decisions — my team, my users, my family — experience the full reality that the dashboard reduces to a number.

I cannot fix the national accounts. That is work for institutions operating at scales I do not control. But I can stop pretending that the number tells the whole story. I can ask, every time I see a productivity figure, whether the gain is sustainable or borrowed. I can ask, every time I celebrate output, whether the quality earned the celebration. I can ask, every time I check the dashboard, what the dashboard cannot show me — and whether the invisible remainder is where the real story lives.

Coyle ended her career-spanning argument with a conviction I have come to share: measurement is governance. What we count, we manage. What we cannot count, we neglect. The AI transition will be governed by the metrics available to the people governing it. If those metrics show only the boom, the governance will be designed for a boom. If they could also show the cost — the depletion, the displacement, the quiet erosion of the things that make life worth accelerating toward — the governance might be designed for something more durable.

Build the instruments. That is the lesson I am taking from this. Not instead of building the products and the teams and the futures I believe in. Alongside them. Build the instruments that tell you what the dashboard cannot, and let the fuller picture guide the hand that holds the tools.

The dashboard is still on. The numbers still look extraordinary.

I am learning to read what lives between them.

Edo Segal


The AI revolution is producing numbers that look extraordinary on every metric we know how to read. Productivity is surging. Output is multiplying. The cost of the tools is trivially low. Every dashboard in every boardroom says the same thing: this is working.

Diane Coyle has spent decades proving that the dashboards we rely on were built for a different economy — one of factories and physical goods — and that what they cannot see is often more consequential than what they can. In this volume, her frameworks reveal the measurement crisis hiding inside the AI boom: productivity gains that may be borrowed from cognitive reserves rather than earned through efficiency, an invisible surplus larger than the measured economy can detect, and household production quietly collapsing behind the screen.

When what you count determines what you govern, the unmeasured dimensions of the AI transition become the ungoverned ones. Coyle's work is the lens that shows us what the green lights are hiding.
