CONCEPT

Externalities in AI Training

The costs imposed on creators when their work is used to train AI models without compensation — a Coasian property-rights problem with no current institutional solution.
Large language models are trained on billions of documents, images, and code samples, much of it produced by individuals and organizations who were neither compensated for the use nor asked for consent. AI companies and their customers benefit from the capabilities that training produces; the original creators bear a cost, because their work helps build capabilities that compete with them and may reduce the market value of their future output. Under the current property-rights assignment, this cost falls entirely on creators. The legal framework governing training-data use is unsettled, and the transaction costs of enforcing copyright against AI training are prohibitive: the scale is measured in billions of documents, causal connections between individual documents and model outputs are diffuse and hard to establish, and copyright doctrine was designed for reproduction, not statistical learning. The Coasian question is not whether creators deserve compensation but whether a different rights assignment would produce more efficient outcomes.

In The You On AI Encyclopedia

Coase's 1960 paper "The Problem of Social Cost" argued that externality problems are fundamentally about the assignment of property rights rather than about market failure. When a factory pollutes a river and harms downstream fishermen, the question is not whether pollution should be prohibited but who holds the relevant right: the factory's right to pollute or the fishermen's right to clean water. If rights are clearly assigned and transaction costs are low, the parties can negotiate an efficient outcome regardless of the initial assignment. The policy implication is institutional design: assign rights clearly, reduce transaction costs, and enable negotiation. The AI training externality fits this framework precisely: the costs are real, they fall on identifiable parties (creators), and the rights are ambiguous or undefined.
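
The logic can be made concrete with a toy numeric sketch. Every figure below is an illustrative assumption, not an estimate: the factory gains less from polluting than the fishermen lose, so the efficient outcome is no pollution, and the sketch checks whether bargaining reaches that outcome under each rights assignment.

```python
# Toy Coasian bargaining sketch; every number here is an illustrative assumption.
# factory_gain: profit the factory earns by polluting
# fishermen_loss: damage the pollution imposes downstream
# transaction_cost: cost of finding the other party, negotiating, and enforcing
# Payments between the parties are transfers and net out of total welfare.

def outcome(factory_gain, fishermen_loss, rights_holder, transaction_cost):
    """Return (pollution_continues, total_welfare) for one rights assignment."""
    if rights_holder == "factory":
        # Fishermen must pay the factory to stop; worth it only if the avoided
        # loss exceeds the factory's gain plus the cost of striking the deal.
        if fishermen_loss > factory_gain + transaction_cost:
            return False, -transaction_cost                # deal struck, pollution stops
        return True, factory_gain - fishermen_loss         # no deal, pollution continues
    # Fishermen hold the right to clean water; the factory must buy permission.
    if factory_gain > fishermen_loss + transaction_cost:
        return True, factory_gain - fishermen_loss - transaction_cost
    return False, 0.0                                      # no deal, no pollution

for tc in (0.0, 50.0):                      # low vs. prohibitive bargaining costs (assumed)
    for holder in ("factory", "fishermen"):
        pollutes, welfare = outcome(30.0, 70.0, holder, tc)
        print(f"transaction_cost={tc:5.1f} rights={holder:9} "
              f"pollutes={pollutes!s:5} welfare={welfare:6.1f}")
```

With low bargaining costs both assignments reach the efficient no-pollution outcome; once the cost of striking a deal exceeds the stakes, the initial assignment determines the result, which is the situation the entry describes for AI training data.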

The current assignment implicitly favors AI companies. Publicly available data is treated as fair game for training under a capacious interpretation of fair use that courts have not definitively settled. Creators hold copyright to specific expressions but have no clear right to compensation for training use, no practical mechanism for opting out at scale, and no effective remedy after the fact. Transaction costs of asserting rights are prohibitive — individual creators cannot afford to sue, class actions face standing and causation difficulties, and the aggregation problem (millions of works contributing fractionally to a single model) makes per-work compensation schemes administratively infeasible.
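
A back-of-envelope calculation makes the aggregation problem concrete. Every figure below is an assumption chosen only to show orders of magnitude; none comes from actual licensing or cost data.

```python
# Illustrative arithmetic only: all figures are assumptions, not real licensing data.

licensing_pool = 1_000_000_000       # hypothetical annual pool set aside for creators ($)
works_in_corpus = 3_000_000_000      # hypothetical number of works in a training corpus
admin_cost_per_payment = 1.00        # hypothetical cost to verify one claim and issue one payment ($)

per_work_payment = licensing_pool / works_in_corpus
total_admin_cost = admin_cost_per_payment * works_in_corpus

print(f"Per-work payment:    ${per_work_payment:.4f}")    # roughly 33 cents per work
print(f"Administration cost: ${total_admin_cost:,.0f}")   # $3 billion, exceeding the pool
print(f"Admin cost per dollar distributed: ${total_admin_cost / licensing_pool:.2f}")
```

Under these assumed figures, issuing the payments would cost roughly three times the amount being distributed, which is what "administratively infeasible" means here.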

A different assignment might create explicit training-data rights, requiring AI companies to compensate creators or secure consent before use. The Coasian analysis asks whether this would produce better outcomes once all costs are accounted for. The benefits: creators receive compensation, the incentive to produce high-quality work is maintained, and the positive externality of deep expertise feeding training data is internalized. The costs: AI services become more expensive (reducing access), compliance and administration add overhead, and model capability may fall if training corpora shrink. The comparison is empirical, not ideological, and it requires knowledge of actual magnitudes that, under current institutional uncertainty, no one fully possesses.
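
The structure of that comparison can be written down even though its terms cannot yet be measured. The sketch below names the relevant quantities as placeholders; which regime comes out ahead depends entirely on values that, as noted above, nobody currently knows.

```python
# Schematic welfare comparison between two rights assignments.
# Every value is a named placeholder; the magnitudes are empirical unknowns.

from dataclasses import dataclass

@dataclass
class Regime:
    consumer_surplus: float      # value users get from AI services
    producer_surplus: float      # AI-company profits
    creator_income: float        # compensation flowing to creators
    creator_losses: float        # market value creators lose to competing AI output
    admin_overhead: float        # compliance and administration costs
    future_expertise: float      # long-run value of maintained incentives to build expertise

    def total_welfare(self) -> float:
        return (self.consumer_surplus + self.producer_surplus + self.creator_income
                + self.future_expertise - self.creator_losses - self.admin_overhead)

# Placeholder numbers only, chosen to show the structure of the comparison.
status_quo = Regime(consumer_surplus=100, producer_surplus=40, creator_income=0,
                    creator_losses=30, admin_overhead=0, future_expertise=10)
training_rights = Regime(consumer_surplus=80, producer_surplus=25, creator_income=25,
                         creator_losses=10, admin_overhead=15, future_expertise=20)

for name, regime in (("status quo", status_quo), ("explicit training rights", training_rights)):
    print(f"{name:25} total welfare = {regime.total_welfare():.0f}")
```

The placeholder values are deliberately arbitrary; the empirical question is which total is larger once the terms can actually be measured.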

Origin

The externality became visible with large-scale scraping for AI training beginning around 2018–2020. The Authors Guild letter of July 2023, signed by over ten thousand authors, publicly framed the issue as unauthorized use of copyrighted work. The Andersen v. Stability AI lawsuit (January 2023) by visual artists brought the question into federal court. The Coasian framing — treating this as a property-rights assignment problem requiring institutional design rather than as a moral dispute requiring judicial pronouncement — emerged in legal and economic analysis through 2024–2025 as courts, legislators, and scholars recognized that the existing framework was inadequate to the scale and structure of the conflict.

Key Ideas

Externality as rights-assignment failure. The costs creators bear are not inherent to AI technology but consequences of current property-rights allocation — a different assignment would produce different outcomes.

Transaction costs block negotiation. Even if rights were clear, the scale (billions of works), diffusion (fractional contributions), and information asymmetry (creators don't know which models used their work) make Coasian bargaining practically impossible.

Long-run knowledge degradation. If the economics of AI stop rewarding investment in deep expertise, the next generation of experts will be smaller and less skilled, degrading the training data available to future models: a negative feedback loop operating across time (see the sketch after this list).

Institutional design, not judicial resolution. The Coasian solution is not determining who is "right" but building mechanisms (compensation schemes, opt-in/opt-out infrastructure, data trusts) that internalize costs and maintain incentives efficiently.
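
The feedback loop described under "Long-run knowledge degradation" can be illustrated with a toy iteration. The parameters are assumptions, not measurements: model capability displaces some expert income each generation, investment in expertise follows income, and corpus quality follows the expertise stock.

```python
# Toy feedback loop with assumed parameters: better models displace expert income,
# lower income reduces investment in expertise, and lower expertise degrades the
# training corpus available to the next model generation.

reward_share = 1.0        # fraction of expert income not yet captured by AI substitution
expertise = 1.0           # stock of deep expertise feeding the training corpus
substitution = 0.3        # assumed fraction of expert income displaced per generation

print("gen  reward_share  expertise  model_quality")
for generation in range(1, 7):
    model_quality = expertise                             # model quality tracks corpus quality
    reward_share *= (1 - substitution * model_quality)    # better models displace more income
    expertise = 0.5 * expertise + 0.5 * reward_share      # investment follows rewards, with inertia
    print(f"{generation:3d}  {reward_share:12.3f}  {expertise:9.3f}  {model_quality:13.3f}")
```

Under these assumed parameters every quantity drifts downward together; the point is the direction of the loop, not the particular numbers.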
