The You On AI Encyclopedia
CONCEPT

The Enclosure of the Training Commons

The appropriation of commons-produced knowledge (Wikipedia articles, open-source code, Creative Commons works) as training data for proprietary AI models. Value is extracted from shared resources while the resulting capabilities are privatized, a parasitic dynamic that undermines the incentive structure sustaining the commons.
AI companies trained their language models on the accumulated output of commons-based peer production: Wikipedia's 60 million articles, billions of lines of open-source code, Creative Commons–licensed cultural works, and publicly available research. This training data was freely accessible because communities of contributors had shared it under open licenses, creating a commons of knowledge and expression. The resulting AI models are overwhelmingly proprietary — owned by the companies that trained them, accessed through commercial APIs, governed unilaterally by corporate boards. The commons fed the machine, and the machine's outputs are privatized. This represents enclosure in Benkler's framework: the conversion of a shared resource into private property, extraction without reciprocity, and the disruption of the contribution ecology.

The enclosure operates at a structural level that existing intellectual property frameworks were not designed to address. Open-source licenses govern the use of specific software artifacts but say nothing about using those artifacts as training data. Creative Commons licenses govern reproduction and adaptation of specific works but do not govern the statistical patterns extracted from millions of such works and embedded in neural network weights. The legal vacuum has allowed AI companies to consume the commons without the reciprocity obligations that commons governance traditionally imposed — no requirement to contribute improvements back, no community voice in governance, no sharing of the value generated by the models.

The dynamic is circular and self-reinforcing. AI systems trained on commons data enable individual direct production. Individual producers, who can meet their needs through solitary AI conversation, contribute less to the commons. The commons receives less data and less community engagement. Future AI models train on a degraded commons, supplemented by proprietary data or AI-generated synthetic data. The commons becomes peripheral to the AI ecosystem, further reducing the incentive to maintain it. This is not the tragedy of overgrazing — the commons is not depleted by use. It is the tragedy of underfeeding — the commons degrades because contributions decline when the social context motivating contribution (community recognition, collaborative governance) is eliminated by a technology that makes collaboration unnecessary.

Training Data as Public Good

Benkler's institutional response to enclosure has always emphasized legal frameworks that protect the commons: open licenses, robust fair use, resistance to copyright expansion. The AI moment requires extending this institutional repertoire. Possible frameworks include copyleft-for-AI licenses (requiring that models trained on commons data be released openly), commons-governed AI (open-source models developed and maintained by communities), and compensation mechanisms (channeling revenue from commercial AI to commons maintenance). None exist at scale, and the institutional vacuum is the governance crisis of the transition.

Origin

The concept draws on the historical analysis of enclosure, the 18th- and 19th-century privatization of English common lands that displaced rural communities, which Benkler invoked in The Wealth of Networks to describe the danger facing the digital commons. The AI-specific application emerged in 2023–2024 legal battles (the Authors Guild suing OpenAI, artists suing Stability AI) and in the recognition that the training data fueling AI capabilities was overwhelmingly drawn from commons-produced knowledge that received no compensation, credit, or governance rights.

Key Ideas

Commons as substrate. AI capability rests entirely on the accumulated output of commons-based peer production, making the commons the unacknowledged foundation of the AI economy.

Extraction without reciprocity. AI companies consumed commons data and produced proprietary tools that compete with the commons for contributors, disrupting the ecology of motivation that sustained collaborative production.

Model collapse risk. If AI-generated content floods the training data of future models, the diversity and human grounding that characterized the original commons will degrade, reducing the quality of subsequent AI generations.

Institutional gap. Existing intellectual property frameworks do not govern the use of works as training data, creating a legal vacuum that must be filled through new licensing frameworks, governance structures, or compensation mechanisms.
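The model collapse risk named above can be sketched with a toy simulation, in the spirit of the recursive-training experiments in the model-collapse literature. Everything here is an illustrative assumption, not a claim about any production system: each "model generation" is just a Gaussian fitted to a finite sample drawn from the previous generation's model, and the function name `simulate_collapse` is hypothetical. Because a spread estimated from a small sample is biased low, the diversity of the synthetic "commons" decays across generations.

```python
import random
import statistics

def simulate_collapse(n_samples=20, n_generations=200, seed=0):
    """Toy recursive-training loop: generation 0 is the human-produced
    'commons' (a standard Gaussian); each later generation is a Gaussian
    fitted to a finite sample from the previous one. Finite-sample noise
    biases the fitted spread downward, so diversity shrinks over time."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0                  # generation 0: the original commons
    spreads = [sigma]
    for _ in range(n_generations):
        sample = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu = statistics.fmean(sample)     # refit the "model" to its own output
        sigma = statistics.pstdev(sample) # spread estimate, biased low
        spreads.append(sigma)
    return spreads

spreads = simulate_collapse()
print(f"spread of generation 0:   {spreads[0]:.3f}")
print(f"spread of generation 200: {spreads[-1]:.3g}")
```

The shrinking spread stands in for the loss of diversity and human grounding: once models feed on their own outputs rather than the commons, each generation preserves less of the original distribution.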

Debates & Critiques

Defenders of current AI training practices argue that learning from publicly available text is analogous to human reading and that fair use protections should apply. Critics argue that the scale of AI training (billions of documents) and the commercial nature of the resulting products distinguish it from individual human learning. Benkler's framework cuts through this debate by focusing on institutional design: the question is not whether training is legal under existing law, but what legal and governance arrangements will best serve the production and circulation of knowledge in the AI era.

Further Reading

  1. Yochai Benkler, The Wealth of Networks, Chapter 11 (Yale University Press, 2006)
  2. Elinor Ostrom, Governing the Commons (Cambridge University Press, 1990)
  3. James Boyle, The Public Domain (Yale University Press, 2008)
  4. Pamela Samuelson, 'Generative AI Meets Copyright' (Science, 2023)
  5. Mark Lemley and Bryan Casey, 'Fair Learning' (Texas Law Review, 2021)

Three Positions on The Enclosure of the Training Commons

From Chapter 15 — how the Boulder, the Believer, and the Beaver each read this concept
Boulder · Refusal
Han's diagnosis
The Boulder sees in The Enclosure of the Training Commons evidence of the pathology — that refusal, not adaptation, is the correct posture. The garden, the analog life, the smartphone that is not bought.
Believer · Flow
Riding the current
The Believer sees The Enclosure of the Training Commons as the river's direction — lean in. Trust that the technium, as Kevin Kelly argues, wants what life wants. Resistance is fear, not wisdom.
Beaver · Stewardship
Building dams
The Beaver sees The Enclosure of the Training Commons as an opportunity for construction. Neither refuse nor surrender — build the institutional, attentional, and craft governors that shape the river around the things worth preserving.

