Training data differs from other AI inputs in one consequential way: it is not consumed when used. Training on a text does not destroy it. But the quality of the corpus can still degrade. If the systems that produce high-transformity intellectual work are undermined by the very technology that depends on them, the intellectual topsoil thins.
This creates a dependency loop the AI discourse has not adequately examined. The model's capability depends on training data. The training data depends on human intellectual production. Human intellectual production depends on educational institutions, research infrastructure, cultural traditions of deep inquiry, and the economic conditions that sustain all of these. If AI deployment degrades any of these conditions — eroding educational depth by making answers cheap, undermining research incentives by commodifying intellectual output, displacing the economic structures that fund universities — then it degrades the quality of its own future training data.
The agricultural analogy is precise. Industrial agriculture depletes soil faster than soil regenerates; yields hold for a generation through fertilizer subsidies, then collapse when the soil structure fails. Applied to training data, the topsoil analogy makes the same prediction: current models train on a corpus produced overwhelmingly by humans working through friction-rich processes in functioning institutions. The open question is what the next generation of models trains on, and the generation after that.
Researchers have already documented the rising prevalence of AI-generated text in academic submissions, web content, and technical documentation. Each increment shifts the composition of the training pool toward lower-transformity content. The model collapse literature examines what happens when models train substantially on their own outputs and finds that quality degrades across generations, with rare content in the tails of the distribution disappearing first. Transformity analysis points to a deeper problem: the intellectual reserves funding the current generation of AI are nonrenewable on the timescale at which they are being consumed.
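The compositional drift described above can be made concrete with a toy calculation. What follows is a minimal sketch under loudly hypothetical assumptions, not a model drawn from the model collapse literature: the corpus starts entirely human-authored (normalized to size 1.0), each generation adds new text equal to `growth` times the existing corpus, and a chosen share of each generation's additions is AI-generated. The function name and parameters are illustrative inventions. The sketch tracks provenance only; it says nothing about transformity or quality, which is the deeper issue transformity analysis raises.

```python
from typing import Iterable, List


def human_share_by_generation(growth: float, ai_shares: Iterable[float]) -> List[float]:
    """Return the human-authored fraction of a cumulative corpus after each generation.

    Hypothetical toy assumptions (not measured values):
    - the corpus starts fully human-authored, normalized to size 1.0;
    - each generation adds new text equal to `growth` times the current corpus;
    - `ai_shares` gives, per generation, the fraction of new text that is AI-generated.
    Only provenance is tracked; quality and transformity are not modeled.
    """
    if growth < 0.0:
        raise ValueError("growth must be non-negative")
    human, total = 1.0, 1.0
    shares: List[float] = []
    for ai_share in ai_shares:
        if not 0.0 <= ai_share <= 1.0:
            raise ValueError("each AI share must lie in [0, 1]")
        added = growth * total               # new text this generation
        human += added * (1.0 - ai_share)    # human-authored part of the new text
        total += added
        shares.append(human / total)
    return shares
```

For example, `human_share_by_generation(1.0, [0.5, 0.5, 0.5])` returns `[0.75, 0.625, 0.5625]`: if the corpus doubles each generation and half of every addition is machine-generated, the human share decays toward one half. A rising sequence of AI shares drives it lower still. The design deliberately keeps old text in the pool, matching the point that training data is not destroyed by use; what changes is the mix.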
The concept extends Odum's emergy framework to informational inputs, following his 1973 placement of information processing at the apex of the energy hierarchy. The explicit application to AI training data is developed in this volume and related work applying systems ecology to the computational economy.
The agricultural analogy, which treats topsoil depletion as a template for understanding how slowly accumulated reserves degrade under fast extraction, has roots in the Dust Bowl era's reckoning with industrial farming practices and their civilizational consequences.
Each text is a chain endpoint. The emergy of any single training text traces back through education, institutions, cultural traditions, and ultimately the agricultural surplus that began in the Neolithic.
Aggregate emergy is staggering. The full corpus embodies civilizational investment on scales that dwarf conventional economic accounting.
Use without destruction, quality with depletion. Training data is not consumed, but the institutions producing high-quality future data can be eroded by the technology it feeds.
Dependency loop runs backward. AI depends on training data, which depends on institutional health, which AI can undermine.
Topsoil thins slowly, then fails fast. The agricultural analogy predicts degradation that is imperceptible for a generation, then catastrophic.