CONCEPT
The Training Data Question
The governance regime change in which the accumulated textual, visual, and computational output of millions of individuals was appropriated for AI training under terms their original contribution did not contemplate — the paradigmatic case of commons appropriation without community participation.
The training data from which large language models learn constitutes, in institutional-economic terms, a commons: the accumulated textual, visual, and computational output of millions of individuals, contributed without explicit governance arrangements for this purpose to a shared pool from which value is now extracted by a small number of firms. The governance arrangements under which the data was originally contributed (the norms of the open internet, the terms of service of social platforms, the licensing frameworks of academic publishing) were designed for a world in which the data's primary use was human consumption. The appropriation of that data for AI training represents what Ostrom's framework identifies as a regime change in the commons.
In The You On AI Field Guide
The appropriation was undertaken without the participation of the community whose contributions constitute the resource. The herders did not consent to having the pasture enclosed. The fishers did