The You On AI Encyclopedia
CONCEPT

The Training Data Question

The governance regime change in which the accumulated textual, visual, and computational output of millions of individuals was appropriated for AI training under terms their original contributions did not contemplate — the paradigmatic case of commons appropriation without community participation.
The training data from which large language models learn constitutes, in institutional-economic terms, a commons: the accumulated textual, visual, and computational output of millions of individuals, contributed without explicit governance arrangements for this purpose to a shared pool from which value is now extracted by a small number of firms. The governance arrangements under which the data was originally contributed — the norms of the open internet, the terms of service of social platforms, the licensing frameworks of academic publishing — were designed for a world in which the data's primary use was human consumption. The appropriation of that data for AI training represents what Ostrom's framework identifies as a regime change in the commons.

The appropriation was undertaken without the participation of the community whose contributions constitute the resource. The herders did not consent to having the pasture enclosed. The fishers did not agree to the sale of commercial licenses. The contributors to the training-data commons did not participate in the decision to use their contributions for purposes that the original governance arrangements did not address.

Max Fang's February 2025 Stanford working paper, "The Tragedy of the AI Data Commons," employs law-and-economics methodologies alongside Ostrom's design principles to frame precisely this dynamic. The conventional response follows Hardin's dichotomy. One camp argues for privatization: clear property rights over data, licensing regimes, compensation mechanisms. The other argues for state regulation: government-mandated data governance, algorithmic auditing, transparency requirements.

Knowledge Commons

Both responses have merit and limitations. Privatization encounters the practical difficulty that the data was not produced as property — it was produced as communication, expression, participation in a shared informational environment — and retroactively imposing property frameworks creates distortions that may exceed the problem they address. State regulation encounters the enforcement challenges of regulating global digital systems through national legal frameworks.

Ostrom's framework suggests a third approach: governance arrangements developed by the communities whose contributions constitute the resource. The Mozilla Foundation, collaborating with the Ostrom Workshop at Indiana University, has developed a practical framework for applying the design principles to data commons governance. The practical challenges are significant — contributors number in the millions across every jurisdiction on earth, with no pre-existing organizational structure — but not unprecedented in Ostrom's empirical record.

Origin

The question crystallized around 2022–2023 as the commercial value of large language models became unambiguous and the provenance of their training data became litigation-relevant. The framing as a commons-appropriation question, rather than as a property-rights or regulatory question, emerged from the application of Ostrom's framework by scholars at the Ostrom Workshop and related research programs.

Key Ideas

Regime change in the commons. Data contributed under one set of governance assumptions was appropriated under another, without the participation of the contributing community.

Tragedy of the AI Data Commons

False binary. Privatization and state regulation both have merit, but neither resolves the core question of who makes the governance arrangements.

Third option exists. Ostrom's framework supports community-based governance arrangements developed by the contributors themselves.

The practical challenges are real. Scale, jurisdictional diversity, and absent organizational structure make this harder than any commons Ostrom studied — but not impossible.

Further Reading

  1. Max Fang, "The Tragedy of the AI Data Commons" (Stanford working paper, 2025)
  2. Mozilla Foundation and Ostrom Workshop, data commons governance framework
  3. Charlotte Hess and Elinor Ostrom, eds., Understanding Knowledge as a Commons: From Theory to Practice (MIT Press, 2007)

Three Positions on The Training Data Question

From Chapter 15 — how the Boulder, the Believer, and the Beaver each read this concept
Boulder · Refusal
Han's diagnosis
The Boulder sees in The Training Data Question evidence of the pathology — that refusal, not adaptation, is the correct posture. The garden, the analog life, the smartphone that is not bought.
Believer · Flow
Riding the current
The Believer sees The Training Data Question as the river's direction — lean in. Trust that the technium, as Kevin Kelly argues, wants what life wants. Resistance is fear, not wisdom.
Beaver · Stewardship
Building dams
The Beaver sees The Training Data Question as an opportunity for construction. Neither refuse nor surrender — build the institutional, attentional, and craft governors that shape the river around the things worth preserving.

Read Chapter 15 in the book →
