Data Trusts — Orange Pill Wiki
CONCEPT

Data Trusts

Institutional structures that hold and govern data on behalf of the communities that generated it — the distributive design response to AI's extraction of value from commons it does not own.

Data trusts are legal and institutional structures that hold data on behalf of the communities or individuals who generated it, making governance decisions about its use in the beneficiaries' interests rather than the harvester's. They adapt the centuries-old legal form of the trust — in which a trustee holds property for the benefit of identified beneficiaries — to the conditions of the digital economy.

In the AI Story


The specific urgency for AI arises from the training data problem. The large language models that generate contemporary AI capabilities were trained predominantly on data produced by billions of people writing, coding, and creating across the open internet. The value generated from this data — enormous, measured now in hundreds of billions of dollars of market capitalization — has flowed entirely to the corporations that harvested it. The people whose collective labor constitutes the training corpus received nothing.

Data trusts propose a structural correction. A trust holding creative-commons training data, for example, could license that data to AI companies on terms that return a portion of the value to a beneficiary community — perhaps to fund public-interest AI development, or to support creators displaced by AI-generated alternatives. The trust's trustees, accountable to beneficiary representatives, would make licensing decisions according to the community's interests rather than commercial maximization alone.
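The licensing-and-distribution mechanism described above can be sketched in code. This is a toy illustration, not an implementation of any existing data trust: the entity names, revenue shares, and fee figure are invented for the example, and real trusts would embed these decisions in legal instruments and governance procedures rather than a ledger dictionary.

```python
from dataclasses import dataclass, field

@dataclass
class Beneficiary:
    name: str
    share: float  # fraction of licensing revenue, set by trust governance

@dataclass
class DataTrust:
    """Toy model of a trust that licenses a data corpus and
    distributes fees to its beneficiary community."""
    corpus_id: str
    beneficiaries: list[Beneficiary] = field(default_factory=list)
    ledger: dict[str, float] = field(default_factory=dict)

    def license_corpus(self, licensee: str, fee: float) -> dict[str, float]:
        """Record a licensing deal and allocate the fee pro rata
        according to each beneficiary's governance-set share."""
        if abs(sum(b.share for b in self.beneficiaries) - 1.0) > 1e-9:
            raise ValueError("beneficiary shares must sum to 1")
        payout = {b.name: fee * b.share for b in self.beneficiaries}
        for name, amount in payout.items():
            self.ledger[name] = self.ledger.get(name, 0.0) + amount
        return payout

# Hypothetical beneficiary communities and licensing fee,
# mirroring the examples in the paragraph above.
trust = DataTrust(
    corpus_id="cc-text-corpus",
    beneficiaries=[
        Beneficiary("public-interest-ai-fund", 0.6),
        Beneficiary("displaced-creators-fund", 0.4),
    ],
)
payout = trust.license_corpus("model-lab", fee=1_000_000.0)
```

The point of the sketch is the structure, not the numbers: licensing revenue flows through the trust and is allocated by rules the beneficiaries control, rather than accruing to whoever harvested the data.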

The legal infrastructure for data trusts remains under development. The Open Data Institute in the UK has produced significant design work. The concept has drawn support from scholars including Jack Balkin (the "information fiduciary" framework) and Sylvie Delacroix. Implementation has been piloted in health data (UK NHS), geographic data (various municipalities), and environmental monitoring. Scaling to AI training data would require legal innovation and likely regulatory support.

For Raworth's distributive design, data trusts address one of the most structural inequities in the AI economy: the private capture of publicly produced value. The commons did not consent to becoming corporate training data. Data trusts are one institutional mechanism for converting that capture into something closer to a royalty or licensing relationship that returns value to the source.

Origin

The trust form originated in medieval English property law. Its application to data emerged in the 2010s, with major theoretical development by Jack Balkin (Yale), Sylvie Delacroix, and the Open Data Institute. The concept has since been taken up by the European Commission, Japanese data regulators, and various civil society organizations.

Key Ideas

Trust structure. A centuries-old legal form adapted to data governance, with trustees holding property for beneficiaries.

Training data problem. Data trusts address the unconsented extraction of collective intellectual labor into private AI training corpora.

Implementation pilots. Health, geographic, and environmental applications are underway; AI training data is the frontier application.

Regulatory dependence. Full-scale implementation likely requires legal and regulatory innovation.

Further reading

  1. Sylvie Delacroix and Neil Lawrence, "Bottom-Up Data Trusts," International Data Privacy Law (2019)
  2. Jack Balkin, "Information Fiduciaries and the First Amendment," UC Davis Law Review (2016)
  3. Open Data Institute, Data Trusts: Lessons from Three Pilots (2019)
  4. Anouk Ruhaak, Data Trusts: What Are They? (2021)
  5. European Commission, Data Governance Act (2022)
Part of The Orange Pill Wiki · A reference companion to the Orange Pill Cycle.