Data trusts are legal and institutional structures that hold data on behalf of the communities or individuals who generated it, making governance decisions about its use according to the beneficiaries' interests rather than according to the interests of the data harvester. They adapt the centuries-old legal form of the trust — in which a trustee holds property for the benefit of identified beneficiaries — to the conditions of the digital economy.
The specific urgency for AI arises from the training data problem. The large language models that generate contemporary AI capabilities were trained predominantly on data produced by billions of people writing, coding, and creating across the open internet. The value generated from this data — enormous, measured now in hundreds of billions of dollars of market capitalization — has flowed entirely to the corporations that harvested it. The people whose collective labor constitutes the training corpus received nothing.
Data trusts propose a structural correction. A trust