Data Sovereignty — Orange Pill Wiki
CONCEPT

Data Sovereignty

The principle — advanced by indigenous movements and developed within Escobar's pluriversal framework — that communities retain governance rights over the data, knowledge, and cultural expressions that AI systems extract from them as raw material.

Data sovereignty names a specific gap in the current AI architecture. The training data on which models are built includes knowledge produced by communities that did not consent to its inclusion, do not benefit from its use, and exercise no governance over the systems that incorporate it. Indigenous ecological knowledge, traditional cultural expressions, locally produced content in languages whose speaker communities have no voice in the models' governance structures — all of this enters the training pipeline as raw material rather than as intellectual contribution. Data sovereignty frameworks would establish that communal knowledge is a contribution entitled to recognition, compensation, and governance rights.

In the AI Story

[Hedcut illustration: Data Sovereignty]

The concept emerged most forcefully from indigenous movements, particularly the First Nations Information Governance Centre in Canada, which developed the OCAP principles (Ownership, Control, Access, and Possession) for indigenous data governance. Te Mana Raraunga in Aotearoa/New Zealand has developed a parallel framework for Māori data sovereignty. The CARE Principles (Collective benefit, Authority to control, Responsibility, Ethics), developed by the Global Indigenous Data Alliance, explicitly complement the FAIR Principles (Findable, Accessible, Interoperable, Reusable) by asserting that data practices must serve communities and respect their self-determination.

The relevance to AI is direct. Large language models are trained on corpora that include vast quantities of community-generated content: indigenous language documentation, traditional knowledge in oral histories transcribed by anthropologists, cultural expressions published in journals and books without community consent. The aggregation of this material into training data does not obliterate the underlying communities' interests in how their knowledge is represented and used. But the current AI architecture has no mechanism for recognizing those interests — no process for consent, no framework for compensation, no governance structure for ongoing oversight.
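The missing mechanism described above can be made concrete. The sketch below is purely illustrative: it imagines a per-source provenance record in the spirit of the CARE principles and a corpus filter that admits community-sourced material only when consent and benefit-sharing terms are on record. All field and function names are hypothetical assumptions, not part of any existing standard or pipeline.

```python
from dataclasses import dataclass

@dataclass
class ProvenanceRecord:
    """Hypothetical per-source metadata sketched after the CARE principles.

    Field names are illustrative inventions, not an existing schema.
    """
    source_id: str
    community: str = ""              # contributing community, if any is asserted
    consent_obtained: bool = False   # CARE: Authority to control
    benefit_sharing_terms: str = ""  # CARE: Collective benefit
    governance_contact: str = ""     # CARE: Responsibility (ongoing oversight)

def eligible_for_training(record: ProvenanceRecord) -> bool:
    """Admit a source to the training corpus only if any asserted
    communal interest is matched by consent and benefit-sharing terms."""
    if not record.community:
        return True  # no communal interest asserted by this source
    return record.consent_obtained and bool(record.benefit_sharing_terms)
```

In this sketch, material with an asserted communal interest but no recorded consent is simply excluded; a real framework would instead trigger negotiation with the community's governance body rather than a binary filter.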

Escobar's framework positions data sovereignty as an essential component of pluriversal AI, not an optional addition. If AI systems encode the epistemology of their training data, then the question of who controls the training data is the question of which worlds the system sustains. A data sovereignty framework that recognizes communal knowledge as a contribution entitled to governance rights would shift the structure of AI development fundamentally: from extraction of raw material to collaboration with contributing communities, from one-way distribution of tools to two-way negotiation of terms.

The obstacles are substantial but not insurmountable. Precedents exist in other domains: geographical indications protecting the provenance of traditional products, the Nagoya Protocol governing access to genetic resources, intellectual property frameworks for traditional knowledge developed by WIPO. The extension of these instruments to the AI domain is technically feasible and has been proposed by researchers and activists across the Global South. It is politically absent because the current distribution of power in AI development does not require it.

Origin

The concept emerged primarily from indigenous movements, particularly First Nations data governance in Canada (OCAP principles, 1998), Māori data sovereignty movements in Aotearoa, and the Global Indigenous Data Alliance's CARE Principles (2019).

Escobar has incorporated data sovereignty into his pluriversal framework, particularly in his recent work on digital technology and AI governance.

Key Ideas

Communal contribution, not raw material. The knowledge used to train AI systems is produced by communities with ongoing interests in its use.

OCAP and CARE principles. Ownership, Control, Access, Possession — and Collective benefit, Authority to control, Responsibility, Ethics — constitute developed frameworks applicable to AI.

Existing precedents. Geographical indications, the Nagoya Protocol, and traditional knowledge frameworks demonstrate the feasibility of governance structures.

Not optional. Data sovereignty is not an ethical add-on but a structural requirement for pluriversal AI.

Political obstacle, not technical. The current absence of data sovereignty reflects the distribution of power, not the unavailability of mechanisms.


Further reading

  1. Stephanie Russo Carroll et al., 'The CARE Principles for Indigenous Data Governance,' Data Science Journal 19 (2020).
  2. Tahu Kukutai and John Taylor (eds.), Indigenous Data Sovereignty: Toward an Agenda (ANU Press, 2016).
  3. Maggie Walter et al. (eds.), Indigenous Data Sovereignty and Policy (Routledge, 2020).
  4. First Nations Information Governance Centre, The First Nations Principles of OCAP (2014).
Part of The Orange Pill Wiki · A reference companion to the Orange Pill Cycle.