"Raw Data" Is an Oxymoron — Orange Pill Wiki
WORK

"Raw Data" Is an Oxymoron

Lisa Gitelman's 2013 edited volume from MIT Press, whose title became one of the most cited phrases in contemporary data studies. It is a founding text for the view that data is constructed, not given.

The volume gathered essays by historians of science, media theorists, and critical data studies scholars, including Daniel Rosenberg's etymological history of the term "data", Geoffrey Bowker's analysis of database categories, and Paul Edwards's work on climate data infrastructures. The collection's central argument, encoded in its title, is that data is never raw: it is always cooked by the instruments that collect it, the institutions that commission it, the categories that organize it, and the assumptions that determine what counts as data in the first place. The volume's influence on critical data studies has been enormous; its title functions as a rallying phrase for scholars working to denaturalize the technology industry's presentation of data as pre-cultural and pre-institutional. In the AI age, the argument acquires a second life: the phrase "AI-generated content" performs the same sleight of hand the volume was designed to expose, presenting outputs as if they had no history of collection and selection behind them.

In the AI Story

The volume was published at a moment when big data was emerging as a dominant framework for thinking about knowledge production. Gitelman's editorial intervention was to insist that scholars in the humanities and social sciences bring to the study of data the same critical-historical attention that had been developed for texts, images, and other cultural artifacts.

The contributors span disciplines. Rosenberg's chapter on the etymology of "data" establishes the linguistic ground: the word derives from the Latin dare (to give), and the sense of something simply "given" conceals the historical and material processes through which data is taken. Bowker's chapter on database categories demonstrates that the structures through which data is organized embed institutional assumptions that shape what can be asked of the data. Edwards's chapter on climate data shows that even measurements that appear to describe the physical world are shaped by the instruments and infrastructures that produce them.

Applied to AI, the volume's framework explains why AI training corpora cannot be treated as neutral samples of human knowledge. The corpora are cooked data — shaped by what was digitized, what was publicly available, what was written in English, what survived the filters of platform terms of service and copyright law. The outputs inherit the cooking.

The volume also establishes the methodological template this study extends. Gitelman's editorial framing treats data as an object of critical-historical analysis rather than as the raw material of such analysis. The same move applies to AI outputs: they are documents to be analyzed, not transparent reports on the world.

Origin

The volume emerged from conversations within the critical data studies community during the late 2000s, organized by Gitelman with input from her contributors. The title adapts Geoffrey Bowker's earlier remark that raw data is both an oxymoron and a bad idea; the framework was collaborative.

Key Ideas

Data is cooked. Every dataset bears the marks of the instruments, institutions, and categories that produced it.

Etymology as argument. The Latin roots of data conceal the historical processes through which data is actively extracted.

Categories embed interests. Database schemas, measurement protocols, and classification systems all encode institutional assumptions.

Template for AI. The argument extends directly to AI training data and outputs; both are cooked in ways the format conceals.

Political stakes. Denaturalizing data opens the question of whose interests the data-collection infrastructure serves.

Further reading

  1. Lisa Gitelman (ed.), "Raw Data" Is an Oxymoron (MIT Press, 2013).
  2. danah boyd and Kate Crawford, "Critical Questions for Big Data", Information, Communication & Society 15:5 (2012).
  3. Rob Kitchin, The Data Revolution (Sage, 2014).
Part of The Orange Pill Wiki · A reference companion to the Orange Pill Cycle.