CONCEPT

The Archive (Groys)

Groys's term for the institutionally maintained totality of what a culture has recognized as valuable — a structure with its own conservative logic, its own politics of inclusion and exclusion, and now the submedial space beneath every AI output.

The archive, in Groys's analytical vocabulary, is not a neutral repository of cultural products. It is an institution with its own logic, maintained by curators, librarians, editors, critics, and the distributed apparatus of cultural valuation. The archive preserves certain objects and discards others. It organizes its contents through categories — author, genre, period, medium — that are themselves cultural constructions reflecting the priorities of the civilizations that maintain them. The archive is the measure against which novelty is evaluated, and it is the hidden depth beneath every large language model's polished output. Understanding AI requires understanding the archive, because the machine does not think — it processes the archive and reproduces its structural patterns, including its biases and exclusions.

In the AI Story

The archive has a conservative logic: each new addition must differ from what it already contains, because the archive does not need two of anything. A second Impressionism is not innovation; it is redundancy. This structural conservatism is what makes the archive capable of producing the category of the new. Without the archive's conservatism, there would be no pressure toward differentiation and no mechanism for distinguishing innovation from reproduction. The paradox Groys identifies is that the institution most responsible for preserving the past is also the institution most responsible for generating the future — because the future, as a cultural category, exists only as the gap between what the archive contains and what it has not yet absorbed.

AI training transforms the archive from a living institution into a static resource. The human curator engages with the archive — evaluates its contents, questions its construction, makes judgments about what to preserve and what to reconsider. The algorithm exploits the archive. It extracts patterns without evaluation, produces outputs without judgment, reproduces biases without awareness. This is not a moral criticism. It is a structural description. Algorithms do what algorithms do. But the consequence is that the archive's biases — its overrepresentation of commercially successful texts, of institutionally prestigious sources, of English-language materials, of the perspectives of those with the resources to publish — become encoded in the statistical structure of the model and reproduced in every polished paragraph the model generates.

The submedial space of an AI output is the training archive. The visible surface is the prose. The hidden depth is the billions of tokens from which the patterns were extracted. The user who engages only with the surface is in the position of the museum visitor who admires the painting without asking what the museum chose not to exhibit. The exclusion is constitutive. The value of what is shown depends on the invisibility of what is not shown. And the smoothness of the AI output, like the white walls of the gallery, is designed to direct attention away from the archive that produced it.

The practical consequence for AI literacy is severe. Critical engagement with AI output requires not just technical literacy but archival literacy: the capacity to ask what archive produced this output, what its known biases are, whose voices are overrepresented, whose perspectives are systematically excluded. These are the questions humanists have been trained to ask about every cultural product. They are the questions the smooth surface of AI output systematically discourages, because the smooth surface presents itself as the natural expression of intelligence rather than as the contingent product of a specific, biased, historically shaped archive.

Origin

Groys's theory of the archive developed across three decades of engagement with the institutional history of art, culminating in Art Power (2008) and In the Flow (2016). The framework draws on Foucault's earlier analysis of the archive as the system of statements that determines what can be said, but Groys extends the analysis beyond discourse into the material infrastructure of cultural preservation: the museum, the library, the gallery, the catalog.

Key Ideas

Archives are institutions, not repositories. What the archive contains reflects the decisions of its maintainers, and those decisions carry political, economic, and aesthetic priorities that shape what counts as valuable.

Archives are conservative by structure. Each new addition must differ from the existing contents, which is what allows the archive to generate the category of novelty.

AI algorithmically exploits the archive. The model extracts statistical patterns without the evaluative engagement that characterizes human curation, reproducing the archive's biases without awareness.

Archival literacy is the new critical literacy. Evaluating AI output requires asking what archive produced it, what that archive excludes, and how those exclusions shape the value of what is shown.

Debates & Critiques

The question of whether AI training archives can be curated — whether the biases can be corrected through deliberate selection of training data — divides contemporary AI ethics. Groys's framework suggests the problem is deeper than curation can reach: the archive is always political, and the pretense that a neutral archive can be constructed reproduces the fantasy of an Archimedean point outside cultural history.

Appears in the Orange Pill Cycle

Boris Groys — On AI