The Imitation Game (Collins) — Orange Pill Wiki
CONCEPT

The Imitation Game (Collins)

Collins's domain-specific variant of Turing's original test: a methodology for evaluating expertise by asking whether judges with contributory expertise in a specific field can distinguish a non-expert (or a machine) from a genuine expert in that field.

The Imitation Game is Collins's methodological instrument for operationalizing the distinction between interactional and contributory expertise. Where Turing's original test asked whether a machine could fool a general audience in open-ended conversation, Collins's version asks the more discriminating question: can the machine (or the aspirant to interactional expertise) fool specialists in the target domain? Collins used the methodology to validate his own interactional expertise in gravitational wave physics — a panel of judges could not reliably distinguish his answers from those of actual physicists — and proposed it as the proper test for evaluating AI competence in specific fields.

In the AI Story


The methodological innovation is the shift from general to domain-specific evaluation. General Turing Tests systematically underestimate the mimeomorphic sophistication of modern language models — LLMs can fool most general audiences on most topics — while systematically overestimating their contributory competence. The specialist Imitation Game cuts the other way: it reveals where fluent surface competence fails against expert judgment. The failures are informative because they identify the precise boundary between what the machine has absorbed from textual training and what requires the collective tacit knowledge of domain participation.

The methodology has a second virtue: it generates empirically testable claims about where AI systems will and will not succeed. Collins's framework predicts that machines will reliably pass the Imitation Game in domains where the relevant knowledge is predominantly relational (captured in training text) and reliably fail in domains where the relevant knowledge is predominantly collective (maintained in social practice). The Collins and Thorne 2026 paper tested this directly, finding that language models could not reproduce the specific forms of social reasoning that gravitational wave physicists use when evaluating fringe science claims — a failure that is invisible in general evaluations but consistent across specialist ones.
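The evaluation logic described above can be sketched in code. The following is a minimal toy harness, not Collins's actual protocol: the judge sees paired answers in random order and guesses which came from the expert, and the pretender "passes" when the judge's identification accuracy stays near chance (0.5). All names, answers, and judge heuristics here are hypothetical illustrations.

```python
import random

def run_imitation_game(pairs, judge, trials=1000, seed=0):
    """Run a toy domain-specific Imitation Game.

    pairs : list of (expert_answer, pretender_answer) tuples.
    judge : callable taking two answers (presented in random order)
            and returning the index (0 or 1) it believes is the expert.
    Returns the judge's identification accuracy: near 0.5 means the
    pretender is indistinguishable from the expert (a "pass");
    accuracy well above 0.5 means the judge can tell them apart.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        expert, pretender = rng.choice(pairs)
        if rng.random() < 0.5:
            first, second, expert_index = expert, pretender, 0
        else:
            first, second, expert_index = pretender, expert, 1
        if judge(first, second) == expert_index:
            correct += 1
    return correct / trials

# Hypothetical Q&A pair: the "expert" answer uses the field's
# technical vocabulary; the "pretender" answer is fluent but generic.
pairs = [
    ("Detector strain noise limits sensitivity below 10 Hz.",
     "The detector is extremely sensitive to small vibrations."),
]

# A judge with no domain knowledge can only guess at random...
_coin = random.Random(42)
def naive_judge(a, b):
    return _coin.randint(0, 1)

# ...while a specialist judge keys on the domain vocabulary.
def specialist_judge(a, b):
    return 0 if "strain" in a else 1

naive_score = run_imitation_game(pairs, naive_judge)            # near 0.5: pretender passes
specialist_score = run_imitation_game(pairs, specialist_judge)  # exactly 1.0: pretender exposed
```

The contrast between the two judges is the point of the domain-specific test: the same pretender passes against a general audience (the naive judge) and fails against contributory expertise (the specialist judge), which is what makes the failures diagnostic rather than merely binary.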

Origin

Collins developed the methodology in Rethinking Expertise (2007, with Robert Evans) and in subsequent empirical papers. The technique formalized insights from his own experience as a non-practitioner interactional expert in gravitational wave physics, subjected to tests in which physicists attempted (and failed) to identify him as an outsider.

Key Ideas

Domain-specific. The test must be conducted by judges with genuine contributory expertise in the target field.

Not the general Turing Test. General tests conflate mimeomorphic sophistication with contributory competence; the Imitation Game separates them.

Empirically productive. The methodology generates testable predictions about where AI systems will succeed and fail.

Diagnostic, not binary. The Game does not simply pass or fail a system; it identifies the specific kinds of questions where the boundary between interactional and contributory expertise becomes visible.

Further reading

  1. Harry Collins and Robert Evans, Rethinking Expertise (University of Chicago Press, 2007)
  2. Harry Collins, Robert Evans, Rodrigo Ribeiro, and Martin Hall, 'Experiments with interactional expertise' (Studies in History and Philosophy of Science, 2006)
  3. Harry Collins and Simon Thorne, 'Can LLMs reason like physicists?' (2026)
Part of The Orange Pill Wiki · A reference companion to the Orange Pill Cycle.