The methodological innovation is the shift from general to domain-specific evaluation. General Turing Tests systematically underestimate the mimeomorphic sophistication of modern language models — LLMs can fool most general audiences on most topics — while systematically overestimating their contributory competence. The specialist Imitation Game cuts the other way: it reveals where fluent surface competence fails against expert judgment. The failures are informative because they identify the precise boundary between what the machine has absorbed from textual training and what requires the collective tacit knowledge of domain participation.
The methodology has a second virtue: it generates empirically testable claims about where AI systems will and will not succeed. Collins's framework predicts that machines will reliably pass the Imitation Game in domains where the relevant knowledge is predominantly relational (captured in training text) and reliably fail in domains where the relevant knowledge is predominantly collective (maintained in social practice). The Collins and Thorne 2026 paper tested this directly, finding that language models could not reproduce the specific forms of social reasoning that gravitational wave physicists use when evaluating fringe science claims — a failure that is invisible in general evaluations but consistent across specialist ones.
Collins developed the methodology in Rethinking Expertise (2007), co-authored with Robert Evans, and in subsequent empirical papers. The technique formalized insights from his own experience as a non-practitioner interactional expert in gravitational wave physics, during which he was subjected to tests in which physicists attempted (and failed) to identify him as an outsider.
Domain-specific. The test must be conducted by judges with genuine contributory expertise in the target field.
Not the general Turing Test. General tests conflate mimeomorphic sophistication with contributory competence; the Imitation Game separates them.
Empirically productive. The methodology generates testable predictions about where AI systems will succeed and fail.
Diagnostic, not binary. The Game does not simply pass or fail a system; it identifies the specific kinds of questions where the boundary between interactional and contributory expertise becomes visible.
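The diagnostic reading of the Game can be illustrated with a minimal sketch. It assumes each trial is recorded as a question category plus whether the expert judge correctly picked out the machine. The `Trial` record, the category labels, and the sample data are all hypothetical, invented for illustration; they are not results from any study. Instead of a single pass/fail verdict, the sketch reports an identification rate per category, where a rate near 0.5 (chance in a two-way choice) suggests the judges could not tell the machine from the human.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One judged exchange (hypothetical record format)."""
    category: str        # kind of question asked, e.g. "textbook theory"
    judge_correct: bool  # True if the expert judge identified the machine


def identification_rates(trials):
    """Return, per question category, the fraction of trials in which
    the judge correctly identified the machine."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for trial in trials:
        totals[trial.category] += 1
        if trial.judge_correct:
            correct[trial.category] += 1
    return {category: correct[category] / totals[category] for category in totals}


# Invented sample data, for illustration only.
trials = [
    Trial("textbook theory", False),
    Trial("textbook theory", True),
    Trial("textbook theory", False),
    Trial("textbook theory", True),
    Trial("judging fringe claims", True),
    Trial("judging fringe claims", True),
    Trial("judging fringe claims", True),
    Trial("judging fringe claims", False),
]

rates = identification_rates(trials)
for category, rate in rates.items():
    print(f"{category}: judges identified the machine in {rate:.0%} of trials")
```

Breaking results down by category is the point of the design: an aggregate score would hide exactly the kinds of questions where the boundary between interactional and contributory expertise shows up.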