Collins and Thorne (2026) — Orange Pill Wiki

Collins and Thorne (2026)

Harry Collins and Simon Thorne's 2026 arXiv paper testing large language models against the specific discourse practices of gravitational wave physicists — the empirical demonstration that LLMs cannot reproduce reasoning that depends on collective tacit knowledge acquired through social participation.

The 2026 paper represents Collins's most direct empirical engagement with the AI moment. Working with Simon Thorne, Collins tested whether large language models could reproduce the reasoning that gravitational wave physicists use when evaluating fringe science claims — specifically, the community's practice of deciding to ignore a particular unconventional paper without extensive engagement. The physicists could articulate reasoning that drew on the community's collective tacit knowledge: reputation assessments, historical pattern-matching against similar claims, implicit standards for what evidence would justify serious engagement. The language models could not reproduce this reasoning. They produced plausible-sounding arguments that lacked the specific social grounding that made the physicists' judgments authoritative.

In the AI Story

Hedcut illustration for Collins and Thorne (2026)

The paper's methodological innovation is its specificity. Rather than asking whether LLMs can 'reason about physics' in general — a question that invites fuzzy answers — Collins and Thorne asked whether LLMs can reproduce a specific form of social reasoning deployed by a specific community for a specific purpose. The question admits sharp answers, and the answers reveal precisely where the mimeomorphic-polimorphic boundary falls. LLMs can discuss the physics with PhD-level literature fluency. They cannot make the community's social judgments with community authority, because those judgments depend on collective tacit knowledge maintained in social discourse rather than textual record.

The paper's central claim — that 'the intelligence we argue is in the humans not the LLMs' — is not a rhetorical dismissal of AI capability. It is a precise sociological finding: when the task requires interactional expertise, the machine succeeds; when the task requires contributory expertise rooted in community participation, the machine fails. The finding generalizes beyond physics to any domain where collective tacit knowledge structures expert practice, which Collins argues is most domains of consequential expertise.

Origin

The paper was posted to arXiv in 2026 as part of Collins's continuing engagement with AI; he was 83 at the time of publication. Simon Thorne, Collins's co-author, brought complementary expertise in computer science and the sociology of technology. The paper's empirical design extends methodologies Collins had developed over decades, applied here to the capabilities of frontier language models.

Key Ideas

Specific test. The paper focuses on a specific community reasoning practice, generating sharp empirical findings rather than general impressions.

PhD fluency, not PhD judgment. LLMs reach PhD level on literature search; they do not reach PhD level on original thought or community-authoritative judgment.

The intelligence is in humans. The paper's central claim locates genuine reasoning in the social community, not in the statistical engine that reproduces its textual residue.

Collective tacit knowledge is the barrier. The specific form of reasoning the LLMs could not reproduce depends on knowledge maintained in social practice.

Appears in the Orange Pill Cycle

Further reading

  1. Harry Collins and Simon Thorne, 'Can LLMs reason like physicists?' (arXiv, 2026)
  2. Harry Collins, 'AI and the sociology of knowledge' (AI & Society, 2025)
  3. Harry Collins, Artifictional Intelligence (Polity, 2018)
Part of The Orange Pill Wiki · A reference companion to the Orange Pill Cycle.