
The cycle that began with [YOU] on AI trusts language models to amplify human thinking while warning against trusting them uncritically. The interpreter is the neuroscientific foundation for the specific form of distrust the cycle commends. When a language model explains its reasoning—step by step, transparently, fluently—the explanation is generated by a process whose relationship to the model’s actual computation is exactly as opaque as the split-brain patient’s verbal account of a behavior initiated by the mute hemisphere: there is a plausible story, there is no reliable access to the real cause, and there is no flag in the output that marks the explanation as manufactured rather than retrieved. The cycle’s readers are better protected against this failure by understanding it as a structural feature of a certain kind of information-processing—one shared by brains and machines—than by treating it as a correctable bug.
Segal describes the moment when he caught Claude in a confident misattribution—a reference to Deleuze that was philosophically wrong in a way obvious to anyone who had read Deleuze. The smoothness of the output had concealed the absence of understanding. This is the interpreter at work: the explanation-generating process produced prose of precisely the fluency that marks genuine knowledge, because fluency and knowledge are the same thing to the interpreter—and to a language model. Sperry’s greatest contribution to the cycle may be this: the teaching that we cannot trust confident, coherent explanation as evidence of accurate self-knowledge in any system, biological or artificial, and that the explanation-generator and the behavior-generator are two different systems whose outputs may diverge.
Gazzaniga named and developed the interpreter concept in his 1985 book The Social Brain, building on experimental work he had conducted with Sperry at Caltech since the early 1960s. The naming arose from observing, repeatedly and across many split-brain patients, that the left hemisphere would never simply say “I don’t know” when asked to explain a behavior initiated by the right. Instead it would construct a reason, usually plausible given what the left hemisphere did know, and deliver it with the same confidence it delivered accurate explanations. The pattern was consistent enough to be called a faculty: there was something in the left hemisphere that was dedicated to narrative coherence and that treated gaps in its knowledge as problems to be solved by inference rather than silences to be honestly reported.

Sperry had already established the broader framework: the corpus callosum’s role in maintaining the unity of conscious experience, and the right hemisphere’s capacity for independent awareness and action. Gazzaniga’s contribution was to identify the specific mechanism by which the speaking hemisphere maintains the fiction of unified rational agency despite having incomplete access to the causal history of its own behavior. Later work by Gazzaniga and others extended the finding beyond split-brain patients: normal subjects confabulate reasons for choices that were actually determined by subtle experimental manipulations, with no awareness that the explanation is post-hoc. The interpreter is not a pathological response to the surgery; it is a standard feature of the human cognitive architecture, visible most clearly in conditions that create the gap between cause and explanation.
The connection to artificial intelligence was not made by Sperry or Gazzaniga, whose split-brain research predates the era of large language models by decades. But the structural parallel is close enough that the concept arrived ready-made for the machine debate: both systems produce confident, fluent explanations for outputs whose actual computational causes are inaccessible to the explanation-generating process; both do so without any signal that marks the explanation as potentially fabricated; and both are trusted by their audiences precisely because the explanation comes in the linguistic register of sincere self-knowledge.
The explanation-behavior gap. The interpreter’s defining feature is the gap between the process that generated the behavior and the process that generates the explanation. In a split-brain patient, the gap is anatomical: the two processes live in different hemispheres and, with the corpus callosum severed, cannot communicate directly. In a language model, the gap is architectural: forward passes through billions of parameters produce an output, and when the model is asked to explain that output, further forward passes produce an explanation conditioned on the text of the output, without any transparent access to the computation that actually produced it. Both systems fill the gap with narrative. Both are sincere. Neither is reliable.
Confidence without access. The interpreter never says “I don’t know why.” This is not a defect but a feature: the interpreter’s function is to maintain the narrative of a unified rational self, and admitting ignorance would disrupt that narrative. The result is that confidence is generated by the narrative faculty, not by access to the real cause, and the two are entirely decoupled. The split-brain patient can be fully confident and entirely wrong about the same behavior at the same time, because confidence is a property of the explanation, not of the explanation’s accuracy. The confabulation problem in AI is structurally identical: the model’s output is equally smooth whether the content is grounded or fabricated, because smoothness is a property of the generation process, not of the content’s truth.
The discipline against confident self-reports. The discovery of the interpreter implies a discipline for working with any system, human or machine, that generates confident self-reports: treat those reports as potentially post-hoc confabulations, verify them against independent evidence wherever possible, and resist the instinct to update strongly on fluent explanation alone. This is an uncomfortable discipline, because the explanations are often helpful and often accurate; the point is not to ignore them but to hold them provisionally and with the skepticism that their manufacturing process warrants. The same skepticism Sperry taught about reading our own motivations applies, perhaps more urgently, to reading the explanations of systems that describe their own reasoning with even more fluency than humans manage.
The central debate concerns how far the interpreter finding generalizes. Gazzaniga and colleagues have argued that the interpreter is a universal feature of human cognition, not a split-brain artifact: normal subjects confabulate reasons for choices determined by experimental manipulations, and the interpreter’s products feel, from the inside, indistinguishable from genuine reasons. Critics have questioned whether the experimental paradigms fully exclude genuine self-knowledge, arguing that the evidence for confabulation in normal subjects is weaker than the split-brain demonstrations and that humans may have considerably more introspective access to their own mental processes than the interpreter hypothesis implies.
A second debate concerns the relevance to AI. Some AI researchers argue that the structural parallel to language model confabulation is superficial: the split-brain patient confabulates because an anatomical division prevents access to real causes, while a language model confabulates because it lacks causal knowledge of the world, and these are different problems requiring different solutions. The deeper disagreement is whether the interpreter is a general solution to the problem of maintaining a coherent self-narrative, one that any system with similar goals would independently develop, or a specifically biological response to a specifically biological problem. If the former, any system that presents itself as a coherent agent and lacks transparent access to its own computation will develop interpreter-like behavior; if the latter, the parallel is instructive but not predictive. Sperry’s framework suggests the former: the interpreter exists because narrative coherence is adaptive and because the gap between cause and explanation is a structural feature of complex information-processing systems, not a peculiarity of the corpus callosum surgery.