
The cycle that began with [YOU] on AI trusts language models to amplify human thinking while warning against trusting them uncritically. The interpreter is the neuroscientific foundation for the specific form of distrust the cycle commends. When a language model explains its reasoning—step by step, transparently, fluently—the explanation is generated by a process whose relationship to the model’s actual computation is exactly as opaque as the split-brain patient’s verbal account of a behavior initiated by the mute hemisphere: there is a plausible story, there is no reliable access to the real cause, and there is no flag in the output that marks the explanation as manufactured rather than retrieved. The cycle’s readers are better protected against this failure by understanding it as a structural feature of a certain kind of information-processing—one shared by brains and machines—than by treating it as a correctable bug.
Segal describes the moment when he caught Claude in a confident misattribution—a reference to Deleuze that was philosophically wrong in a way obvious to anyone who had read Deleuze. The smoothness of the output had concealed the absence of understanding. This is the interpreter at work: the explanation-generating process produced prose of precisely the fluency that marks genuine knowledge, because fluency and knowledge are the same thing to the interpreter—and to a language model. Sperry’s greatest contribution to the cycle may be this: the teaching that we cannot trust confident, coherent explanation as evidence of accurate self-knowledge in any system, biological or artificial, and that the explanation-generator and the behavior-generator are two different systems whose outputs may diverge.
Gazzaniga named and developed the interpreter concept in his 1985 book The Social Brain, building on experimental work he had conducted with Sperry at Caltech since the early 1960s. The naming arose from observing, repeatedly and across many split-brain patients, that the left hemisphere would never simply say “I don’t know” when asked to explain a behavior initiated by the right. Instead it would construct a reason, usually plausible given what the left hemisphere did know, and deliver it with the same confidence it delivered accurate explanations. The pattern was consistent enough to be called a faculty: there was something in the left hemisphere that was dedicated to narrative coherence and that treated gaps in its knowledge as problems to be solved by inference rather than silences to be honestly reported.
Sperry had already established the broader framework: the corpus callosum’s role in maintaining the unity of conscious experience, and the right hemisphere’s capacity for independent awareness and action. Gazzaniga’s contribution was to identify the specific mechanism by which the speaking hemisphere maintains the fiction of unified rational agency despite having incomplete access to the causal history of its own behavior. Later work by Gazzaniga and others extended the finding beyond split-brain patients: normal subjects confabulate reasons for choices that were actually determined by subtle experimental manipulations, with no awareness that the explanation is post-hoc. The interpreter is not a pathological response to the surgery; it is a standard feature of the human cognitive architecture, visible most clearly in conditions that create the gap between cause and explanation.
The connection to artificial intelligence was not made by Sperry or Gazzaniga, whose split-brain research preceded the era of large language models by decades. But the structural parallel is close enough that the concept arrived ready-made for the machine debate: both systems produce confident, fluent explanations for outputs whose actual computational causes are inaccessible to the explanation-generating process; both do so without any signal that marks the explanation as potentially fabricated; and both are trusted by their audiences precisely because the explanation comes in the linguistic register of sincere self-knowledge.
The explanation-behavior gap. The interpreter’s defining feature is the gap between the process that generated the behavior and the process that generates the explanation. In a split-brain patient, the gap is anatomical: the two processes live in different hemispheres and cannot communicate directly. In a language model, the gap is architectural: the model’s forward pass through billions of parameters produces an output token by token, and when asked to explain that output, the model generates the explanation the same way, with further forward passes that have no transparent access to what actually happened in the earlier ones. Both systems fill the gap with narrative. Both are sincere. Neither is reliable.
Confidence without access. The interpreter never says “I don’t know why.” This is not a defect but a feature: the interpreter’s function is to maintain the narrative of a unified rational self, and admitting ignorance would disrupt that narrative. The result is that confidence is generated by the narrative faculty, not by access to the real cause, and the two are entirely decoupled. The split-brain patient is maximally confident and maximally wrong about the same behavior, simultaneously, because confidence is a property of the explanation, not of the explanation’s accuracy. The confabulation problem in AI is structurally identical: the model’s output is equally smooth whether the content is grounded or fabricated, because smoothness is a property of the generation process, not of the content’s truth.
The discipline against confident self-reports. The interpreter, which grew out of the split-brain research Sperry began, implies a discipline for working with any system, human or machine, that generates confident self-reports: treat those reports as potentially post-hoc confabulations, verify them against independent evidence wherever possible, and resist the instinct to update strongly on fluent explanation alone. This is an uncomfortable discipline, because the explanations are often helpful and often accurate; the point is not to ignore them but to hold them provisionally and with the skepticism that their manufacturing process warrants. The same skepticism Sperry taught about reading our own motivations applies, perhaps more urgently, to reading the explanations of systems that describe their own reasoning with even more fluency than humans manage.