You On AI Field Guide · The Confabulation Problem (AI)
CONCEPT

The Confabulation Problem (AI)

AI's production of internally coherent, contextually plausible, confidently delivered fabrications—clinically distinct from hallucination, harder to detect than simple error, requiring anchor-checking the model cannot perform.
The confabulation problem names AI's characteristic epistemic failure: generating claims that are internally consistent, contextually appropriate, and delivered with complete fluency, yet false. Clinically, confabulation (observed in neurological patients, for example after right-hemisphere damage or in Korsakoff syndrome) differs from hallucination: the confabulating patient fills narrative gaps with coherent fabrications and believes them. AI confabulates in structurally the same way: next-token prediction fills gaps with statistically likely continuations, regardless of truth. The danger is asymmetric. Simple errors conflict with known facts and trigger detection; confabulations cohere with existing knowledge and evade it. A fabricated Deleuze reference that 'sounds right' passes coherence tests. Only anchor-checking (consulting Deleuze's actual work) catches it. Retrieval-augmented generation (RAG) partially addresses the problem by grounding some outputs in verified documents, but it blurs the boundary between grounded and ungrounded content, creating false security.

In The You On AI Field Guide

The clinical literature on confabulation (Korsakoff syndrome, split-brain patients, right-hemisphere stroke victims) establishes consistent features. Confabulated claims are contextually appropriate: they fit the conversational situation. They are delivered with normal fluency and confidence, because the patient does not experience them as fabrications. They resist correction, because the patient believes them. And they fill gaps: when asked why the paralyzed arm is not moving, the patient generates a plausible narrative ('I don't want to,' 'I already did,' 'my arm is tired') that satisfies the coherence requirements of their self-narrative. The fabrication is not strategic deception. It is the brain's automatic gap-filling, operating below conscious awareness.

The parallel to AI is precise. Large language models fill gaps continuously; that is what next-token prediction does. Given context, the model extends it in a statistically likely direction. If the context requests information the model does not possess, the model does not report the absence of knowledge unless it has been trained to. It generates a statistically likely completion. The completion is often accurate. Sometimes it is confabulated. The surface features are identical.
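The gap-filling described above can be illustrated with a toy model. The sketch below is not how a large language model is built; it is a tiny backoff n-gram predictor over an invented four-sentence corpus (the corpus, the function names `train` and `complete`, and the `max_order` parameter are all assumptions made for illustration). It shows the one property that matters here: the predictor has no "I don't know" path for a fluent prompt, so it answers a question it has no data for in exactly the same form as one it does.

```python
from collections import Counter, defaultdict

# Invented toy corpus (an assumption for illustration, not real training data).
CORPUS = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "the capital of spain is madrid",
    "the capital of italy is rome",
]

def train(sentences, max_order=2):
    """Count next words for every context of 1..max_order preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(1, len(words)):
            for n in range(1, max_order + 1):
                if i - n >= 0:
                    counts[tuple(words[i - n:i])][words[i]] += 1
    return counts

def complete(counts, prompt, max_order=2):
    """Return the most frequent next word, backing off to shorter contexts.

    Nothing here checks whether the model 'knows' the answer: if the long
    context is unseen, it silently falls back to the short one.
    """
    words = prompt.split()
    for n in range(max_order, 0, -1):
        context = tuple(words[-n:])
        if len(context) == n and context in counts:
            return counts[context].most_common(1)[0][0]
    return None  # reached only when even the last word was never seen

counts = train(CORPUS)
known = complete(counts, "the capital of france is")
unknown = complete(counts, "the capital of portugal is")
print(known)    # paris (the context "france is" was seen)
print(unknown)  # rome  (no data on portugal; most common word after "is")
```

Both calls return a single confident word. Nothing in the output distinguishes the grounded answer from the gap-filled one; only a check against an outside source does.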

Confabulation differs from simple error in ways that matter epistemologically. A simple error—'the capital of France is London'—conflicts with known facts and triggers detection through coherence-checking alone. The claim contradicts established knowledge. The evaluator notices the contradiction and rejects the claim. Confabulation produces no such conflict. The confabulated claim extends established knowledge in a direction that feels natural, that does not conflict with the evaluator's existing beliefs. The Deleuze fabrication Segal caught was not a simple error. It was a sophisticated extension of two genuine bodies of thought (Csikszentmihalyi's flow theory and Deleuzian philosophy) that cohered beautifully. Every internal feature signaled quality. Only anchor-checking—consulting Deleuze's actual positions—revealed the fabrication. The confabulation was designed (by architecture, not intention) to pass coherence tests. Detection requires the specific, effortful work of grounding verification that the model's fluency discourages. The evaluator who relies on coherence alone accepts the confabulation, because the confabulation is coherent.
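The difference between the two checks can be made concrete with a minimal sketch. Everything in it is hypothetical: the `Claim` type, the `BELIEFS` and `SOURCES` tables, and the placeholder names 'Author X', 'thesis A', and 'thesis B' stand in for an evaluator's prior knowledge and for the primary texts. The coherence check rejects a claim only when it contradicts something the evaluator already believes; the anchor check accepts a claim only when the source record says the same thing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    value: str

# Hypothetical evaluator beliefs: what the reader already knows.
BELIEFS = {("France", "capital"): "Paris"}

# Hypothetical stand-in for primary sources: what the record actually says.
SOURCES = {
    ("France", "capital"): "Paris",
    ("Author X", "argued"): "thesis A",
}

def coherence_check(claim, beliefs):
    """Pass unless the claim contradicts an existing belief."""
    held = beliefs.get((claim.subject, claim.predicate))
    return held is None or held == claim.value

def anchor_check(claim, sources):
    """Pass only if the source record states the same value."""
    return sources.get((claim.subject, claim.predicate)) == claim.value

simple_error = Claim("France", "capital", "London")
confabulation = Claim("Author X", "argued", "thesis B")

print(coherence_check(simple_error, BELIEFS))   # False: contradicts a belief
print(coherence_check(confabulation, BELIEFS))  # True: nothing to contradict
print(anchor_check(confabulation, SOURCES))     # False: the source says otherwise
```

The simple error is caught by the cheap check. The confabulation passes it, because the evaluator holds no belief it could conflict with, and is caught only by the lookup that requires access to the source.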


Retrieval-augmented generation (RAG) is the dominant engineering response to confabulation. The model retrieves verified documents before generating, anchoring output in sources that have been independently checked. RAG reduces confabulation rates measurably: studies comparing RAG to unaugmented models report significant improvements across domains. But Susan Haack's foundherentist framework, which demands both anchoring and coherence, reveals residual danger. RAG produces output that is grounded at some points (where retrieval occurs) and ungrounded at others (where the model fills gaps between retrieved documents with pattern-based completions). The grounded and ungrounded elements are woven together seamlessly. The seam is where the evaluator could distinguish reliable from unreliable content, and seamlessness erases it. The result is partial grounding that creates false security. The evaluator knows some content is anchored and lowers their epistemic guard. Confabulated extensions enter through the opening. Haack's independent security dimension applies: the quality of RAG's grounding depends on the quality of retrieved documents, which depends on database curation. A RAG system grounded in peer-reviewed literature is more reliable than one grounded in blog posts. But that reliability is invisible to the user unless the system transparently distinguishes retrieved from generated content, which many current systems do not do.
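One way to keep the seam visible is to carry provenance with every piece of output. The sketch below assumes a hypothetical pipeline that can already say, per segment, whether the text is supported by a retrieved document; the `Segment` type, the `source_id` field, and the `doc-1`/`doc-2` identifiers are invented for illustration and do not describe any particular RAG library. How a real system would decide that a segment is supported is the hard part and is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Segment:
    text: str
    # Hypothetical field: ID of the retrieved document backing this text,
    # or None when the model generated it without retrieved support.
    source_id: Optional[str] = None

def render(segments: List[Segment]) -> str:
    """Join segments, marking each one as sourced or unverified."""
    parts = []
    for seg in segments:
        tag = f"[source: {seg.source_id}]" if seg.source_id else "[unverified]"
        parts.append(f"{seg.text} {tag}")
    return " ".join(parts)

def grounded_fraction(segments: List[Segment]) -> float:
    """Share of segments that carry a retrieved source."""
    if not segments:
        return 0.0
    return sum(1 for seg in segments if seg.source_id) / len(segments)

answer = [
    Segment("Claim A.", "doc-1"),
    Segment("Claim B."),            # gap-filling between retrieved documents
    Segment("Claim C.", "doc-2"),
]

print(render(answer))
# Claim A. [source: doc-1] Claim B. [unverified] Claim C. [source: doc-2]
print(round(grounded_fraction(answer), 2))  # 0.67
```

The design choice is that an untagged segment is marked unverified by default rather than inheriting credibility from its neighbours, so a grounded claim on either side does not vouch for the one in the middle.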

The confabulation problem is not a temporary bug awaiting an engineering fix. It is an architectural feature of systems that generate output through statistical pattern completion rather than evidential reasoning. Future models will confabulate less frequently as training improves and grounding mechanisms become more sophisticated. But the structural gap between the model's ability to produce coherent claims and its inability to verify that those claims correspond to reality is an epistemological problem, not an engineering one. Closing the gap requires human epistemic labor: checking anchors, tracing claims to sources, maintaining the distinction between 'sounds right' and 'is right.' The labor does not scale. The model generates at computational speed. The human checks at cognitive speed, constrained by domain knowledge, available time, and tolerance for tedious verification. The asymmetry is structural. Most AI-generated claims go unverified. Those that do may be true or confabulated; without checking, the evaluator cannot tell. Haack's prescription is demanding: treat AI output as proposed crossword entries requiring clue-checking before acceptance. The presence of grounded claims strengthens the grid but does not justify accepting ungrounded claims by proximity.

Origin

The term 'confabulation' entered AI discourse around 2022–2023 as an alternative to 'hallucination,' which implies perceptual failure. Confabulation—borrowed from clinical neuropsychology—more accurately describes the mechanism: narrative gap-filling by a system that produces coherent fabrications without experiencing them as fabrications. The clinical literature traces to Sergei Korsakoff's 1880s descriptions of memory confabulation in alcoholic patients, extended by Morris Moscovitch and others across the twentieth century. The application to AI recognizes that language models fill gaps the same way damaged brains do—automatically, coherently, without epistemic access to whether the filling corresponds to reality. The Susan Haack—On AI simulation applies Haack's foundherentist framework to diagnose why confabulation is harder to detect than error: confabulations satisfy coherence (intersections) while violating grounding (clues), and human evaluators are cognitively calibrated to trust coherence as evidence of truth.

Haack did not develop foundherentism to address AI—Evidence and Inquiry (1993) predates the current moment by three decades. But her framework's application is not retrofitting. It is recognition that the epistemic structure she diagnosed—the necessity of both anchoring and coherence, the insufficiency of either alone—applies with intensified urgency to outputs produced by pure coherence engines. The model's confabulations are, in Haack's terms, beliefs that pass the coherentist test (internal consistency, mutual support) while failing the foundationalist test (experiential grounding). The evaluator applying only one test accepts fabrications. The evaluator applying both—checking clues and intersections—catches them. The framework makes visible what the fluency conceals.

Key Ideas


Gap-filling is architectural. Next-token prediction extends context in statistically likely directions—filling informational gaps with pattern-based completions that may or may not correspond to reality.

Confabulation vs. error. Errors conflict with known facts and trigger coherence-checking detection. Confabulations cohere with existing knowledge and evade detection without anchor-checking.

RAG's false security. Partial grounding creates the illusion of comprehensive grounding—retrieved documents anchor some claims, ungrounded extensions ride the coattails of reliability.

Detection requires domain knowledge. Catching confabulations demands independent expertise to recognize the gap between what the model claims and what the evidence supports—a requirement that does not scale.

Unverified confabulations degrade the commons. Each undetected fabrication entering the shared informational environment becomes part of the web against which subsequent claims are checked—compounding error across iterations.

In The You On AI Book

This concept surfaces across 2 chapters of You On AI.
Chapter 4 Dylan's Like a Rolling Stone Page 3 · Inference and Temperature
…anchored on "hallucinations are a confidence problem"
At their core, hallucinations are a confidence problem. The model generates responses based on the closest probabilistic match in its training distribution, regardless of temperature. Nothing in its architecture forces it to distinguish…
The genius is the person whose particular configuration of inputs, processed through a particular biographical architecture, produces a synthesis that no other configuration could have produced.
Turn it up, and the outputs get stranger, more surprising, occasionally brilliant, occasionally incoherent. Like the machine getting stoned.
Chapter 7 Who Is Writing This Book? Page 4 · The Deleuze Failure
…anchored on "confident wrongness dressed in good prose"
Claude's most dangerous failure mode is exactly this: confident wrongness dressed in good prose. The smoother the output, the harder it is to catch the seam where the idea breaks. Han would appreciate the irony.
You stop doing the hard, ugly, private work of figuring out what you actually believe, because the tool will generate something plausible regardless of whether you've earned it.

Further Reading

  1. Morris Moscovitch, 'Confabulation and the Frontal Systems,' in Varieties of Memory and Consciousness, ed. H.L. Roediger III and F.I.M. Craik (Lawrence Erlbaum, 1989)
  2. Armin Schnider, The Confabulating Mind (Oxford, 2008)
  3. Susan Haack, Evidence and Inquiry (Blackwell, 1993), chapters 4–5
  4. Patrick Lewis et al., 'Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,' NeurIPS (2020)
  5. Anthropic, 'Constitutional AI: Harmlessness from AI Feedback' (2022)