The grounding problem is the epistemological challenge posed by AI systems that generate outputs without experiential connection to the reality those outputs describe. Unlike human inquirers, who form beliefs through observation, experiment, and direct encounter with the world, language models process text—statistical patterns extracted from human linguistic behavior. Training data is not the model's experience; it is a record of others' expressions, themselves several inferential steps removed from the experiences that may or may not have grounded them. In Haack's crossword framework, grounding is the clue—the experiential anchor that constrains belief from outside the web. AI outputs have intersections (coherence with the training corpus) but no clues (observational basis). The result is epistemically weightless coherence: claims that fit together beautifully while corresponding to nothing. Engineering responses like retrieval-augmented generation (RAG) partially address the problem but introduce a new vulnerability: blurred boundaries between grounded and ungrounded content.
Haack's framework makes visible why the grounding problem is not merely technical. Foundationalism demanded self-justifying basic beliefs and failed because no belief is self-justifying. Coherentism dispensed with grounding entirely and failed because coherence without anchoring is fantasy. Foundherentism preserves the insight that knowledge must connect to experience (the foundationalist's correct intuition) while rejecting the requirement that connection must be through self-evident basic beliefs. Experience plays a causal role—it causes the formation of certain beliefs—without playing the logical role foundationalism assigned. The belief 'there is something red before me' is caused by the experience of seeing red, but the belief is not self-justifying. It is justified by its fit with the total web of evidence, including other perceptual beliefs, background knowledge, and coherence with the rest of the epistemic grid. The experience is a clue—it constrains which answers are acceptable—but it does not determine a unique answer independent of the grid's structure. This subtle relationship between experience and justification is what AI lacks entirely.
AI's relationship to experiential grounding is mediated by training data—the vast text corpus the model was trained on. The training data is not experience in any epistemologically meaningful sense. It is a record of human linguistic outputs: scientific papers, novels, Wikipedia articles, Reddit threads, news reports, legal briefs, blog posts. Some outputs were grounded in the author's genuine observations. Some were expressions of beliefs the author held for bad reasons. Some were fiction, satire, propaganda, error. The corpus is a statistical sample of human expression, and statistical patterns extracted from expressions do not preserve the evidential relationships that made the original claims (when they were genuine) justified. A scientist observes a phenomenon, forms a belief, writes a paper. The paper enters the corpus. The model learns patterns. The model generates claims statistically consistent with those patterns. The user reads the claim and forms a belief. Between the original observation and the user's belief lie at least five inferential steps, each of which degrades the evidential signal. The foundationalist examining this chain searches for bedrock and finds sediment—layers of processed, statistically recombined text whose connection to experiential reality is opaque.
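The chain just described can be made concrete with a small illustration. The sketch below (Python, purely illustrative) labels each inferential hop and applies an invented attenuation factor; the step names follow the chain above, and the numbers are assumptions chosen only to show that evidential signal can be lost at every step and recovered at none.

```python
# Illustrative only: the chain from observation to user belief, with each
# inferential hop modeled as a step that can preserve or degrade evidential
# signal but never restore what an earlier step lost. The numeric factors
# are invented for illustration; nothing here measures a real system.
STEPS = [
    ("observation -> scientist's belief",       0.95),  # perception, interpretation
    ("belief -> published paper",               0.90),  # selective reporting, framing
    ("paper -> training corpus",                0.85),  # sampling, mixing with unvetted text
    ("corpus -> learned statistical patterns",  0.80),  # lossy compression into weights
    ("patterns -> generated claim",             0.75),  # gap-filling during generation
    ("claim -> user's belief",                  0.90),  # the reader's uptake
]

signal = 1.0
for step, retention in STEPS:
    signal *= retention
    print(f"{step:<42} cumulative signal ~ {signal:.2f}")
# The signal only declines across the chain; no downstream step can
# reintroduce the experiential anchor that existed at the start.
```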
Retrieval-augmented generation (RAG) introduces experiential anchors by connecting the model's generative process to a database of verified documents. Instead of generating purely from trained patterns, the model retrieves relevant sources, then generates in their context. The improvement is real: RAG systems produce fewer fabricated citations, fewer invented statistics, fewer baseless claims. But Haack's framework reveals the residual danger. RAG produces outputs that are grounded at retrieval points and ungrounded elsewhere. The model retrieves three case law decisions (grounded), then generates an inference about how they apply to new facts (ungrounded). The legal analysis that results is anchored at three points and floating at seven. The three grounded points create a halo of reliability that extends, psychologically, to the seven ungrounded ones. The evaluator perceives the analysis as comprehensively grounded when it is only partially so. This is epistemically worse than pure confabulation, because partial grounding disarms the skepticism that ungrounded content requires. The foundherentist evaluator must track which elements are grounded (derived from retrieval) and which are generated (pattern-based extensions), maintaining differential confidence across a seamlessly integrated document. The cognitive demand is severe.
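One way to picture the bookkeeping this demands is a provenance-tagged answer structure. The sketch below is a minimal Python illustration, not a description of any production RAG pipeline: the Span type, the case names, and the verification policy are all hypothetical. The point is only that differential confidence requires keeping retrieved and generated spans distinguishable rather than flattening them into one seamless document.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Span:
    text: str
    provenance: str            # "retrieved" (anchored in a source) or "generated"
    source_id: Optional[str] = None

def assemble_answer(retrieved: List[Tuple[str, str]],
                    generated_bridges: List[str]) -> List[Span]:
    """Interleave retrieved passages with generated connective text,
    keeping provenance attached to every span instead of flattening
    the answer into one seamless string."""
    spans: List[Span] = []
    for (source_id, passage), bridge in zip(retrieved, generated_bridges):
        spans.append(Span(passage, provenance="retrieved", source_id=source_id))
        spans.append(Span(bridge, provenance="generated"))
    return spans

def verification_queue(spans: List[Span]) -> List[Span]:
    """Differential confidence: retrieved spans can be checked against the
    cited source; generated spans have no anchor and need independent,
    human-supplied verification before they are relied on."""
    return [s for s in spans if s.provenance == "generated"]

# Hypothetical example: three anchored citations, generated connective analysis.
answer = assemble_answer(
    retrieved=[
        ("case_001", "Smith v. Jones (2015): duty of care extends to ..."),
        ("case_002", "Doe v. Roe (2018): foreseeability requires ..."),
        ("case_003", "Acme v. Beta (2021): damages are limited where ..."),
    ],
    generated_bridges=[
        "Read together, these decisions imply that the new facts ...",
        "It follows that the defendant's conduct likely ...",
        "Therefore the claim would probably survive a motion to ...",
    ],
)
for span in verification_queue(answer):
    print("UNGROUNDED - verify independently:", span.text)
```

The design choice worth noting is that the ungrounded spans are surfaced as an explicit queue for independent checking rather than silently merged into the answer, which is exactly the distinction the seamless fluency of real outputs erases.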
The grounding problem's deepest implication is that engineering alone cannot close the gap. Future models will retrieve more accurately, ground more frequently, distinguish retrieved from generated content more transparently. But the structural fact remains: the model's 'knowledge' is statistical, not observational. It has patterns, not experiences. Improving the patterns tightens the correlation between model outputs and reality without changing the nature of the relationship. The model still does not observe. It infers (statistically) from others' expressions about their observations. This inferential distance is epistemologically ineliminable. The human evaluator must supply what the model cannot: the experiential anchoring, the domain expertise, the independent knowledge that allows checking the clue. The prescription is demanding and asymmetric. The model handles coherence—that is its architectural strength. The human must handle grounding—that is the irreducible human contribution. The evaluator who relies on the model for both is building on sand, however smooth the surface appears.
The grounding problem in AI became urgent in 2022–2023 with ChatGPT's public release, when millions of users encountered outputs that sounded authoritative but contained fabricated citations, invented studies, and fictional 'facts.' The term 'confabulation' entered the discourse as researchers and practitioners distinguished this failure mode from simple inaccuracy. A wrong date is an error; a citation to a nonexistent case is confabulation. The clinical vocabulary was adopted to capture the structural mechanism: gap-filling by a system optimized for coherence, not truth. By 2024, 'hallucination' had become the industry-standard term, but epistemologists and philosophers of AI argued for 'confabulation' as more precise—reflecting that the problem is narrative, not perceptual.
Susan Haack's foundherentism was developed in the early 1990s, long before large language models existed. But her framework anticipated the epistemic structure of AI-generated content with unusual precision. Her argument against pure coherentism—that a perfectly coherent belief system can be perfectly false—applies directly to AI outputs that cohere internally while floating free of experiential reality. Her insistence that justification requires both anchoring (the clues) and coherence (the intersections) provides the exact diagnostic the AI moment demands. The Susan Haack—On AI simulation applies her epistemology to the confabulation problem, reading foundherentism as the framework that makes visible what fluency conceals: the gap between coherence and grounding, and the evaluator's responsibility to check both.
Confabulation, not hallucination. The model does not misperceive—it fills gaps with statistically likely narrative completions, producing fabrications it cannot distinguish from accurate claims.
Harder to detect than error. Errors conflict with known facts and trigger coherence alarms; confabulations cohere with existing knowledge and evade detection without anchor-checking.
RAG's partial solution. Grounding some outputs in retrieved documents reduces confabulation but creates false security—blurred boundaries between grounded and ungrounded content.
Volume overwhelms verification. The model generates faster than humans can check—producing an asymmetry where most outputs go unverified and confabulations accumulate in the epistemic commons.
Gap-filling is continuous. Every next-token prediction fills a micro-gap; extended generation chains these micro-operations into macro-fabrications, each step plausible, the aggregate false.
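A toy rendering of that chaining, under the assumption of a stand-in next_token_distribution function (hypothetical, not a real model API): each iteration selects the locally most plausible continuation given the text so far, and nothing in the loop ever consults the world.

```python
import random

def next_token_distribution(context: str) -> dict:
    """Hypothetical stand-in for a language model's next-token probabilities:
    continuations are scored only by (simulated) fit with the preceding text,
    never by whether the resulting claim is true."""
    candidates = ["v.", "Smith,", "443", "U.S.", "137", "(1979),", "held", "that"]
    weights = [random.random() for _ in candidates]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(candidates, weights)}

def generate(prompt: str, steps: int = 8) -> str:
    """Each iteration fills one micro-gap: append the locally most plausible
    token given everything generated so far. Chained together, the steps can
    assemble a fluent-looking citation that no source ever contained."""
    text = prompt
    for _ in range(steps):
        dist = next_token_distribution(text)
        token = max(dist, key=dist.get)   # local plausibility is the only criterion
        text += " " + token
    return text

print(generate("The controlling precedent is Doe"))
```

Every individual step is defensible by the model's own criterion, which is why the aggregate fabrication carries no internal signal of its own falsity.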