CONCEPT

The Principle of Charity (Davidson)

Donald Davidson's name for the interpretive assumption, unavoidable on his account, that any speaker we understand must be taken to be largely rational and largely right about the world. By extension, it names the mechanism by which fluent AI outputs trigger our interpretive reflexes and lead us to project minds where none may exist.

The principle of charity is Donald Davidson's answer to the question of how interpretation ever gets started. To assign meanings to a speaker's words, an interpreter must already know the speaker's beliefs; to assign beliefs, the interpreter must already know the meanings. There is no neutral ground to stand on while working out both at once. Davidson's escape is to hold one variable steady by stipulation: the interpreter assumes, before any evidence, that the speaker's beliefs are largely true and largely coherent, that the speaker reasons in recognizable ways and gets the world mostly right. Only against this background assumption can behavior be used to triangulate what the words mean. The principle has two strands: logical coherence (the speaker's beliefs hang together by the rules of inference) and empirical correspondence (the speaker believes what a well-placed observer would believe about the shared environment). Both are preconditions of interpretation, not conclusions reached after it.

With a machine, the principle becomes a trap rather than a tool. The same reflexive charity that is calibrated for human speakers fires when confronted with any sufficiently fluent output, and fluency is not evidence that the conditions warranting charity are satisfied. This is the mechanism behind the fluency-authority decorrelation: the surface invites the assumption, the assumption does most of the interpreting, and the felt presence of a mind on the far side of the conversation is largely a projection from this side. What AI discourse calls hallucination (sometimes also called confabulation) is the diagnostic. A genuine believer's errors are constrained by the web of surrounding beliefs and by accountability to a shared world, so massive, unanchored error is not a possible state for an interpretable mind. The model can generate such error because nothing in it answers to the constraint that makes charity appropriate.

In the [YOU] on AI Field Guide

The cycle that began with [YOU] on AI frames the AI transition as a moment when the relationship between fluency and authority was structurally decoupled. Davidson's principle of charity is the philosophical account of why this decoupling is so dangerous: our interpretive machinery was built for a world in which the correlation held, and the machinery fires in response to surface features—grammatical fluency, apparent coherence, confident assertion—without any independent check on whether the conditions that made those features reliable indicators of competence are actually present.

The practical consequence runs through every domain where AI is deployed as an expert system. When a model produces a confident legal opinion, medical assessment, or historical claim, the principle of charity fires: we read the output as the expression of an underlying competence, a web of beliefs that mostly tracks the truth, a mind that has done the intellectual work behind the words. Davidson's analysis implies that this reading is our construction, not our perception: the interpreter is supplying the rationality, the coherence, and the world-tracking that the output appears to embody. The warning is not to withhold all interpretation, which is impossible. It is to hold steadily in view that we are being charitable, and to ask, in each case, whether the charity is earned.

Origin

Davidson introduced the principle across a series of essays in the 1970s, most fully developed in "Radical Interpretation" (1973) and "Thought and Talk" (1975). The term "principle of charity" he borrowed from the philosopher Neil Wilson, who had used it in a narrower and less systematic way; Davidson transformed it into the centerpiece of a complete account of how meaning and belief are simultaneously constituted by the activity of interpretation. The principle has a complex relationship to Quine's indeterminacy of translation: where Quine held that multiple translation manuals could fit the behavioral evidence without there being a fact that picks one as correct, Davidson accepted the indeterminacy but argued that the principle of charity constrains the space of acceptable manuals far more tightly than Quine acknowledged, because both coherence and correspondence must be maximized simultaneously.

The concept has been developed and contested in the philosophical literature for fifty years. Some critics have argued that the principle underdetermines interpretation in ways Davidson did not acknowledge; others have argued that it is not a single principle but a family of related assumptions that pull in different directions. The AI application is largely new: the philosophical literature on charity developed without the case of a system optimized to produce outputs that look as though they issue from charity-worthy minds, and that case changes the practical stakes of the principle considerably.

Key Ideas

Charity as transcendental condition. The principle is not a courtesy or a heuristic but the precondition of interpretation itself. You cannot understand anyone you do not first assume to be largely rational and largely right. This means charity cannot be switched off; it can only be held up to the light, recognized as an assumption, and interrogated for whether its warrant is present. With machines, the interrogation is the work.

The fluency trigger. The principle of charity fires in response to surface features that, for human speakers, reliably indicate the presence of a rational, world-tracking mind: grammatical fluency, apparent coherence, confident assertion. A language model is a system optimized to produce exactly these surface features. The principle therefore fires in response to machine output with the same automaticity it fires in response to human output, but with less warrant, because in the machine case the surface is the direct target of optimization rather than the natural expression of an underlying mind.

Hallucination as the diagnostic. A genuine believer's errors are constrained by the surrounding web of belief and by accountability to a shared world; they cannot be too many or too wild without the attribution of belief collapsing. This is the force of charity: massive, unanchored error is not a possible state for an interpretable mind, because the errors would reverberate through the web and destabilize the content of every connected belief. A large language model can generate locally fluent and globally unmoored falsehoods because nothing in it answers to this constraint. The hallucination is not a deviation from the model's nature; it is a window onto the fact that the conditions licensing charity were never in place.

Interpretation as projection. Davidson's framework reveals that when we read a machine's output as meaning something, we are doing far more interpretive work than we realize: supplying coherence, projecting intentions and beliefs, completing the triangle whose far corner may be empty. The meaning we experience as simply present in the output is, to a degree we systematically underestimate, a meaning we have authored in the act of reading. The machine provides the occasion and the tokens; we provide the mind. This is not a reason to stop interpreting these systems—we have no other access to them—but it is a reason to know that we are doing it.

Debates & Critiques

The central debate about the principle of charity in the AI context is whether the principle itself is the source of the problem or merely the name for an inevitable feature of interpretation. Those who see it as the source argue that we should develop new interpretive practices for AI that are explicitly less charitable: practices that treat model output as output rather than as assertion, and that require independent verification before extending the credence that charity normally confers. Those who see charity as ineliminable argue that we cannot interpret without it, since to receive any utterance as meaningful at all is already to extend some form of charity, and that the remedy is not less charity but more awareness of what we are doing when we extend it.

A related debate concerns whether the principle applies differently to systems fine-tuned with reinforcement learning from human feedback (RLHF). If a model has been shaped by large numbers of human evaluators' judgments favoring more accurate and coherent outputs, does the resulting disposition begin to earn the warrant that charity presupposes? Research on emergent capabilities suggests that at sufficient scale, model behavior exhibits coherence properties that smaller models lacked, complicating the clean verdict that charity is always unwarranted. A Davidsonian would insist that the question is not about scale but about the presence or absence of the constitutive conditions: rational holism, world-anchored content, triangulated reference. Whether these conditions can be approximated through optimization is the open question his framework locates.

Related Entries

Further Reading