CONCEPT

The Rule-Following Paradox

Kripke’s reconstruction of one of the most radical skeptical problems in modern philosophy: for any finite sequence of behavior, infinitely many rules are consistent with it, so no fact about an individual can establish which rule they meant. It is an argument, not a formal theorem, but it explains in advance why no finite test can establish by behavior alone which rule a machine has learned.

The rule-following paradox, developed by Saul Kripke in Wittgenstein on Rules and Private Language (1982) and associated with the composite figure sometimes nicknamed “Kripkenstein,” begins with an arithmetic problem and ends at the foundations of meaning. The setup is deceptively small: you have always meant addition by “plus.” Prove it. The challenge is not to show that you will keep adding correctly, but to produce some fact in your history or your mind that makes addition, rather than a deviant function, the thing you actually meant. Kripke argues that no such fact can be produced. The machinery is the function “quus,” which agrees with addition on every input you have ever tried and returns 5 for anything larger, so your entire track record is consistent with both. Each candidate fact dissolves in turn. The instructions you gave yourself were finite and can themselves be interpreted in a deviant way. Your dispositions are finite too, so they are also consistent with the deviant function, and in any case they describe what you will do, not what you ought to do. The conclusion is that no inner fact constitutes meaning one rule rather than a deviant variant. Kripke offers a “skeptical solution,” which does not supply a fact that settles the matter but shifts attention from the individual to the community: there is only the public practice of speakers who correct, agree, and go on the same way, and meaning is the license granted by belonging to that corrective practice. For large language models, the paradox frames the central question sharply: the model agrees with us on the training set, so which rule did it learn? Formulated decades before large language models existed, the argument explains why the question is so hard, and why behavioral evidence alone cannot, even in principle, settle it.

In the You On AI Field Guide

The cycle asks where the genuine limits of current AI systems lie: not the engineering limits that more compute might close, but the structural limits that no amount of data or scale can eliminate. The rule-following paradox is an argument for one such structural limit. For any finite training set, infinitely many functions agree with the data and diverge off it; the function the model implements is whichever its weights happen to encode, and there is no guarantee it is the one we intended. This is not a philosophical embroidery on an engineering problem. It is the same problem, seen from the foundations. The no-free-lunch theorems in machine learning make a closely related point mathematically: averaged uniformly over all possible target functions, no learner outperforms any other on unseen inputs, so any advantage must come from an inductive bias that happens to match the problem. Kripke’s paradox is the philosophical counterpart of what engineers call the generalization problem.
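The claim that many functions fit the same finite data can be made concrete with a small sketch. It is an illustration under stated assumptions, not a model of any real training run: the training inputs, the "intended" rule (doubling), and the helper make_deviant are all hypothetical. Each deviant adds a term that vanishes exactly on the training inputs, so it matches the intended rule there and nowhere else is it forced to.

```python
from math import prod

# Hypothetical finite training inputs.
train_xs = [0, 1, 2, 3, 4]


def intended(x: int) -> int:
    """The rule we imagine the learner is meant to acquire (assumed: doubling)."""
    return 2 * x


def make_deviant(k: int):
    """Build a rule that equals `intended` on every training input.

    The extra term k * (x - 0)(x - 1)...(x - 4) is zero on train_xs and
    nonzero everywhere else (for k != 0), so each k gives a different rule.
    """
    def deviant(x: int) -> int:
        return intended(x) + k * prod(x - xi for xi in train_xs)
    return deviant


deviants = [make_deviant(k) for k in range(1, 6)]

# Every deviant is indistinguishable from the intended rule on the training set.
fits_training = all(d(x) == intended(x) for d in deviants for x in train_xs)

# One step outside the training set, they all disagree with it and with each other.
off_training = [d(5) for d in deviants]

print(fits_training)   # True
print(intended(5))     # 10
print(off_training)    # [130, 250, 370, 490, 610]
```

Since k can be any nonzero integer, the construction yields an unbounded family of such rules from a single training set, which is the underdetermination the paragraph describes.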

The practical face of this is out-of-distribution failure: the vision model that learned to detect cows by detecting grass; the language model that seems to have learned a logical rule but turns out to have learned a textual correlation that agrees with the rule on common inputs and parts company with it on rare ones. Every such surprise is a quus-variant manifesting. The model was following a rule we never specified and could not have detected from its in-distribution behavior, because that behavior, like the speaker’s past arithmetic, was finite and the rules consistent with it were infinite. The community of practice, in the form of human feedback, red-teaming, and continuous monitoring, is the only apparatus Kripke’s skeptical solution recommends: not the discovery of a meaning-fact, but the extension of a corrective practice into the distribution where the model operates.
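The cow-and-grass case can be sketched as a toy comparison of two rules. Everything here is assumed for illustration: the images are reduced to two hypothetical boolean features, the data are invented, and neither rule is taken from any real vision system. The point is only that accuracy on the training set cannot tell the two rules apart.

```python
# Hypothetical toy data: each "image" is reduced to two boolean features.
# In training, cows always stand on grass, so the features coincide.
train = [
    {"cow_shape": True, "grass": True, "label": True},
    {"cow_shape": True, "grass": True, "label": True},
    {"cow_shape": False, "grass": False, "label": False},
    {"cow_shape": False, "grass": False, "label": False},
]

# Out-of-distribution cases where the two features come apart.
ood = [
    {"cow_shape": True, "grass": False, "label": True},   # cow on a beach
    {"cow_shape": False, "grass": True, "label": False},  # empty meadow
]


def intended_rule(img: dict) -> bool:
    """The rule we hoped for: respond to the animal."""
    return img["cow_shape"]


def shortcut_rule(img: dict) -> bool:
    """The deviant rule: respond to the background."""
    return img["grass"]


def accuracy(rule, data: list) -> float:
    """Fraction of examples on which the rule's output matches the label."""
    return sum(rule(img) == img["label"] for img in data) / len(data)


train_scores = (accuracy(intended_rule, train), accuracy(shortcut_rule, train))
ood_scores = (accuracy(intended_rule, ood), accuracy(shortcut_rule, ood))

print(train_scores)  # (1.0, 1.0)
print(ood_scores)    # (1.0, 0.0)
```

Only the added cases, which play the role of the extended corrective practice, separate the two rules; the training data alone never could.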

Origin

Kripke developed the paradox in lectures in the 1970s and published it in book form in 1982. He presented it not as a faithful account of the historical philosopher but as Wittgenstein’s argument as it struck him: the strongest skeptical argument he thought the text could sustain. Later commentators nicknamed the resulting composite figure “Kripkenstein” to mark that the reading is Kripke’s own reconstruction. The central device, the function quus, is his invention; the philosophical context is Wittgenstein’s remarks on rule-following in Philosophical Investigations, which Kripke read as posing a genuine skeptical problem rather than dissolving one.

The reception was immediate and controversial. Philosophers of language, mathematics, and mind debated whether the paradox was genuine or a confusion; whether the skeptical solution was satisfying or merely avoided the hard question; whether Kripke had correctly read Wittgenstein at all. These debates continued for decades. The paradox’s influence on philosophy of language is unambiguous; its import for AI became visible only after the training of language models on internet-scale data made the question “which function, exactly, did it learn?” an engineering problem with real consequences.

Key Ideas

Quus and the underdetermination of rules. The function quus (or “quaddition”) is defined to agree with addition whenever both inputs fall below a threshold (57 in Kripke’s example) and to return 5 otherwise. Provided the threshold lies beyond every case computed so far, all past behavior is consistent with both functions, and no finite track record can distinguish them. The argument generalizes: for any intended rule and any finite set of past applications, there is a deviant rule consistent with that set. This is a structural point about the relationship between evidence and meaning, not a practical limit on our measurement instruments.
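Quus is simple enough to write down directly. The sketch below assumes the threshold 57 from Kripke’s own example and a hypothetical track record in which every sum computed so far used arguments below that threshold; the names THRESHOLD, track_record, and novel_case are illustrative only.

```python
THRESHOLD = 57  # Kripke's example bound; any bound beyond all past inputs would do


def plus(x: int, y: int) -> int:
    """Ordinary addition."""
    return x + y


def quus(x: int, y: int) -> int:
    """Agrees with addition when both arguments are below THRESHOLD, else returns 5."""
    if x < THRESHOLD and y < THRESHOLD:
        return x + y
    return 5


# Hypothetical finite track record: every pair the speaker has ever added.
track_record = [(x, y) for x in range(THRESHOLD) for y in range(THRESHOLD)]

# On the whole history, the two functions give identical answers.
agree_on_history = all(plus(x, y) == quus(x, y) for x, y in track_record)

# A case never computed before is where they part company.
novel_case = (68, 57)

print(agree_on_history)    # True
print(plus(*novel_case))   # 125
print(quus(*novel_case))   # 5
```

Raising THRESHOLD does not help: whatever finite history is supplied, a bound can be chosen beyond it, and the same agreement and the same divergence reappear.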

The skeptical solution: meaning as community practice. Rather than finding the missing inner fact, Kripke’s solution changes the subject. Meaning is not constituted by an inner state in the individual but by the public practice of a community that corrects deviation and certifies what counts as going on the same way. A system that has been trained and tuned by human feedback is being shaped to conform to a human corrective practice—and whatever normativity it possesses is borrowed from that practice rather than generated internally. The machine does not follow the rule; at best, it rides on the practice of humans who do.

Out-of-distribution as the home of the paradox. Inside the distribution of training data, the quus-variant and the intended rule agree, and the system’s conformity is reinforced at every step. Step outside—a novel domain, an adversarial prompt, a situation no human text anticipated—and the community of practice that kept the model in step has fallen silent. The quus-variant the model actually learned is then free to diverge from the rule we imagined it had. This is why out-of-distribution behavior is not a peripheral safety concern but the exact locus of the rule-following paradox applied to language models: it is the region where the difference between following a rule and conforming to one stops being abstract and starts having consequences.
