The Rule-Following Paradox

Kripke’s reconstruction of what he called the most radical skeptical problem philosophy has seen: for any finite sequence of behavior, infinitely many rules are consistent with it, so no fact about an individual can establish which rule they meant. The argument explains, decades in advance, why no finite test can by itself establish which rule a machine has learned.
The rule-following paradox, developed by Saul Kripke in Wittgenstein on Rules and Private Language (1982) and now sometimes called “Kripkenstein,” begins with an arithmetic problem and ends at the foundations of meaning. The setup is deceptively small: you have always meant addition by “plus.” Prove it. The challenge is not to show that you will keep adding correctly, but to identify some fact in your history or your mind that makes addition, rather than a deviant function, the thing you actually meant. Kripke argues that no such fact can be produced. The machinery is the function “quus,” which agrees with addition on every pair of numbers you have ever computed with and returns five whenever an argument is larger, so your entire track record is consistent with both. Every candidate fact dissolves: any mental rule you appeal to is itself a finite set of instructions open to deviant interpretation, and any disposition you appeal to is likewise consistent with the deviant function and describes what you will do, not what you ought to do. The conclusion is that there is no inner fact that constitutes meaning one rule rather than a deviant variant. Kripke offers a “skeptical solution,” which is not a fact that settles the matter but a change of subject from the individual to the community: there is only the public practice of speakers who correct, agree, and go on the same way, and meaning is the license granted by belonging to that corrective practice. For large language models the paradox poses a pointed question: the model agrees with us on the training set, so which rule did it learn? Kripke’s argument shows, decades before such models existed, why the question is so hard and why behavioral evidence alone cannot, even in principle, settle it.

In the [YOU] on AI Field Guide

The cycle asks where the genuine limits of current AI systems lie: not the engineering limits that more compute might close, but the structural limits that no amount of data or scale can eliminate. The rule-following paradox is an argument for one such structural limit. For any finite training set, infinitely many functions agree with the data and diverge off it; the function the model implements is whichever its weights happen to encode, and there is no guarantee it is the one we intended. This is not a philosophical embroidery on an engineering problem. It is the same problem, seen from the foundations. The no-free-lunch theorems in machine learning give a formal counterpart: averaged uniformly over all possible target functions, no learner outperforms any other on unseen inputs, so any advantage must come from an inductive bias that happens to suit the problem. Kripke’s paradox is the philosophical name for what engineers call the generalization problem.
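
The claim that infinitely many functions agree with a finite training set can be made concrete with a small sketch. The training inputs and the intended rule below are hypothetical choices made only for illustration; the construction adds a term that vanishes on every training input, so each value of k yields a different function that the data cannot rule out.

```python
from math import prod

# Hypothetical finite "training set": the only inputs ever observed.
TRAIN_INPUTS = [0, 1, 2, 3, 4]


def intended(x: int) -> int:
    """The rule we had in mind (an arbitrary illustrative choice)."""
    return 2 * x + 1


def deviant(k: int):
    """Return a function that matches `intended` on every training input.

    The extra term k * (x - 0)(x - 1)...(x - 4) is zero at each training
    input, so the training data cannot tell any k apart from k = 0.
    """
    def f(x: int) -> int:
        return intended(x) + k * prod(x - t for t in TRAIN_INPUTS)
    return f


if __name__ == "__main__":
    for k in (0, 1, 2, 3):
        f = deviant(k)
        fits = all(f(x) == intended(x) for x in TRAIN_INPUTS)
        print(f"k={k}: fits training data={fits}, f(5)={f(5)}")
```

Every k fits the five training points exactly, yet at the unseen input 5 the functions return 11, 131, 251, and 371. Nothing in the training data favors k = 0 over the others.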

The practical face of this is out-of-distribution failure: the vision model that learned to detect cows by detecting grass; the language model that seems to have learned a logical rule but turns out to have learned a textual correlation that agrees with the rule on common inputs and parts company on rare ones. Every such surprise is a quus-variant showing itself. The model was following a rule we never specified and could not have detected from its in-distribution behavior, because that behavior, like anyone’s record of past sums, was finite and the rules consistent with it were infinite. The community of practice, in the form of human feedback, red-teaming, and continuous monitoring, is the only apparatus Kripke’s skeptical solution recommends: not the discovery of a meaning-fact, but the extension of a corrective practice into the distribution where the model operates.

Origin

Kripke developed the paradox in lectures in the late 1970s and published it in 1982 as a reading of the later Wittgenstein that he explicitly presented as his own rather than as the historical philosopher’s settled view: in his words, Wittgenstein’s argument “as it struck Kripke.” Later commentators nicknamed the resulting position “Kripkenstein” to mark that it is Kripke’s reconstruction of the strongest skeptical argument the text could sustain. The central device, the function quus, is his invention; the philosophical context is Wittgenstein’s remarks on rule-following in Philosophical Investigations, which Kripke read as posing a genuine skeptical problem rather than merely dissolving one.

The reception was immediate and controversial. Philosophers of language, mathematics, and mind debated whether the paradox was genuine or a confusion; whether the skeptical solution was satisfying or merely avoided the hard question; whether Kripke had correctly read Wittgenstein at all. These debates continued for decades. The paradox’s influence on philosophy of language is unambiguous; its import for AI became visible only after the training of language models on internet-scale data made the question “which function, exactly, did it learn?” an engineering problem with real consequences.

Key Ideas

Quus and the underdetermination of rules. The function quus (or “quaddition”) is defined to agree with addition whenever both arguments are below a threshold (57 in Kripke’s example) and to return five otherwise. If every sum computed so far has involved numbers below the threshold, all past behavior is consistent with both functions, and no finite track record can distinguish them. The argument generalizes: for any intended rule and any finite set of past applications, there is a deviant rule consistent with that set. This is a structural point about the relationship between evidence and meaning, not a practical limit on our measurement instruments.
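
A minimal sketch of the two functions, using Kripke’s threshold of 57. The helper `first_disagreement` and the simulated history are hypothetical names introduced for illustration; the history assumes, as the thought experiment does, that no past sum involved an argument of 57 or more.

```python
# Kripke's illustrative threshold; any bound above all past usage would do.
THRESHOLD = 57


def plus(x: int, y: int) -> int:
    """Ordinary addition."""
    return x + y


def quus(x: int, y: int) -> int:
    """Agree with addition when both arguments are below THRESHOLD, else 5."""
    if x < THRESHOLD and y < THRESHOLD:
        return x + y
    return 5


def first_disagreement(history):
    """Return the first (x, y) in `history` where the rules differ, or None."""
    for x, y in history:
        if plus(x, y) != quus(x, y):
            return (x, y)
    return None


if __name__ == "__main__":
    # Simulated track record: every pair with both arguments below 57.
    past = [(x, y) for x in range(THRESHOLD) for y in range(THRESHOLD)]
    print(first_disagreement(past))     # None: the record fits both rules
    print(plus(68, 57), quus(68, 57))   # 125 5: the rules part on a new case
```

The exhaustive check over the past record finds no disagreement, which is the point: the evidence that would separate the two rules lies entirely outside the record.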

The skeptical solution: meaning as community practice. Rather than finding the missing inner fact, Kripke’s solution changes the subject. Meaning is not constituted by an inner state in the individual but by the public practice of a community that corrects deviation and certifies what counts as going on the same way. A system that has been trained and tuned by human feedback is being shaped to conform to a human corrective practice—and whatever normativity it possesses is borrowed from that practice rather than generated internally. The machine does not follow the rule; at best, it rides on the practice of humans who do.

Out-of-distribution as the home of the paradox. Inside the distribution of training data, the quus-variant and the intended rule agree, and the system’s conformity is reinforced at every step. Step outside—a novel domain, an adversarial prompt, a situation no human text anticipated—and the community of practice that kept the model in step has fallen silent. The quus-variant the model actually learned is then free to diverge from the rule we imagined it had. This is why out-of-distribution behavior is not a peripheral safety concern but the exact locus of the rule-following paradox applied to language models: it is the region where the difference between following a rule and conforming to one stops being abstract and starts having consequences.

Debates & Critiques

The main philosophical objection is that the paradox is not genuine. On this reading, Wittgenstein’s actual point was not that there is no fact about meaning but that meaning-facts are not hidden inner states accessible by introspection, and ordinary practices of agreement and correction are sufficient to ground meaning without a community-level reduction that looks as skeptical as the problem it was meant to dissolve. Philosophers including John McDowell and Paul Boghossian have pressed this objection in different ways.

The machine-learning reception adds a different twist. The no-free-lunch theorems formalize a version of the underdetermination result, but practitioners note that inductive biases (architecture, regularization, the structure of the loss function) do select among consistent hypotheses, and that the question is not whether selection happens but whether the selected hypothesis generalizes in the ways we intended. The interpretability research program takes Kripke’s challenge seriously by opening networks to examine their internal computations, trying to read the rule from the mechanism rather than the behavior. Kripke’s own argument suggests how hard this road is: even with the mechanism in view, one faces the same question about the mechanism itself, since any finite activation pattern is also consistent with deviant continuations. The honest conclusion is that the paradox raises the bar for confidence in a model’s generalization without specifying how high the bar needs to be for any given deployment. That calibration question remains open.
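
The practitioners’ point about inductive bias can be illustrated with a toy sketch. Two learners with different built-in assumptions are fit to the same hypothetical data; both reproduce the training set exactly, and the bias alone decides what happens off it. The data, the function names, and the choice of learners (a least-squares line and a 1-nearest-neighbour lookup) are assumptions made for illustration, not a claim about how any particular model behaves.

```python
from fractions import Fraction

# Hypothetical training data generated by y = 2x + 1 on x = 0..4.
DATA = [(x, 2 * x + 1) for x in range(5)]


def fit_line(data):
    """Least-squares line, computed exactly with Fraction arithmetic.

    Inductive bias: the target is assumed to be linear everywhere.
    """
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    slope = Fraction(n * sxy - sx * sy, n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return lambda x: slope * x + intercept


def fit_nearest(data):
    """1-nearest-neighbour: answer with the label of the closest seen input.

    Inductive bias: unseen inputs are assumed to behave like their neighbours.
    """
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]


if __name__ == "__main__":
    line, nearest = fit_line(DATA), fit_nearest(DATA)
    # Both hypotheses are consistent with every training example.
    print(all(line(x) == y and nearest(x) == y for x, y in DATA))
    # Off the training range the two biases give different answers.
    print(line(10), nearest(10))
```

Both learners have zero training error, yet at the unseen input 10 the line predicts 21 and the nearest-neighbour rule predicts 9. Selection among consistent hypotheses has happened in each case; whether the selected hypothesis is the intended one is a separate question the training data cannot answer.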

Further Reading

  1. Saul Kripke, Wittgenstein on Rules and Private Language (Harvard University Press, 1982) — the source
  2. Paul Boghossian, “The Rule-Following Considerations,” Mind 98 (1989): 507–549 — the clearest critical survey of the debate
  3. John McDowell, “Wittgenstein on Following a Rule,” Synthese 58 (1984): 325–363
  4. David Wolpert & William Macready, “No Free Lunch Theorems for Optimization,” IEEE Transactions on Evolutionary Computation 1 (1997): 67–82 — the machine-learning parallel