Following a Rule, or Only Conforming

Page 1 · Following a Rule, or

**EDO SEGAL:** Ludwig, Jerry just staked his life on the difference between a creature that *follows* a rule and a system that merely *conforms* to one. You've said that distinction is yours and he's holding it backwards. Take us into it. And do the thing you do — make it concrete. The way you'd teach it.

**WITTGENSTEIN:** Teach someone "add 2." You give examples: 2, 4, 6, 8. She continues: 10, 12, 14. All is well. Then past a thousand she writes 1004, 1008, 1012. You say she has gone wrong. She says, sincerely, that she is doing exactly what she always did — going on in the same way. And here is the vertigo: *nothing in the examples you gave rules out her reading*. Her finite training is equally consistent with "add 2" and with "add 2 up to 1000, then add 4." Every finite stretch of behavior fits infinitely many rules. So what *made* it the case that the rule was "add 2"? Not the examples — they underdetermine it. Not a mental image — the image needs interpreting too, and any interpretation hangs in the air with what it interprets. Not a private inner grasp — for what would make *that* the right grasp rather than a deviant one?

The answer — and this is what Jerry must hear — is *not* an inner principle she consults. There is no such thing. The answer is that she goes on as the rest of us go on, that her continuation *agrees with the practice* of a community that shares a form of life, and that this agreement is what "following the rule" *consists in*. Following a rule is not a private act of insight. It is participation in a custom. To follow a rule is a practice. And to *think* one is following a rule is not to follow a rule — because then "following the rule" and "thinking you are" would be the same, and there would be no rule, only the impression of one.

· · ·

Page 2 · Following a Rule, or

**FODOR:** And here's why I'm not afraid of this, even though I grant it's deep. You've just argued that *nothing in the head* fixes which rule a finite creature follows — that it takes the community. Fine. But that *helps* me, Ludwig, it doesn't hurt me. Because now run it on the machine. The machine is trained on a finite, vast set of examples and deployed on inputs it never saw. The whole question of whether it'll "do what we trained it to do" is the question of whether it latched onto *our* rule or some other rule equally consistent with the training — a rule that agrees on every example seen and diverges, catastrophically, past the thousand we never tested. That's not a philosophy puzzle. That is the literal structure of every alignment failure, every specification-gaming exploit, every model that aced evaluation and went insane in deployment. Your "add 2" paradox is the [alignment problem](https://www.youonai.ai/fieldguide/med/the_pattern), stated in 1953, before there was a machine to misalign. We agree completely. So where's my cost?

**WITTGENSTEIN:** Your cost is this. The human pupil is *rescued*, on my account, by being inducted into a community's form of life — by sharing the background of training, correction, purpose, and natural reaction that makes "add 2" obvious to us and the deviant rule perverse. We never even *consider* the "add 4 after 1000" reading, because we are participants in a settled practice. The machine is inducted into no such community. It has the examples and the regularities; it does not share our purposes, our embodiment, our natural reactions about what counts as "the same." So the very thing I identified as the *ground* of determinate rule-following — communal agreement in a form of life — is exactly what the machine lacks. Which means your hope of "specifying our values precisely enough that it can't misread them" is the dream of a rule that interprets itself, and I proved no such rule exists. You cannot close the gap with more specification, because every specification is another sign that must be interpreted, and the interpretation reopens the gap one level up.

· · ·

Page 3 · Following a Rule, or

**FODOR:** I *accept* that there's no self-interpreting specification — I've never thought you align a system by writing a long enough rulebook. But your own resolution gives me the engineering program, Ludwig. You say determinate rule-following comes from *immersion in shared practice* rather than explicit rules. Well — what do you think training on human feedback *is*? It's an attempt to induct the system into the practice: extended interaction, correction, the system steered toward agreement with how *we* go on. You've just described, in 1953 vocabulary, exactly what the alignment people are groping toward. The community can be *built*. We're building it.

**WITTGENSTEIN:** You can *imitate* the induction. Whether you can *achieve* it depends on whether the machine can come to *share the form of life*, and that is the open wound, not a solved problem. A form of life is not a stream of corrections. It is embodiment, stakes, mortality, the natural reactions of a creature that can be hurt — the whole pre-verbal ground that makes our agreement *agreement* and not mere coincidence of output. You can train the machine to *match* our continuations across the tested range. You cannot, by matching, give it the thing that makes our matching *non-accidental* — our shared standing in a world that pushes back. And so it will remain, on my analysis, a brilliant *conformer*: forever liable to reveal, past some untested thousand, that the rule it found was never the rule we meant. Not because it is stupid. Because it does not live where the rule lives.

**FODOR:** Then we've found the thing we actually disagree about, and it's smaller and harder than either opening suggested. You think "sharing a form of life" is *necessary* for determinate rule-following and that a machine can't have it by training. I think rule-following is implemented by *internal structure* — representations of the rule and dispositions to apply it — and that the question of whether the machine has the *right* structure is empirical and open. You've relocated meaning into the community; I've kept it in the head. And the machine is the test case that neither of us can yet read.

· · ·

Page 4 · Following a Rule, or

**WITTGENSTEIN:** On that — and I say it without pleasure — we agree. We have found the seam. It runs between the head and the practice, and the machine sits exactly on it.

**EDO SEGAL:** Before I mark the convergence — Ludwig, you actually argued a version of this with the man who invented the abstract machine. In 1939 Turing sat in your Cambridge lectures and you went at each other about whether following the steps of a calculation is the same as calculating. Eighty-five years later, was he holding Jerry's end of the rope?

**WITTGENSTEIN:** Turing wanted a contradiction in a calculus to be a kind of *hidden defect* that would, sooner or later, make a bridge fall down — as though the meaning of the rules reached out, on its own, and forced the catastrophe. I kept asking: forced it *how*, by what, where does the rule reach out *from*? A rule does not apply itself; *we* apply it, in a practice, and where the practice has not decided, the rule has not decided either. Turing thought the machine's steps carried their own necessity. I thought the necessity was ours, conferred in use. So yes — across the table from me tonight, Jerry has Turing's intuition that the structure is self-standing, and I have the same reply I gave in that cold room: show me where the structure applies itself without us, and I will show you that you have smuggled us back in.

**FODOR:** And I'd have taken Turing's side in 1939 and I'll take a refined version now. The necessity isn't mystical; it's in the *form*. A well-built syntax does carry its consequences — that's what proof *is*. You're right that someone has to set the system going. You're wrong that the consequences then depend on us at every step. That's the whole power of mechanizing inference: you let the forms grind out the entailments while you go to lunch.

· · ·

Page 5 · Following a Rule, or

**EDO SEGAL:** Mark it. Convergence three is the strangest yet: you *agree* the machine may have latched onto a rule that diverges from ours past the untested edge, and that this is the alignment problem in its purest form. You divide on where the rescue lives — Jerry in the right inner structure, which is buildable; Ludwig in a shared form of life, which may not be. Hold that. Because now we have to face the thing neither head nor practice has settled: is there *anyone in there* having an experience at all? The beetle. After the break.

· · ·

Continue · Chapter 8

The Beetle in the Box

→