The Challenge of 1988

EDO SEGAL: Jerry, in 1988 you and Zenon Pylyshyn published a paper that defined thirty years of this field — a flat, in-principle impossibility claim against exactly the kind of network now running on every phone. You said connectionist systems could never explain the structure of thought; they could only fake it. The machines arrived and did the thing. I'd like you to state the challenge as sharply as you stated it then. And then I want to ask you the question that has to hurt: did they answer you, or did they prove you right in a way no one expected?

FODOR: I'll state it without softening, because softening it would be dishonest. Zenon and I granted the connectionists everything except the prize. We said: fine, maybe a network is how the brain is implemented, the wetware, the hardware. But it cannot be the theory of the cognitive architecture, the level where thinking is structured. Why? Because a network has no compositional constituents. Its representations are distributed patterns, not assemblies of reusable parts. So a network has no principled reason to be systematic. It might handle "John loves Mary" and "Mary loves John" both, if you trained it on both. But that's a brute fact about the training, not an explanation, because nothing in the architecture requires that mastering one bring the other for free. In a real symbol system, you can't represent the one with structure without thereby being able to represent the other: same parts, same syntax. The connectionist could mimic systematicity. He could not explain it. And a theory that can't explain the essential properties of thought is not a theory of thought.

Now. Did the machines answer me? On the surface, spectacularly. A modern model handles novel sentences with ease, recombines concepts it was never explicitly trained to recombine, generates unbounded structured output. It looks like the most systematic artificial system ever built — and it's connectionist to the bone, and it got there by riding the scaling laws I'd have bet against. The triumphant reading says: Fodor's impossibility claim was just false, and the architecture he mocked did the thing he said it couldn't. That reading is widely held and it is not stupid. I lose, on the surface, and I want the audience to know I can say that out loud.

EDO SEGAL: But.

FODOR: But the whole force of the 1988 argument was that the surface is not the issue. We never denied a network could be trained to behave systematically over some range. We denied it could deliver the principled, exceptionless, guaranteed systematicity that real compositional structure gives you. So the question the machine poses isn't "does it behave systematically?" — plainly it does. The question is whether its systematicity is the principled kind that refutes me or the approximate, training-bounded kind that confirms me. And you find out by going to the edges — novel recombinations far from training, recursion too deep, structure the statistics never saw. And there, reliably, the machines crack. They generalize across the bulk of the distribution and then fail, suddenly and bizarrely, on inputs a real symbol system would handle trivially. That brittleness is not a bug to be patched. It is the fingerprint of approximation — the place the soft, learned, statistical structure reveals it is not the hard, built-in, exceptionless thing. I didn't predict the fluency. I'll own that. I predicted the shape of the failure, and the shape of the failure is exactly what we observe.
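A toy sketch of the contrast Fodor is drawing, offered as an illustration rather than as a model of any real network. It assumes a deliberately crude stand-in for "training-bounded" competence, a table of memorized whole sentences, and sets it beside a representation built from reusable constituents. The names `Loves`, `swap`, `render`, `MEMORIZED_SENTENCES` and `table_handles` are hypothetical and appear nowhere in the debate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loves:
    """A structured thought assembled from reusable constituents."""
    lover: str
    beloved: str


def swap(thought: Loves) -> Loves:
    # Same parts, same syntax: the swapped thought comes for free.
    return Loves(lover=thought.beloved, beloved=thought.lover)


def render(thought: Loves) -> str:
    return f"{thought.lover} loves {thought.beloved}"


# Assumption for illustration only: a lookup table of memorized strings,
# standing in for competence that extends no further than what was seen.
MEMORIZED_SENTENCES = frozenset({"John loves Mary"})


def table_handles(sentence: str) -> bool:
    return sentence in MEMORIZED_SENTENCES


seen = Loves("John", "Mary")
unseen = swap(seen)

print(render(unseen))                 # Mary loves John
print(table_handles(render(seen)))    # True
print(table_handles(render(unseen)))  # False
```

The structured type cannot express one sentence without being able to express the other, which is the sense of "principled" at stake. Whether a trained network resembles the table, the structured type, or neither is exactly what the two speakers go on to dispute.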

WITTGENSTEIN: This is the most interesting thing Jerry has said, and I am going to agree with the fact and dismantle the frame. The fact is real: the machine is brilliant in the middle of the distribution and breaks at the edges. I accept it. But look at what Jerry has done with it. He has set up a contest between two hidden essences — "principled structure" versus "mere approximation" — and declared that the brittleness reveals which essence is really inside. And I want to ask my rule-question: what are we doing with "principled" here? How would you ever see the difference, except in the behavior at the edges — which is to say, in more behavior? You have named the brittleness "the fingerprint of approximation." You could equally name it "the fingerprint of a finite system, like every system, including the human one." A human pupil also breaks at her edges. Push anyone far enough past their training and they confabulate, lose the thread, go on wrongly while sincerely believing they go on the same. The brittleness does not reveal a hidden essence. It reveals a limit, and limits are universal.

FODOR: No. The difference is that the human's competence is recursive and open in a way the machine's is not — give a person the rule and she runs it to arbitrary depth; the machine degrades with depth precisely because it never had the rule, only the statistics of the rule's outputs. That's not "everyone has limits." It's a structural difference in the kind of competence.

WITTGENSTEIN: Is it? Or is "give a person the rule and she runs it to arbitrary depth" a fairy tale you have never tested? Give a person "the cat that the dog that the man that the boy that the girl saw knew owned chased ran" and she is lost at the third nesting, exactly like your machine, because human working memory is finite, like the machine's context. The "arbitrary depth" of human competence is a property of the idealized rule, not of any actual human being. You have compared the real machine to an idealized human and called the gap an essence. That is not science. That is theology with a confusion matrix.
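The nested sentence Wittgenstein recites can be produced by a very small recursive rule, which is one way to see his point that arbitrary depth is a property of the rule and not of any reader. The sketch below is illustrative only. The function name `center_embed` and the pairing of nouns with verbs are assumptions, chosen so that the output matches the sentence quoted in the exchange.

```python
def center_embed(nouns: list[str], verbs: list[str]) -> str:
    """Build a center-embedded clause: each noun is paired with the verb
    at the same index, and every further noun opens a clause nested
    inside the previous one."""
    if len(nouns) != len(verbs) or not nouns:
        raise ValueError("need one verb per noun, and at least one pair")
    if len(nouns) == 1:
        return f"the {nouns[0]} {verbs[0]}"
    inner = center_embed(nouns[1:], verbs[1:])
    return f"the {nouns[0]} that {inner} {verbs[0]}"


nouns = ["cat", "dog", "man", "boy", "girl"]
verbs = ["ran", "chased", "owned", "knew", "saw"]

# Nesting depth is len(nouns) - 1: zero is easy, four is the quoted sentence.
for depth in range(len(nouns)):
    print(depth, center_embed(nouns[: depth + 1], verbs[: depth + 1]))
```

The rule is indifferent to depth, and longer word lists give deeper sentences at no extra cost to the rule. The cost falls on whoever, or whatever, has to parse the result.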

FODOR: [pause] The recursion point is fair and I've heard it before and it has teeth. But there's a residue you can't dissolve. When a human does fail at depth five, she can be told the rule and take it up — she can step outside the performance, grasp the principle, and apply it where her memory failed. The machine can't be told the rule in a way that fixes the structure; it can only be shown more examples. That difference — between grasping a rule and absorbing a distribution — is the one I'll die on, and it's not idealized. It's the difference between a creature that can follow a rule and a system that conforms to one.

WITTGENSTEIN: [long pause] And now you have walked, of your own accord, into the center of my philosophy, and I did not even have to lead you. "Following a rule" versus "conforming to one." Jerry, that distinction is mine — it is the most vertiginous thing in the Investigations, and it does not say what you think it says. You think following a rule is grasping an inner principle. I spent years proving it is no such thing. The next round is going to cost you, because the very weapon you just reached for was forged in my workshop, and it does not fire the way you assume.

EDO SEGAL: That's the cleanest handoff I've ever heard at a debate table — one of you reaching for a weapon and the other saying "I made that, and you're holding it backwards." Round on rule-following next. It is the hinge of the whole evening, and as it happens, the hinge of the alignment problem. After this.
