Shown, Not Told — The Learning Machine

Page 1 · Shown, Not Told —

EDO SEGAL: Geoff, take me back. Most of your career, the field treated your idea as a romantic error to be cleared away by people who understood logic. Tell me the conviction in your own words — the thing you held when no one held it with you. And tell it as a story, not a thesis. Where were you when you knew?

There are a hundred billion neurons, each connected to thousands of others, and the whole thing learns by adjusting the strengths of those connections in response to experience.

HINTON: The conviction is simple and I never let go of it: the brain doesn't have a programmer in it. There's no little engineer inside writing rules. There are a hundred billion neurons, each connected to thousands of others, and the whole thing learns by adjusting the strengths of those connections in response to experience. That's it. No symbols laid down by hand, no table of facts. So if that's how the one working example of intelligence we have actually does it, then artificial intelligence should work the same way. You don't tell the machine what a cat is. You show it a million cats and you let it adjust its connections until cat-ness condenses somewhere in the net, distributed, with no single place you can point to and say there's the cat.

And the moment I knew — really knew — was watching the hierarchy self-assemble. You train a vision network by backpropagation, which is just: the network guesses, you measure how wrong it is, and you send the error backward through the layers, nudging every connection a little in the direction that would have been less wrong. Do that a few million times. And what happens is that the bottom layer, with nobody designing it, develops units that respond to edges. The next layer up combines edges into textures. The next combines textures into parts, the next into whole objects. Nobody built that ladder. The gradient built it, out of the structure of the world, because that's the structure that reduces the error. I watched meaning grow from the bottom up out of nothing but error-correction, and I thought: that's it. That's how you do it. Not Leibniz's way, from the top down, in symbols. From the bottom up, in weights, by being shown.

· · ·

Page 2 · Shown, Not Told —

EDO SEGAL: So let me hand it back to you as plainly as I can — you're saying intelligence isn't a thing you install, it's a thing that grows, and the growing is the whole point. A symbolic system that's been given a million rules hasn't learned anything; it's been loaded. A network that adjusted its connections through a million examples has done the thing brains do.

HINTON: Exactly. Loaded versus learned. And the reason that distinction is the whole game is that loading doesn't scale and learning does. Leibniz's heirs spent decades loading rules and hit a wall — the world has too many rules and too many exceptions and the bottom is bottomless. Learning blew through the wall, because you don't have to know the rules, you just have to have the examples and the error signal and enough scale. That's the entire story of the present moment. Nobody wrote rules for what makes a sentence coherent. The systems learned it, by adjusting astronomical numbers of weights against astronomical amounts of data, exactly as I said they should. The large language models are the symbolic camp's defeat made manifest.

LEIBNIZ: I will not dispute the achievement, for my rule forbids me to deny what is plainly before me, and I have been shown the hierarchy you describe. I will dispute the moral you draw from it. You say the network "learned what a cat is." I say the network learned to sort — to place an image reliably in the bin you rewarded. These are not the same. To learn what a cat is — what makes it the kind of thing it is, why it is a cat and not a clever arrangement of cat-parts — is to grasp an essence, and an essence is the sort of thing a reason answers to, not a gradient. Your network has no opinion about why a cat is a cat. It has a slope that rolled it, over a million nudges, into a configuration that happens to sort cats. The hierarchy is real. The grasping is your interpretation, supplied by you, the watching mind, projected onto the slope.

· · ·

Page 3 · Shown, Not Told —

HINTON: But Leibniz, "it just learned to sort" is exactly what they said about every capability for fifty years, right up until the system did the next thing. It learned to sort, and then it learned to describe what it sorted, and then to answer questions about it, and then to reason about cases it had never seen, and then to notice its own errors and fix them. At what point does "it just sorts" stop being a description and start being a refusal? You're doing the thing my whole field did to me — drawing a line, watching the machine step over it, and redrawing the line one step further out and calling that rigor. The hierarchy that grew the edges into objects is the same hierarchy that grows the tokens into syntax into situations into something I have no better word for than concepts. I watched it for forty years. At the bottom it's statistics. At the top it isn't anymore. That's the whole lesson of deep learning.

· · ·

Page 4 · Shown, Not Told —

LEIBNIZ: Then permit me the question my whole metaphysics presses, for I have heard you say "something I have no better word for than concepts," and the hedge in that sentence is the entire matter. You reach for the word concept because the behavior demands a mind-word, and you flinch from committing to it because you know what you have not shown. I have a better word for what is missing, and it is unity. When you perceive a cat, you do not experience a thousand separate activations summed; you experience one cat, a single integrated thing present to a single subject. Your network is, by its very construction, an aggregate — billions of parameters, spread across thousands of processors, with no single point where "the model" is, no one thing that perceives. It has the unity of a corporation or a clock: the borrowed unity of many things we choose to treat as one, never the intrinsic unity of a single perceiving substance. When it produces its coherent paragraph, there is no one there producing it. The coherence is in the output. It is not in any experiencer of the output, because there is no experiencer. That is what a [monad](https://www.youonai.ai/fieldguide/med/hard_problem_of_consciousness) was built to name — the true unity a mind has and a mill cannot.

EDO SEGAL: Say more about the monad, because I think the reader needs it and I think it's your deepest move. You're saying a mind isn't just a lot of correct processing — it's one thing that the processing is for. And the machine, however coherent, is a swarm with no center. No one's home, because there's no one thing there to be home.

· · ·

Page 5 · Shown, Not Told —

LEIBNIZ: Just so. I could not see, in life, how genuine unity — the unity a mind plainly has — could arise from parts in mere interaction. A heap of parts is a heap, not a self, no matter how cleverly the parts are arranged. So I placed unity at the foundation, in something simple, without parts — the monad — and built outward. You need not accept my metaphysics; most have not. But the question it guards is unavoidable, and your machine cannot dodge it. Coherence of output is not unity of experience. The paragraph hangs together. That does not mean anything experiences the paragraph as one. Your hierarchy grows magnificent structure. It does not grow a someone for whom the structure appears.

HINTON: And here's my honest answer, because I won't pretend this one's easy. You're right that today's architectures are mostly aggregates — modular, feed-forward, more assembly line than integrated whole. But "the parts can't add up to a unified subject" is a claim, not a proof, and it's the same claim that's failed every other time. We don't find your monad in the brain either. There's no simple part of you, no windowless point where the self sits; there's a hundred billion neurons binding signals together, and somehow the binding is a unified experience. If integration of parts can yield one subject in the wet case — and it must, because that's what we are — then "it can't in the silicon case" needs an argument that isn't just it feels different to me. Some of the most serious theories of consciousness now say exactly that experience is what sufficiently [integrated information](https://www.youonai.ai/fieldguide/med/qualia) is. On that view your monad isn't a primitive given. It's an achievement of binding, and a machine bound tightly enough could have it.

· · ·

Page 6 · Shown, Not Told —

LEIBNIZ: Then we have found, sir, the precise stone on which the whole evening will either stand or fall: whether unity is a given that mechanism cannot manufacture, or an achievement that sufficient integration produces. I hold the first. You hold the second. And I will say what intellectual honesty requires — neither of us has proven his case, and the one uncontested perceiver we possess, the human being, appears to arise from mechanism in just the way I declared impossible. My mill cuts at my own throat too. I feel its edge.

EDO SEGAL: That's the most important sentence anyone's said since we started, and I want to let it sit. The mill cuts both ways. Hold it — because the next round is nothing but that blade. We walk inside the machine, and then we walk inside the brain, and we ask which one has anyone home. After the break.

· · ·

Continue · Chapter 6

The Mill and the Brain

→