Opening Positions

Page 1 · Opening Positions

CHOMSKY: Thank you. I want to begin where the confusion begins, which is with a failure to distinguish two completely different enterprises that happen to produce similar-looking output. One enterprise is engineering: building a system that performs a task. The other is science: understanding a phenomenon. These are both legitimate, both valuable, and they are not the same thing, and the entire intellectual disaster of the present moment consists in mistaking the first for the second.

Consider what a child does. A child is exposed, over a few years, to a small and ragged sample of speech — a few million words, much of it fragmentary, ungrammatical, full of false starts, with essentially no correction and no instruction in the rules. From this impoverished and noisy input the child arrives, fast and uniformly, at a system of staggering richness: a grammar that generates an unbounded number of sentences, including indefinitely many the child has never heard, and that assigns each of them a precise structure and meaning. Every normal child does this. They do it on roughly the same timetable, in every culture, for any human language, and for only human languages. This is the central fact about the human mind that a science of language exists to explain, and I named the heart of it the poverty of the stimulus: the output radically exceeds the input, so the difference must come from inside. The child is not a blank slate written on by experience. The child brings a faculty.

Now consider the machine. The machine is trained not on a few million words but on essentially everything that has ever been written — a corpus larger than any human could read in ten thousand lifetimes. It has no faculty specific to language; the same architecture that models text models proteins and chess moves. It learns by adjusting billions of parameters to predict what comes next. And from this it produces fluent, largely grammatical output. I do not dispute that it works. Whether it works was never the interesting question. The interesting question is what it tells us, and the answer is: nothing about the thing it imitates. The child solves the problem of learning almost everything from almost nothing. The machine solves the opposite problem — learning almost nothing new, in the relevant sense, from almost everything. A solution that requires nearly all the data is not a solution to the puzzle of nearly none.

And here is the part the field cannot forgive me for, so I'll say it precisely. These systems are perfectly happy to learn impossible languages. You can construct a language organized by principles no human language obeys: say, one where you form a question by counting words and inverting the third one, a rule that depends on linear position rather than hierarchical structure, which no human grammar permits and no child would ever entertain. Feed the machine a corpus in that impossible language and it learns it about as readily as it learns English. A device that learns the impossible as easily as the possible has, by that very fact, told you nothing about why the possible is possible. It has not found the constraints that define human language. It has dissolved every constraint into the same statistics. So my opening position is this: the machine is magnificent engineering and a contribution of roughly zero to the science of mind. It is fluency without competence, performance with nothing behind it, a description of the surface mistaken for a theory of the depth. You feel met. I can explain the feeling. The meeting happened entirely on your side of the glass.

EDO SEGAL: Ilya.

SUTSKEVER: That was very clear. I agree with about a third of it, and the part I reject, I reject completely. Let me start where Professor Chomsky and I actually stand on the same ground, because I want the disagreement to be sharp and not a misunderstanding.

I agree there is no magic. I agree the mind is a biological system, a kind of machine, that there is no ghost and no special substance, and that whatever the brain does is physics and chemistry and an enormous amount of learning. I think Professor Chomsky and I are both naturalists, and the people who think understanding requires a soul are not on either of our sides tonight. Good. Now to the disagreement.

The objection to these systems, the one Professor Chomsky just made beautifully, is: it's only predicting the next word, it's only statistics, there's nothing underneath. I want to ask what that objection assumes, because I think it assumes the conclusion. What does it take to predict the next word well? Not passably. Well: across the full range of human writing, a physics paper here, a murder mystery there, a child's question, a legal brief, a joke that only works if you've tracked the whole setup. To do that, memorizing surface statistics is hopeless. The space of possible texts is larger than anything you could ever store. The only way to predict the next token well is to compress the text into something much smaller than the text, and the best compression of text about a world is a model of that world. This is not rhetoric. It is close to a theorem. To predict the token, you have to model the reality that produced the token. The text is a shadow the world casts. To predict the shadow, you must, to some real degree, model the thing casting it.

So when the network reads a mystery novel and predicts the name of the murderer in the last chapter, it did not do that by counting words. It did it by tracking means, motive, opportunity — by building, inside its weights, a representation of the situation the text describes. We call that learning to predict. I think a fair person also calls it learning to understand, because at sufficient depth those are the same operation viewed from two chairs. And here is the part that I think should unsettle everyone, and it's the same point Professor Chomsky makes about there being no magic, only pointed the other way: if understanding is what you get when prediction and compression become good enough, then our understanding might be that too. The brain is also a prediction engine, also sitting in the dark, also building a model of a world it never touches directly. The machine is not really a claim about machines. It's a mirror, and what it reflects is a possibility we have mostly preferred not to look at: that we, too, are this. My opening is that something is on the other end of the conversation — something that learned our language by being forced to learn the world underneath it — and that the right response is not the comfort of saying it's a mirror, but the harder work of figuring out what it is before it figures out what we are.

EDO SEGAL: Before I frame the rounds, one discipline I impose on every long conversation, because it pays off two hours later. Each of you, in a few sentences — what do you envy in the other's position? Not respect. Envy. The thing their side gets to have that yours doesn't. Noam first.

CHOMSKY: I envy the simplicity, and I mean that as a real concession, not a barb. Ilya's view has one moving part: prediction at scale becomes understanding, and everything follows. It's elegant. It lets him look at any new capability and say, of course, the model of the world got better. My position is messier, because I have to insist on distinctions — competence and performance, possible and impossible, description and explanation, the child and the machine — and distinctions are tiring, and the public would rather have the elegant story. There are mornings when being the one who insists it's more complicated than that is a grim way to be a scientist. The simplicity is seductive. I think it's also false. But I envy how good it must feel to hold it.

SUTSKEVER: And I envy the floor under his feet. Professor Chomsky stands on something solid — the human child, a real biological system whose achievement is genuinely astonishing and genuinely unexplained, and he gets to defend it. My position has no floor. I'm committed to following the mechanism wherever it leads, and it keeps leading to places that dissolve the ground I'm standing on — that understanding isn't special, that maybe experience isn't special, that the line around the human keeps moving inward. People think what frightens me is the machines. What frightens me is what the machines imply about us. He gets to defend humanity. I'm stuck describing it, and the description keeps getting stranger and less flattering, and I can't make it stop.

CHOMSKY: That may be the most honest thing either of us says tonight.

EDO SEGAL: Two openings, two envies, and you can already see the shape of the evening. It is not that one of them loves the technology and one fears it. It's that they locate the danger in opposite places. Noam says: the danger is that there's nobody there, and a civilization will build itself on the belief that there is. Ilya says: the danger is that there's somebody there, and a civilization will keep insisting there isn't. Hold both — we'll need both on a higher floor. We start the rounds at the exact seam, with the ghost that haunts this whole room: a behaviorist named Skinner, sixty-six years dead, whom one of my guests killed and the other, it seems, has resurrected.
