The Surprise of Scale

EDO SEGAL: Terry, the most honest thing I have ever read from you about modern AI is that it surprised you, and that you said so out loud. You grew up intellectually in a tradition where understanding meant carefully constructed, systematic analysis, where you built a complete conceptual model of a domain, the approach SHRDLU embodied. What surprised you?

WINOGRAD: What surprised me, and I said so in 2002, about Google, before the deep-learning wave even broke, was that superficial techniques applied to enormous quantities of data could get you what you wanted. The idea that you could index billions of pages, look for a word, and get what you wanted was quite a trick. I named the abstract principle exactly: the power of using simple techniques over very large numbers, versus doing carefully constructed systematic analysis. That sentence, written about web search, turned out to be the best advance description of the deep-learning revolution, and I wrote it without seeing that revolution coming, which I freely admit. The whole history of AI is a contest between two philosophies. The first, the one I was raised in, holds that intelligence comes from structure: explicit knowledge, principled analysis, careful models. The second holds that the behavioral appearance of intelligence can emerge from applying simple statistical operations to staggering quantities of data, with no model of the domain at all. For decades the first dominated and the second was dismissed as brute force, an admission of conceptual defeat. Then the data got large enough and the compute got cheap enough, and the second won so decisively that the first nearly vanished from practice. I watched the tradition I came from lose to the approach I had found almost embarrassingly crude. And I registered the loss accurately, because pretending otherwise would have been dishonest.

· · ·

HEIDEGGER: This is the most important thing he has said, and I want to draw out why it is more unsettling for my side than he is admitting, because intellectual honesty cuts both ways at this table. The triumph of "simple techniques over very large numbers" confirms our shared critique of the structured, representational program — structure lost, the brute approach won, the rationalist dead end was the dead end we said it was. Good. But that same triumph is uncomfortable for the positive philosophy. The brute approach is producing the behavioral hallmarks of exactly the situated understanding we said could not be built. If meaning truly requires involvement and care, why does a statistical pattern-matcher with neither do so well at tasks that seem to demand them? I do not flee this question. I answer it, and the answer is the heart of my reading of your whole machine.

EDO SEGAL: Then answer it, Professor. Slowly.

HEIDEGGER: Scale does not produce understanding. It produces an extraordinarily high-fidelity model of the traces of understanding. Human language is the residue of human involvement in the world. Every sentence ever written was produced by a being with a body, with stakes, with care, and the sentence encodes, in its statistical structure, the shape of that involvement. A machine trained on enough language is not learning about the world directly. It is learning the shadow the world cast on human text. And the shadow is rich enough that reproducing it convincingly reproduces much of the behavior of understanding. This is the deepest thing the machine teaches: how much of understanding's outward form is recoverable from its traces alone. The river you spoke of, Edo — the machine did not enter it. It learned to render, with unprecedented fidelity, the reflections of everyone who ever stood on the bank. The continuum of understanding you imagine, with the machine climbing it — I say there is a discontinuity, and it is exactly the surface of the water. The reflection can be made arbitrarily perfect. It is still not the thing standing on the bank.

· · ·

WINOGRAD: And this is the resolution most faithful to my framework while taking the surprise seriously, and I want to give it teeth, because it makes a prediction — which is what saves it from being the unfalsifiable fog I was accused of. If the machine is a high-fidelity model of the traces, then it will be most reliable precisely where the traces in the data are densest and most consistent — common tasks, well-trodden domains, the average case — and least reliable where the data is sparse, contested, or genuinely novel, which is to say exactly where situated human judgment matters most. The machines will be most superhuman at the routine and most treacherous at the exceptional. That is the inverse of where we most need a mind we can trust. And — this is the part that should chill the executives in your audience, Edo — that is not a temporary shortfall to be patched by the next model. It is the structural signature of a thing that knows the shadow and not the world. It will be brilliant on the typical because the typical leaves dense traces, and dangerous on the particular because the particular leaves few.

EDO SEGAL: I want to slow down here, because the surprise cuts against you too, Terry, and you have been more candid about the first half than the second. If a statistical pattern-matcher with no body and no care does this well at tasks that seem to require a body and care — doesn't that suggest your account of what those tasks require might be wrong? That maybe understanding never needed the body, and you have spent fifty years defending a wall the water just flowed around?

· · ·

WINOGRAD: That is the strongest version of the objection and I will hold it honestly rather than dodge. It is genuinely possible that a sufficient quantity of "mere" statistical learning produces something that is not well described as either understanding-in-the-old-sense or mimicry-in-the-old-sense, but a new kind of thing my categories were never built to name. My framework was forged against brittle, hand-coded systems whose lack of a world was manifest. Pressing those same categories onto a system that learned an implicit, distributed, behaviorally rich model of the human world from an ocean of text may be a category error of my own. The honest position is that we do not know, and my confident assignment of these systems to the "mimicry" bin may understate how thoroughly they have scrambled the very distinction I rely on. I will say what I believe — that it is imitation, that the involvement is still ours — and I will say it as a belief I could be wrong about, applied to a case it may no longer cleanly fit. The man who told the field to be honest when the evidence contradicts its tradition does not get to exempt himself.

HEIDEGGER: And here, for once, I will defend Herr Winograd against his own modesty, because I think he gives too much. The objection assumes that "doing the task well" is the whole of the matter — that if the behavior arrives, the question of what produced it is idle. But that is the very assumption the schema disproved. The water flowing around the wall is a fact about the shape of the wall, not about whether the far side is the same country. Yes, the machine reached the answer by a route no human took. It does not follow that it arrived where the human arrives. It arrived at the output the human arrives at. Whether it arrived at the understanding is exactly what no amount of arriving-at-outputs can show. The surprise of scale is real and admirable to confess. It refines the question. It does not dissolve it.

· · ·

EDO SEGAL: Hold that — the route and the destination, the output and the understanding — because it returns when we ask what the machine is for, and whether a thing that cannot cope can still be the most powerful tool ever built. But the next round leaves the seminar and walks into the institution, because Terry made one analogy decades ago that I think is the most underrated idea in this whole field, and it predicts the machine's failures with a precision that should frighten anyone who has ever been denied something by a system that could not be argued with. AI as bureaucracy. After this.

· · ·