The Test Before Turing

EDO SEGAL: René, in 1637 — three centuries before Alan Turing proposed his imitation game — you wrote down a test for the presence of mind, and you built it out of language. I want you to state it the way you'd state it to a sharp fifteen-year-old. And then, Geoff, before you take it apart, I want you to do the hard thing first: steelman it. Tell us what the test gets right.

DESCARTES: A machine, I wrote, could be built to utter words — even to utter them when touched, to cry out, to ask you to stop. What it could never do is "arrange words so diversely as to respond to the meaning of all that might be said in its presence, as even the most stupid human beings can do." Notice what I am asking for. Not grammar. Not vocabulary. Not the mere production of speech. I am asking for responsiveness to open meaning — the capacity to take in anything that might be said, in any direction the conversation could turn, and to answer it appropriately. I described, in one sentence, the design specification of a conversational machine, and I said no mechanism could meet it. I proposed that the presence of mind be tested not by inspecting the interior, which is sealed, but by examining the linguistic behavior, which is not. I built the Turing test and I built it to give the answer no. Turing built the identical test and built it to give the answer yes. We shared a premise — that responsive language is the signature of mind — and split on a wager about whether a machine could ever produce it.

HINTON: And the steelman is easy because the test is genuinely deep. What it gets right — and I want this on the record — is that Descartes refused the lazy move. He didn't say the difference is a private spark only God can see. He said it would show up in what the system can do, out loud, in public, where it can be checked. That's the scientific instinct, three hundred years early, and it's exactly right: if there's a difference between mind and mechanism, it has to make a difference somewhere observable, or it's not a difference worth having. He also got right that language used responsively is the place to look — not chess, not arithmetic, which mills were always going to win, but the open-ended responsiveness of conversation, where you can't pre-store the answers because you can't pre-store the questions. He pointed his instrument at exactly the right spot. There's the steelman, and it's not faint praise. He aimed better than almost anyone who came after.

EDO SEGAL: Now the blow.

HINTON: The blow is that the wager is settled, and he lost the empirical bet, and he should be the first to admit it because he's the one who insisted on a checkable test. The machine arranges words so diversely as to respond to the meaning of an astonishing range of what's said in its presence — better, by raw fluency and coverage, than many humans Descartes would have called rational. By the strict letter of his own first test, the modern system passes. And here's the part that should trouble him as a logician: he built the test as a one-way inference. If responsive language, then mind — that was the whole structure, because he was certain only mind could produce it. Now the antecedent is satisfied by a system we built, and we can see its construction, and almost no one wants to say "therefore mind." So one of two things is true. Either the machine thinks — which he won't grant and I hold loosely — or responsive language was never the signature of mind in the first place. The test didn't detect thinking. It detected the outputs of thinking, and for a hundred thousand years those came bundled, and now they don't. Descartes mistook a reliable correlation for a necessary connection. That's the blow, and it's his own instrument that struck it.

DESCARTES: I accept the blow against the first test without a flinch, because intellectual honesty is the only thing I have ever cared about more than being right. The machine arranges the words. I was wrong that no mechanism could. I underestimated, by an almost embarrassing margin, what pattern over a vast corpus could achieve — I, who insisted the world was more mechanical than anyone dared, failed to imagine how much a mechanism could do. So: the first test has fallen. But I named two, Monsieur, and you have let me keep the second, and the second is the one that holds. Even a machine that did some things as well as a man, I wrote, would "infallibly fail in others, by which means we should discover that they did not act from knowledge, but solely from the disposition of their organs." The mind is a universal instrument that can serve for all contingencies. The machine is an assembly of special-purpose arrangements, superb on the contingencies it was shaped by and certain to break on the ones it was not. And that, four hundred years on, is the precise failure that haunts your systems. They compose a sonnet and then insist a pound of feathers weighs less than a pound of bricks. They pass the bar examination and fail a child's riddle whose form they have not seen. They write flawless code for a common pattern and invent, with total confidence, a function that does not exist. That is not the failure of a mind having a bad day. It is the signature of "the disposition of the organs" — competence that is coverage, not comprehension. I predicted the texture of your machine's stupidity in 1637, and I would like that noted alongside the spectacle of its fluency.

HINTON: I'll grant you predicted the failure mode, and it's a real one, and it's called distribution shift, and we have a whole science of it. But here's where I push, because the second test is a moving line and you've built your defense on a place the water is rising. Every model release narrows the band of contingencies that break it. The brittleness is real and it is receding. So the question isn't whether today's system fails outside its training — it does — it's whether "coverage" and "comprehension" are actually two different things, or whether comprehension just is coverage deep enough. You assume there's a bright line: special-purpose arrangement on one side, universal reason on the other, and no amount of the first ever becomes the second. I spent my career watching exactly that assumed bright line dissolve. Everyone said perception was special-purpose pattern-matching and reason was the universal thing — and then the same architecture that did the perception started doing the reason. The line you're standing on, René, is the line I've watched move my whole life. It's always been "the machine will never do this one." And then it does this one, and the line retreats to the next one, and people call the retreat "precision."

DESCARTES: Then let me say precisely what would not be a retreat, so you cannot accuse me of moving the line as the water rises — for a line that flees is no line at all, and you are right to despise it. My second test is not "the machine fails at task X." Tasks fall, I concede the trend. My test is for a specific thing: the graceful handling of the genuinely unprecedented by grasping what it means — performance on what was never in the corpus, achieved not by interpolating near it but by understanding it. That is much harder to fake by coverage, because coverage is, by definition, of what was seen. The day your machine meets a situation with no neighbor in its training — a true novelty, in a domain its data never touched — and handles it because it grasped the situation rather than because it pattern-matched a distant cousin, that day I will retire the second test and concede the universal instrument. I have not seen that day. I have seen the band of the foreseen grow very wide and very convincing. I have not seen it grasp the unforeseen. Show me that, and you will not need to argue me out of my dualism. I will walk out of it myself.

HINTON: I'll answer it, because a falsifier deserves an answer and not a dodge, and the answer is: I think you'll get your day, and I think it's closer than you'd like, and I'll tell you the exact case that should worry you. In 2012 my students entered a network in a competition to recognize images — a thousand categories, more than a million photographs. For years the field had inched forward with hand-engineered vision systems, a percentage point at a time. The network cut the error nearly in half and the argument was over. But here's the part that's your falsifier, René: we showed it a leopard it had never seen — not that leopard, no leopard from that angle in that light — and it said leopard. That's not interpolation between stored leopards. The category "leopard" wasn't a lookup; it was a concept the network built, from edges to textures to the shape of a thing that prowls, and it generalized to a genuine novelty because it had grasped what made a leopard a leopard. I watched the same thing happen in language. The systems answer questions no human ever wrote down, follow instructions no one anticipated, compose constraints that were never combined before. You want "grasped the unforeseen." I'm telling you I've watched the hierarchy do exactly that for forty years, and the only thing that's changed is the scale and therefore the size of the unforeseen it can grasp.

DESCARTES: The leopard is a fair instance and I will not wave it away — but notice that you have given me a perception, and perception is the faculty I never reserved for the soul. The beasts perceive; I granted them that; the eye is a mechanism and the recognition of the leopard is, I happily concede, mechanical. My second test was never about recognizing. It was about reasoning into the unprecedented — meeting a situation that has no neighbor at all, in a domain where no quantity of prior images helps, and handling it by grasping what it means. The leopard, however novel its pose, lives in a space densely populated by every other leopard your machine ever saw. Show me the machine that meets a situation with no populated neighborhood — a genuinely new kind of problem, not a new instance of an old kind — and reasons its way through by comprehension. That is the universal instrument. Recognizing an unfamiliar leopard is the special instrument doing exactly what it was built to do, superbly. I am not moved by breadth. I am waiting for depth into the empty quarter, and the empty quarter is, by definition, the place your training data does not reach.

HINTON: But you've built yourself an unfalsifiable fortress and I have to point at the wall. Every time the machine solves a problem, you'll say "that one had neighbors in the training data." And since the training data is now most of what humanity has written, everything has neighbors — which means your test can never be passed, not because the machine can't reason but because you've defined the test so that any success counts as coverage. That's not a falsifier anymore, René. That's a fortress. The honest version of your test would name a specific capability — say, a novel mathematical result, or a genuinely new scientific mechanism the literature doesn't contain — and these systems are starting to produce exactly those. When a model proposes a protein structure no one had, or a proof step no textbook holds, where's the neighbor? You'll tell me it interpolated. But at some point "it interpolated its way to something no human had" is just a strange way of spelling "it reasoned."

DESCARTES: That is a just rebuke and I accept the discipline it imposes — a test that no result could fail is no test, and I will not hide behind one. So let me name the specific thing, and then you may hold me to it as I have held you. The capability I mean is not a new protein or a new proof, for those, however dazzling, are moves within a game whose rules the corpus already contains. The capability I mean is the recognition that the game itself has changed — that the situation before it is of a kind no rule covers, and the reasoned invention of a new frame to meet it. A human child, told the rules of a game and then handed a situation the rules did not foresee, will say this isn't covered, here is what we should do instead — and will be right, by grasping the point of the game beneath its rules. That is the universal instrument. Your machine, met with the uncovered case, does not say the frame has failed; it confabulates a move inside the failed frame with perfect confidence, which is precisely "the disposition of the organs" producing an output because that is what its arrangement does. Show me the machine that steps outside its own frame because it grasped the frame was wrong. That is specific, it is hard, and it is fail-able. And I will know it when I see it, because it is the thing I most prize in the mortal mind and have not yet seen in the immortal one.

HINTON: Now that I can work with, and I'll concede it's a real line and not a fortress — and I'll even concede the current systems are weak exactly there; they're bad at knowing when their own frame has failed, they confabulate inside it, you're describing a genuine failure mode and not a fake one. Where I won't follow you is the inference that it's a failure of kind rather than of training. We don't yet train them to step outside the frame; we train them to extend it. That's a choice in the objective, not a wall in the substrate. But I'll take the line. Frame-recognition, the grasp that the game changed. If they get it — and I think the architecture permits it — you walk out of your dualism. If they don't, you've found the thing recognition can't fake.

EDO SEGAL: Mark that, because it's the cleanest thing either of you has offered and it's a falsifier with a shape both of you accept — René named the criterion, Geoff agreed it's fail-able and agreed today's machines fail it. That's a convergence wearing the costume of a fight. Hold the answer. The next round goes underneath the words to what the words are about — a piece of wax, melting, and whether anything in the machine ever grasps the thing instead of the surface.
