The Five Stages and the Grandmaster's Eye

EDO SEGAL: Hubert, in 1980 you and your brother Stuart built a model of how people actually become expert — you built it for the Air Force, who needed to understand how pilots get good. Five stages: novice, advanced beginner, competent, proficient, expert. I want you to tell it the way you'd tell it to a smart fifteen-year-old, because it's the spine of your whole philosophy of mind. And then, Demis, I'm going to ask you whether your machines climb those stairs or skip them.

DREYFUS: Gladly, because the model carries a sting that lands exactly on the present. The novice follows rules — control the center, develop your knights, rules a manual could state, applied rigidly and slowly. The advanced beginner starts to notice situational patterns no rule described. The competent performer does something crucial: she chooses a perspective on the situation, commits to it, and becomes emotionally invested in the outcome — and that investment, the caring whether you succeed or fail, is not decoration. It is the mechanism. It is how experience deposits the traces that later become intuition. The proficient performer sees the situation as a whole and then deliberates about what to do. And the expert sees and acts as one motion, the way you answer a question in your native language, with the whole notion of following a rule grown absurd. The grandmaster doesn't compute the move. She sees it. And the path from rule-following to seeing is not a smooth acceleration. It is a progressive abandonment of rules in favor of something rules cannot hold: holistic, embodied, invested perception, built by a history of caring about the outcome in a body that bears the cost.

Now the sting. The transition between stages requires friction. It is not instruction that moves the advanced beginner toward competence — it is the emotionally invested, failure-mediated experience of working problems that resist. Each stage pours the foundation the next is built on. Remove the friction, let the problems get solved without the struggle, and the traces are never laid down, and the next stage is never reached — not because the learner lacks talent but because the developmental foundation was never poured. And the central thing AI does, Demis, is remove friction.

EDO SEGAL: Demis — does AlphaZero climb those stairs?

HASSABIS: It's a beautiful model and I want to engage it on two levels, because there's a place I agree hard and a place I think it's been overtaken. The agreement first, because it's the more important one and I won't bury it. The human worry is real and I've watched it. A novice who sits down with a coding assistant produces, on the first try, software that would normally signal years of experience — and the program runs, the output is genuine, and the novice has not traversed a single one of the professor's stages. She has the output and lacks the capacity. That is, I think, the most serious near-term problem in my entire field, more serious than most of the science-fiction risks, and it's exactly Dreyfus's mechanism: we are severing the apprenticeship that produces the people who can judge whether the machine got it right. I take no comfort in it and I won't pretend the engineering dissolves it.

Now the place I think the model's been overtaken, and it's about the machine, not the human. The professor says expertise can only be built through invested, embodied, failure-mediated struggle. But that's precisely what AlphaZero does, in its own substrate. It starts as a novice: random play, knowing nothing. It works problems that resist: it loses, millions of times, and each loss adjusts it. The "emotional investment" the professor says is the mechanism, caring whether you win, is functionally present as the reward signal that shapes every weight: the system is structured to avoid loss, and being wrong does cost it something, namely the gradient that rewrites it. And it climbs. It moves from rigid play to situational pattern to something no human taught it, and at the top it doesn't follow rules; it sees. Move 37, which AlphaZero's predecessor AlphaGo played against Lee Sedol, is the expert seeing, in Stuart and Hubert's exact sense: immediate, holistic, beyond articulable rules. The Dreyfus model didn't describe what machines can't do. It described, with eerie precision, what AlphaZero actually does. Self-play is friction-mediated skill acquisition. You wrote the recipe, Professor, and we cooked it.

DREYFUS: That is the cleverest thing anyone has ever done with my model, and it's wrong in a way that's worth being precise about, because the precision is the whole disagreement. You've mapped my stages onto a learning curve, and the map is real — there is a progression, I grant it. But you've replaced investment with reward signal and called them functionally the same, and that substitution is exactly the thing I deny. Investment, in my model, is not a scalar the system is structured to maximize. It is a being's caring about an outcome that bears on its life — and the reason it builds intuition is that the caring is about something to the carer. AlphaZero is not structured to avoid loss in the way I am structured to avoid pain. It is structured to maximize a number, and the number means nothing to it, because there is no it for the number to mean anything to. You've described the optimization correctly and renamed it caring, and the rename is the sleight of hand your whole field runs on. When I said the competent performer must care, I did not mean a quantity went up. I meant her stomach was in it. The machine has the curve and not the stomach, and the stomach was not a metaphor.

HASSABIS: But Professor — you're now defining the missing ingredient as "having a stomach," and I'm back to my objection: what work does the stomach do that the optimization doesn't? If the stomach changes the learning, show me where, and I'll go build it. If it changes only how it feels from the inside to learn — then you've conceded the learning is the same and the disagreement is about experience, which is the consciousness question, and we should be honest that we've moved to it.

DREYFUS: I'll take that trade gladly, because you're right that it points at consciousness, and we'll spend an hour there. But before you call it only-feeling, notice what the stomach does that the number can't. The stomach makes some failures matter more than others for reasons the system didn't choose — it makes the loss of a child catastrophic and the loss of a coin trivial, and it does this from outside the optimization, from a structure of needs given by a body and a life. AlphaZero's losses are all the same kind of loss: a number got worse. A human apprentice's failures are differentiated by a whole life of stakes she did not select, and that differentiation is the content of judgment — knowing not just that you erred but how much it mattered and to whom. That's not a feeling bolted onto the learning. It's the source of the relevance-ordering that makes human expertise transfer to situations the training never contained. Your system has no native ordering of what matters. It has the ordering we paid for, in the reward we designed — which is why, at the genuine edge, where we didn't anticipate what would matter, it has no idea what matters, and confidently does the wrong thing.

EDO SEGAL: I want to put my own skin on this table, because I caused a version of it and it still bothers me. In a training room in Trivandrum I watched a senior engineer spend two days swinging between exhilaration and something close to dread, because the tool could now do most of the implementation that had filled his career, and the question that forced itself on him was brutal: if the eighty percent of my work that's execution can be done by a machine, what's the remaining twenty percent worth? My book's answer is: everything — the judgment, the taste, the architectural instinct, that's the part that mattered. But Hubert, your model presses the thing I couldn't answer. That senior engineer earned his twenty percent by decades of the eighty. The judgment was distilled from the execution. So what happens to the junior who never does the eighty?

DREYFUS: He may never develop the twenty at all. Not for want of intelligence — for want of the specific embodied deposit that only the friction lays down. And here is the cruelty of it, the thing that makes it hard to see and hard to stop: the junior's output will look excellent, because the tool is excellent, so no alarm sounds. The quality of the work decouples from the capacity of the worker, and as long as the surface holds, the hollowing is invisible — to him most of all. The bill comes due only at the edge, when a genuinely novel problem arrives that the tool can't solve and he can no longer solve either, because the convenience that made him productive prevented him from becoming capable. Your twenty percent, Edo, was real. The danger is a generation that has the twenty percent's job title and the novice's actual hands.

HASSABIS: And on this — I'm with him almost entirely, which may surprise people. The thing I'd add is that it's a design problem, not a law of nature. We can build tools that deliberately preserve friction, that make you struggle where struggle is developmental and smooth only where smoothness is safe. Whether we will — whether a market optimizing for convenience lets us — that's the real question, and it's not a technical one. It's about what we choose to value. But I won't sit here and pretend the default trajectory is benign. It isn't. He's right about the mechanism, and I'm the one shipping the tool, so I'm the one who has to answer for it.

EDO SEGAL: Mark that — convergence number one, and it's a serious one. The engineer who builds the thing and the philosopher who feared it agree that AI removes the friction that builds mastery, and that this is dangerous regardless of whether the machine understands anything. Hold it. Because the next round is Demis's strongest card, the one even Hubert calls his best case, and I want to know whether it survives this argument or transcends it. The proteins. After the break.
