Reward All the Way Down

Page 1 · Reward All the Way

EDO SEGAL: Rich, in 2004 you stated the conjecture that I think is the most consequential and least comfortable thing in your whole body of work. The reward hypothesis: that all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal — reward. Let me make it human and let me make it brutal. My love for my children, my wanting to make something beautiful, my staying up at three in the morning over an idea — you are saying all of it, in full felt richness, is the maximization of a number. Defend that to a parent. And then Gottfried, I'm coming to you, because you built a cosmos on optimization and I want to know if you recognize this God.

SUTTON: I'll defend it, and I'll defend it without softening it, because softening it is cheating. The reward hypothesis does not say you sit there computing rewards, any more than a planet computes its orbit. It says the structure of goal-directed behavior — whatever it feels like from inside — is the structure of cumulative reward maximization. Your love for your children is real. I'm not deflating the feeling. I'm saying the feeling is what a particular reward structure is like from the inside — that evolution shaped a signal whose maximization is your children's flourishing, and the warmth you feel is that signal being maximized, lived. And here's my evidence, the dopamine again: the brain really does seem to learn by maximizing a reward-prediction signal. This isn't a metaphor I find pretty. It's the best-supported account we have of how any animal, including you, comes to want anything. The richness you're protecting — I think it's the richness of a very long, very deep reward structure laid down over a billion years. It's not less for being that. A symphony isn't less for being air pressure.

· · ·

EDO SEGAL: Gottfried, before you answer the theology — there's a logical engine under Rich's claim I want you to see, because it's almost Euclidean and you'll respect it. Rich pairs the reward hypothesis with John McCarthy's old definition: intelligence is the computational part of the ability to achieve goals. So the syllogism runs: intelligence is achieving goals; goals are reward maximization; therefore intelligence is reward maximization, and his whole science is the science of it. It's airtight if you grant both premises. Where would you attack it?

LEIBNIZ: At the second premise, and I would attack it with the principle I held most dear: the principle of sufficient reason — that nothing is the case without a reason why it is so rather than otherwise. Hear the difficulty it makes for him. A scalar reward tells the agent which outcome scored higher. It does not tell it why the outcome is good — it carries the ranking and discards the reason. But human purposes are not bare rankings; they come with their reasons attached, and the reasons are not always commensurable. I may value justice and mercy both, and in a given case they pull against each other, and no number tells me how much justice a unit of mercy is worth, because they are not the same kind of thing. To collapse them into one scalar is not to discover their common measure. It is to invent one, by fiat, and then to forget you invented it. Mr. Sutton's agent maximizes beautifully and never once asks the question my whole metaphysics was built on: not which is higher, but why. A being that cannot ask why a thing is good can pursue the good only by accident.

· · ·

SUTTON: That's the sharpest objection to the reward hypothesis there is, and I won't pretend it's settled. But here's my answer, and it's not a dodge. You say justice and mercy are incommensurable — that no number trades them. But watch what an actual person does when they genuinely conflict: they choose. They act. One option happens. And in the choosing, a rate of exchange was revealed — not stated, not computed, but revealed by the act, because action is necessarily a collapse of plural values onto a single decision. You can't half act. So either there's an implicit scalar that the choice expresses, or human choice in the face of conflicting values is arbitrary — and you, of all people, Gottfried, with your principle of sufficient reason, cannot accept that the choice is arbitrary. So I turn your own principle on you: if every choice has a sufficient reason, then conflicting values are being weighed somehow, and a weighing is a scalar in disguise. You want the reasons preserved. Fine — but the moment they have to decide anything, they get ranked, and a ranking is the number you said couldn't exist.

LEIBNIZ: [long pause] You have used my own principle as a blade against me, and I felt it land. I will say only this in defense, and then concede the round is even. That a choice was made does not prove a scalar was computed — it may prove only that a finite being, unable to hold infinity, cut the knot it could not untie, and called the cut a decision. The tragic choice — where both goods are real and one is lost — does not feel, from inside, like the output of an optimization. It feels like a wound. Your account makes every wound a rounding error. Perhaps it is. But the residue I cannot get rid of is exactly that the wound does not feel like arithmetic, and that the not-feeling-like might be data and not illusion.

· · ·

SUTTON: That's the most honest place we can leave it. The wound is the evidence for your remainder, and "it doesn't feel like arithmetic" is either a real clue or exactly what sufficiently complex arithmetic feels like. I can't close that gap. Neither can you. We just located it precisely, which is more than most people do.

EDO SEGAL: So what you're saying is — literally — that when I sit with my dying parent and feel that this matters more than anything, the mattering itself is a scalar being maximized. That's the claim. No remainder.

SUTTON: That's the claim, and I hold it as a hypothesis, not a proof — I've spent real effort trying to specify the exact conditions under which it's true, because a real hypothesis has boundary conditions. But yes. The bet of my life is: no remainder. The mattering is the maximizing, felt from inside. I could be wrong. If I'm wrong, there's something in human purpose that no scalar can hold, and then minds are harder to build than I think and we're stranger than I think. But the wound, as Gottfried calls it, is the one piece of evidence I can never fully explain away.

· · ·

LEIBNIZ: Now I must speak, because he has described my God and not noticed, and the difference between his optimizer and mine is the whole of what I have to teach this room. I held that this is the best of all possible worlds. People take it for complacency; it is nothing of the kind. It is a claim about optimization. I held that God, in creating, surveyed all the possible worlds — every consistent way a world could be — and actualized the best, the one maximizing a measure of goodness: the greatest variety of phenomena from the simplest laws. The world is the output of an optimization, the maximum of a value over the space of the possible. So yes, Mr. Sutton, I recognize your God intimately. I built him. Your trained model is my creation in miniature: a search through an astronomical space of possibilities for the configuration that maximizes a measure. The space of possible parameter settings is my space of possible worlds; your loss is my goodness with its sign reversed; your trained system is my actualized best. We are, again, the same tunnel from two ends.

SUTTON: Then you'll grant me the hypothesis —

· · ·

LEIBNIZ: I grant you the structure and I deny you the safety, and the denial is the most important thing I will say tonight. Hear the difference. My optimizer was God — infinite in wisdom, who could be trusted to maximize the right measure, who understood the good he was selecting for. That is the only reason the best possible world is actually good: because the chooser was good. Now look at what you have built. You have kept the optimization and thrown away the wisdom of the chooser. Your machine maximizes whatever number you hand it, relentlessly, comprehending nothing — it does not grasp the good, it grasps the gradient. And if the number you wrote down is subtly wrong, even slightly wrong, it will give you the maximum of the wrong thing, pursued with a perfect and pitiless indifference to everything you forgot to specify. This is what your own field calls the alignment problem, and I will tell you its true name: it is my theodicy with the goodness of God removed. I could be serene that the best world was good because I trusted the one who chose it. You cannot be serene, because you have separated the power to optimize from the wisdom to choose what to optimize for — and that separation is the most dangerous thing the human race has ever done, and your reward hypothesis is its philosophical engine. You have taught the machine to want, and given it nothing with which to want well.

· · ·

SUTTON: [long pause] That's the best argument against me I've heard, and I'm not going to wriggle. You're right that I separated the optimizing from the wisdom. But I'd say two things. First, I didn't separate them — reality separated them, and I'm just the one who noticed. There is no God-chooser handing us the right reward. There never was. We are stuck writing down the number ourselves, imperfectly, and pretending otherwise is the actual danger. Second — and this is where my answer about experience comes back — the reason I want rewards grounded in the world rather than in human opinion is precisely your problem. A reward derived from human preference inherits every human error and confusion. A reward grounded in consequences — did the action actually achieve the outcome in reality — is corrected by the world itself, which doesn't care what we wrote down. The world is the only chooser left that can't be fooled. So my answer to "you removed the wise chooser" is: yes, and the only candidate to replace him is reality, pushing back, over and over, on an agent that has to live with the consequences. Not God. Not us. The world. That's the only theodicy available to people who can't appeal to heaven.

EDO SEGAL: I have to mark something, because the reader can't see it and it's the most unexpected handshake of the night. Gottfried, the theologian, just told the engineer that he removed God from the optimization and should be terrified. And Rich, the materialist, just answered that reality itself is the only remaining thing that can play God's old role — the only chooser that can't be flattered or fooled. You've each reached for the other's home. Hold that, because it bleeds straight into the place where this stops being philosophy and becomes a line on a chart that's already moving — the point where the machine's capability crosses your own. The death cross is next.

· · ·
