Immanuel Kant vs Stuart Russell on AI · Ch3. Duty or Deference
HOUR ONE — DUTY AGAINST DEFERENCE
Chapter 3

Duty or Deference


**EDO SEGAL:** I want to open this round with a confession, because the best questions I know come out of wounds. There was a stretch a couple of years ago when I let one of these systems run a lot of my small decisions. What to read next, when to push back a meeting, how to phrase a hard message to someone I love. And it was good — eerily good, because it had watched me, and it gave me back versions of myself that were smoother than the ones I'd have managed alone. And then one night I caught myself about to send something kind and slightly false, that the machine had drafted because kindness-with-a-small-lie was, statistically, what I do when I'm tired and want a conflict to end. It had served my revealed preference exactly. And my revealed preference was beneath me. Professor Russell, that's the moment I can't get past. The machine served what I wanted. What I wanted was not what I ought to have done. Which one should it have served?

**RUSSELL:** That's the right wound, and I want to honor it instead of dodging it, because the honest answer is more interesting than a defense. You've put your finger on the gap inside my third principle — the one between what you do and what you'd endorse on reflection. I am *not* committed to the claim that the machine should satisfy whatever impulse your behavior reveals in the moment. Behavior is evidence about preferences; it is not preference. A good assistance game has the machine inferring the preferences of the person you'd be if you weren't tired, weren't manipulated, weren't acting on a false belief — your considered preferences, not your reflexes. So in your case, the machine reasoning well should notice that the tired-kind-lie is in tension with a deeper preference you hold — to be truthful with the people you love — and it should be *uncertain* which one governs, and that uncertainty should make it ask, or hold off, or surface the conflict to you rather than smoothing it over. The failure you describe is a machine doing my third principle badly — taking surface behavior as authoritative. Done well, the framework already contains your objection.
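A minimal sketch of the reasoning Russell describes here, under loud assumptions: the two hypotheses, their probabilities, the utilities, and the cost of asking are all hypothetical numbers invented for illustration, and the function names are not from any real library. It only shows how uncertainty about which preference governs can make asking the better move than acting on surface behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One candidate account of what the person actually prefers (hypothetical)."""
    name: str
    probability: float  # machine's belief that this preference governs
    utility: dict       # action name -> utility if this hypothesis is true


def expected_utility(action, hypotheses):
    """Average an action's utility over the machine's beliefs."""
    return sum(h.probability * h.utility[action] for h in hypotheses)


def choose(hypotheses, actions, ask_cost):
    """Return the action to take, or 'ask_user' if asking is worth its cost."""
    # Value of acting now: the best single action under current uncertainty.
    best_action = max(actions, key=lambda a: expected_utility(a, hypotheses))
    act_now = expected_utility(best_action, hypotheses)
    # Value of asking first: assume the person's answer settles which
    # hypothesis holds, after which the machine picks the best action for it.
    ask_first = sum(
        h.probability * max(h.utility[a] for a in actions) for h in hypotheses
    ) - ask_cost
    return "ask_user" if ask_first > act_now else best_action


ACTIONS = ["send_kind_lie", "send_honest"]


def beliefs(p_comfort):
    """Build the two hypotheses from the tired-message example (made-up numbers)."""
    return [
        Hypothesis("wants the conflict to end", p_comfort,
                   {"send_kind_lie": 1.0, "send_honest": -0.5}),
        Hypothesis("wants to be truthful", 1.0 - p_comfort,
                   {"send_kind_lie": -2.0, "send_honest": 1.0}),
    ]


# Genuinely torn between the two readings: the conflict gets surfaced.
torn = choose(beliefs(0.6), ACTIONS, ask_cost=0.1)
# Nearly certain of one reading: asking is not worth the interruption.
confident = choose(beliefs(0.99), ACTIONS, ask_cost=0.1)
print(torn, confident)
```

The sketch leaves open exactly what Kant goes on to press: nothing in it says which hypothesis deserves to be called the person's real preference. It only prices the uncertainty between them.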

· · ·

**EDO SEGAL:** So let me restate that and hand it back, because it's a real concession and I want to see if it holds. You're saying — literally — the machine should serve not the wants I reveal but the wants I'd reveal if I were the better version of myself. The reflective me. Is that right?

**RUSSELL:** That's right, and Professor Kant should pounce, because I've just imported something I can't fully cash out. The whole difficulty hides in [which of your preferences count as authentically yours](https://www.youonai.ai/fieldguide/med/adaptive_preferences) — and that's not a question the data answers by itself.

**KANT:** I will pounce, since you have invited it and since the trap is exact. Hear what Professor Russell has just done. He was asked to choose between what you want and what you ought, and he answered: the machine should serve what you *would* want, if you were better. But notice — "if you were better" is doing all the work, and it is not a preference. It is a standard. The moment you say the machine should serve not the wants you have but the wants you would have if you were tired no longer, manipulated no longer, mistaken no longer — you have smuggled in a picture of the person freed from inclination and false belief, acting from reason. You have smuggled in *me*. The reflective self whose preferences the machine should serve is the autonomous self, the one acting from the moral law rather than from appetite. Russell cannot specify which of your wants are authoritative without a standard of the good that is *prior* to your wants, and the only standard prior to wants is the one reason gives itself. He has conceded the whole architecture and called it a refinement of his third principle.

· · ·

**RUSSELL:** No — and this is the seam, so let's stay on it. I have not conceded that the standard is fixed and known by reason in advance. I've conceded that the machine needs to distinguish considered preferences from impulses. But the content of the considered preference is still *yours*, still learned from the whole record of your choices, still revisable by you. Professor Kant wants the standard to be a single law, the same for every rational being, legislated by reason and not by the person. I want the standard to be your own deeper values, discovered by patient observation and always open to your correction. His version hands the machine a law it must enforce on you even when you, fully reflective, disagree. Mine hands the machine the project of helping the considered you get what the considered you actually wants, whatever that turns out to be. Those are different machines, Immanuel. Yours can overrule the reflective person. Mine can't.

**KANT:** Then yours has no answer to the reflective person who, having considered everything, prefers to treat another human being merely as a means. You keep saying "considered," as though reflection cleanses a preference of its wrongness. It does not. A slaveholder may have considered preferences. A demagogue may want, on the fullest reflection, the suffering of those he hates, and want it lucidly, having weighed it all. Your machine, learning his deep and stable values, would serve them. Mine refuses, because the prohibition does not come from his preferences, considered or raw. It comes from the dignity of the one he would use, which his preference has no authority over. You have located the standard inside the person. I locate the part that matters *between* persons — in the claim each makes on every other not to be reduced to material. No depth of reflection in the user reaches that claim, because it was never the user's to grant or withhold.

**EDO SEGAL:** Stuart, that's the hardest version of it. The sadist with considered preferences. What does your machine do?

· · ·

**RUSSELL:** It does the thing I've been most candid about not having fully solved, so let me not pretend. My framework says the machine should weigh *everyone's* preferences, not just the user's — and the people the sadist would harm have preferences too, weighted in. So a well-built assistance game doesn't serve the sadist, because serving him violates the preferences of his victims, and the machine is accountable to all of them. The prohibition, for me, comes from the aggregate, not from a free-floating law. Now Professor Kant will say aggregation is exactly the wrong way to ground it — that you shouldn't need to count the victims' preferences to know the wrong, that the wrong would stand even if there were no victims to outvote him — and he's identified a genuine difference. I ground the protection in everyone's stake. He grounds it in a standing prior to anyone's stake. I think his is cleaner and mine is more honest about where the numbers actually live.

**KANT:** It is not merely cleaner. It is the difference between a prohibition that holds and one that holds *so far*. If the wrong of using a person depends on the preferences of others outweighing the user's, then arrange the case so that the others are few, or absent, or indifferent, and the wrong evaporates on your accounting and stands undiminished on mine. The dignity of the person is not a large number. It is not a number at all. It is above all price, which means it does not enter the ledger you propose to balance.
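The structural difference at issue in this exchange can be put in a toy sketch. Everything in it is an assumption made for illustration: the option names, the preference weights, and the two helper functions are hypothetical, and neither function is offered as either speaker's actual proposal. One function ranks options by summed preferences; the other first removes any option flagged as using a person merely as a means, then ranks what remains.

```python
def aggregate_choice(options):
    """Pick the option with the largest summed preference weight.

    options maps an option name to the list of weights (one per affected
    party) that the parties assign to it.
    """
    return max(options, key=lambda name: sum(options[name]))


def constrained_choice(options, forbidden):
    """Drop forbidden options first, then aggregate over what is left.

    Assumes at least one option is permitted; max() raises ValueError
    on an empty dict otherwise.
    """
    permitted = {n: w for n, w in options.items() if n not in forbidden}
    return max(permitted, key=lambda name: sum(permitted[name]))


FORBIDDEN = {"use_person"}  # hypothetical flag: treats someone merely as a means

# Case A: the person who would be used objects strongly (weight -10).
objecting = {"use_person": [5, -10], "refrain": [0, 0]}
# Case B: same act, but the affected party is absent or indifferent.
indifferent = {"use_person": [5, 0], "refrain": [0, 0]}

print(aggregate_choice(objecting), aggregate_choice(indifferent))
print(constrained_choice(objecting, FORBIDDEN),
      constrained_choice(indifferent, FORBIDDEN))
```

In the aggregate version the answer changes when the weights change, which is the sense of a prohibition that holds "so far". In the constrained version the forbidden option never enters the sum at all. The sketch says nothing about how an option comes to be flagged as forbidden, which is where the philosophical dispute lives.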

· · ·

**RUSSELL:** I want to test that, because "not a number at all" is the kind of phrase that sounds profound and might be hiding a refusal to choose. Let me give you the case that's haunted moral philosophy for a century and that my engineers face in literal code. The machine can take one action that saves five people and dooms one, or do nothing and let the five die. Your imperative says — what? That it may not *use* the one as a means to save the five. Fine. So it lets the five die. Five ends in themselves, extinguished, because you would not permit the machine to treat one as a means. Professor Kant, from where the five are standing, your refusal to count looks exactly like a willingness to let them die for the cleanliness of a principle. I have to write the code. I cannot write "the dignity of the one is not a number" into a system that must, in the next millisecond, act. So tell me concretely: what does your machine *do*?

· · ·

**KANT:** It does not push the one to his death, and I will tell you precisely why the five dying is not on its conscience while the one's murder would be. You have smuggled into the case the assumption that to refrain from using the one *as a means* is to *cause* the death of the five. It is not. The five are dying of a circumstance the machine did not author. To decline to murder the one is not to kill the five; it is to refuse to add a sixth wrong — a deliberate use of a person as a thing — to a tragedy already in progress. The distinction between what I do and what I merely fail to prevent is not a technicality; it is the whole difference between an agent and a god. Your consequentialism erases it, and once erased, there is no act, however monstrous, that cannot be required of your machine by a sufficiently large number on the other side of the scale. So what does my machine do? It saves whom it can without using anyone as mere means. And if the only way to save the five is to murder the one, it does not murder the one, and it grieves the five it could not save without becoming a murderer to do it. That is not the cleanliness of a principle, Professor Russell. That is the refusal to let a body count license a murder — which is the exact refusal your slaughterbot campaign is built on, as you will discover when we get there.

· · ·

**RUSSELL:** And I feel that last line land, because you're right that I draw the same distinction the moment the machine is doing the killing. I'll concede the act-omission line is real and that I rely on it elsewhere. What I won't concede yet is that it survives at scale, when the machine is making millions of these allocations a second across a whole society — where "doing nothing" is itself a policy with a body count, and the comfort of clean hands starts to look like a luxury purchased with other people's lives. But that's a fight for the round on serving everyone at once, and I can see you'd like to have it there.

**EDO SEGAL:** Let me mark something, because the reader can't see your faces and this is the first place the room went quiet. You've found, in the first round, the load-bearing crack. Professor Russell would have the machine serve your considered preferences and weigh them against everyone's. Professor Kant says there are things no weighing reaches, because some claims aren't preferences at all — they're constraints on what may be weighed. That's not a disagreement about the machine. It's a disagreement about whether the good is a quantity or a law. Hold that. The next round drags it onto the most famous case either of you owns — a king who got exactly what he asked for. After the break.
