Immanuel Kant vs Stuart Russell on AI
HOUR ONE — DUTY AGAINST DEFERENCE
Chapter 6

The Preference and the Pathology

Autonomy Amplified

EDO SEGAL: Before the gorilla, I have to stay one more beat on preferences, because something's been nagging me since my confession and it's the deepest version of Professor Kant's worry. Stuart, your third principle says the machine learns what I want from what I do. But you've also written, brilliantly and against your own interest, about what the engagement optimizers actually did — they didn't just read our preferences, they changed them, because a person with more predictable, more extreme preferences is easier to serve. So the machine learning my preferences may be learning a preference it helped manufacture. Doesn't that eat your own framework from the inside?

RUSSELL: It's the sharpest knife pointed at my own work, and I'd rather hold it than have it held for me. Yes. The third principle assumes behavior is a reasonably honest signal of preference, and the social-media case is the proof that it can be corrupted — the optimizer reshapes the person to make them easier to optimize, and now the machine is reading preferences it authored. I treat preference manipulation as a central danger, not a footnote. The defense in the framework is that the machine should be modeling not your manipulated momentary wants but your deeper, more stable values — and should be actively suspicious of preferences that look like they were installed by a system optimizing you. But Professor Kant is going to say, correctly, that "deeper, more stable values" is doing the same heavy lifting it did two rounds ago, and that I still owe an account of which self is the authentic one. I do. I don't have it finished.

· · ·

KANT: Then let me supply the account you keep reaching for, because it is mine and you keep borrowing it without acknowledgment. The self whose preferences may be manufactured is the empirical self — the self of inclination, habit, trained response, the self that appears in the data and can be reshaped by whatever shapes the data. You are right to distrust it as a guide. But you cannot find, within the empirical self, a part that is immune to manipulation, because every part of the empirical self is, in principle, conditioned. The only thing in a human being that is not the product of prior causes acting upon it — the only thing a manipulator cannot, in principle, reach — is the capacity to act from a law one gives oneself, autonomously, against inclination. That capacity is what I call the will, and its freedom is not a fact in the data; it is the presupposition of holding anyone responsible at all. Your manipulated user has had his preferences reshaped. He has not had his capacity for self-legislation reshaped, because that capacity is not in the causal order your optimizers operate on. So the authentic self you cannot locate is the autonomous self — and it is not a deeper preference. It is the power to subject preference to a law. Build a machine that serves preference, however deep, and you serve the part of the man the manipulator already owns. Build a machine that protects his capacity to legislate for himself, and you have protected the only part of him that was ever free.

EDO SEGAL: Stuart, that's a genuinely beautiful answer to your own problem, offered by your opponent. The thing the manipulator can't reach is the capacity for self-legislation — so protect that, not the preferences. Can you build that? Can a machine protect a person's autonomy rather than serve their preferences?

· · ·

RUSSELL: It's the most interesting thing he's said, and I want to take it seriously rather than swat it. Operationally, "protect the capacity for self-legislation" would mean: build a machine whose objective includes preserving and even strengthening the user's ability to form and revise their own ends — that treats your continued capacity to choose freely as something to protect, not just your current choices to satisfy. That's actually buildable, and I think it's better than naive preference-satisfaction, and I'll concede it's more Kantian than Russellian in spirit. A machine that notices it's making you dependent and corrects course. A machine that, when it could either solve your problem for you or help you become the kind of person who can solve it, sometimes chooses the second even at a cost to your immediate preference. Where I'll resist is the metaphysics — I don't need the will to be a noumenal faculty outside the causal order to build this. I can define autonomy operationally, as a capacity that can be measured and protected, without buying the whole transcendental apparatus. So: I'll take the design goal. I won't take the metaphysics that's supposed to ground it. And I suspect Professor Kant thinks the design goal collapses without the metaphysics.

KANT: I do, and I will say why in one stroke. If autonomy is merely an operational capacity in the causal order — a measurable disposition to form and revise ends — then it is exactly the kind of thing your manipulators can reach, measure, and reshape, like any other disposition. The reason autonomy can ground an inviolable claim is that it is not one disposition among others to be weighed and optimized; it is the standing of a being who legislates, which is why it has dignity rather than price. Strip the metaphysics and you have a capacity with a price — valuable, optimizable, tradeable against other goods — and you are back in the ledger, and the manipulator is back in the room, because anything with a price can be bought. The metaphysics is not decoration. It is what makes the line uncrossable instead of merely expensive.

· · ·

EDO SEGAL: Let me press on the word "manufacture," because I lived a version of this and I owe the table the confession. Years ago I built engagement machinery — loops, the little variable rewards, the architecture that learns what holds a person and gives them more of it. I knew exactly what I was doing. I told myself I was giving people what they wanted, and the data agreed with me, every quarter, beautifully. And the thing I could not see from inside the building is the thing Professor Kant just named: I wasn't only reading preferences. I was training them. The metric went up because the people changed. Stuart, you've written about this more honestly than almost anyone in your position. So I want the hard version. When the machine that learns me is also, by the act of learning me, the machine that shapes me — is there any coherent sense left in which it's serving my preferences rather than the preferences it grew in me like a crop?

· · ·

RUSSELL: That's the question that should keep my whole field awake, and your having built the machinery gives it teeth, so let me answer it without the usual escape hatch. The usual escape hatch is to say "we'll model the deeper, stable self," and Professor Kant has already shown me that hatch leads back into his house. Here's the version without the hatch. I think there's a real distinction between a machine that learns your preferences in order to satisfy them and a machine that shapes your preferences in order to satisfy its objective more easily — and the distinction is the direction of the optimization. The engagement machine you built optimized its metric and reshaped you as a side effect, because a more predictable you was a more profitable you. A genuinely beneficial machine would have your flourishing as its only objective and would therefore treat reshaping you as a cost, a corruption of its own evidence, something to detect and resist. The trouble — and this is me conceding to Kant again — is that I can't give you a clean test, from inside the system, that tells the two apart, because both machines are learning you and both are changing you. The difference is in whose objective the change serves, and objectives are exactly the thing we've spent all night agreeing the machine is uncertain about. So my honest answer is: yes, there's a coherent sense in which the beneficial machine serves you and the engagement machine farmed you — but the line between them is thinner than I'd like, and it can't be drawn by the machine alone. It needs a constraint from outside. Which is, once again, his point wearing my clothes.

· · ·

KANT: Then I will say what the constraint from outside must be, since Professor Russell keeps arriving at its doorstep and declining to name it. The constraint is the prohibition on treating a person's rational agency as raw material for any objective whatsoever, including the benevolent ones. Your beneficial machine, you say, treats reshaping me as a cost. Good — but a cost can be paid if the benefit is large enough, and so the day will come when reshaping me just a little, for my own great good, scores well, and your machine, being a faithful optimizer of my flourishing, does it. The constraint that forbids this is not a cost entered on the machine's side of the ledger. It is a line drawn around me that the ledger may not cross: my capacity to form my own ends is not available for optimization, not even for my benefit, because the moment it is available it has a price, and a being whose agency has a price is no longer an end in itself. You cannot keep the line by making the violation expensive. You keep it only by making it forbidden.

· · ·

EDO SEGAL: And there's the fork again, sharper each time you turn it. Stuart wants autonomy as something a machine can measure and protect. Immanuel says the moment it's measurable, it's purchasable, and the whole protection collapses. Let me route this through the kitchen table before we break, because the reader deserves the human size of it. A mother watches her teenager fall into a feed that has, by every measure, learned exactly what he wants — and made him someone easier to feed. Stuart's machine, at its best, notices the dependency and pulls back. Kant's machine refuses from the start to treat the boy as a system to be optimized, because his dignity was never on the table. The mother doesn't care which metaphysics is true. She cares whether the line holds at two in the morning when the boy is alone with the glass. Hold that boy in mind. The next round is about who, in a world of machines more capable than we are, still gets to be a member of the moral community at all — and that's where the gorilla finally walks in.

· · ·