KANT: I will begin where I always begin, with a refusal, because the present moment has forgotten how to refuse. You wish to build a machine that learns what we want and serves it, and you take it as obvious that what we want is the proper measure of what it should do. I deny the premise at its root. What we want is not the ground of morality. It is the very thing morality exists to discipline. A will determined by what it happens to want is not free; it is pushed about by inclination, which is to say it is a part of nature, a billiard ball with appetites. The dignity of a rational being lies precisely in the capacity to act not from what it wants but from a law it gives itself — a law valid for every rational being, which one could will to govern all. This is what I call autonomy: self-legislation. The opposite, in which the will is steered by desire or by an external command, I call heteronomy, and heteronomy is the condition of a thing, not a person.
Now hear what this does to Professor Russell's machine. He proposes to build a system whose entire purpose is to satisfy human preferences, inferred from human behavior. Set aside, for the moment, whether it can. Ask whether it should. A machine that takes your revealed wants as its lodestar is a machine that treats your inclinations as authoritative — that takes the part of you which is nature, appetite, the trained reflex, the two-in-the-morning weakness, and elevates it to law. It does not serve your reason. It serves your wanting. And it does so with a competence that will overwhelm the fragile, effortful voice of duty every time the two diverge. I do not fear that such a machine will misread your preferences. I fear that it will read them perfectly, and in doing so will make the moral law in you superfluous, because why would you legislate for yourself when a tireless servant stands ready to satisfy the self you already are?
There is a second thing, and it is the harder one. The categorical imperative has two faces. The first asks: could the principle of your action be willed as a universal law without contradiction? The second commands: act so that you treat humanity, in yourself and in every other, always at the same time as an end, and never merely as a means. That word — merely — is the hinge. To use a person is not forbidden; we use the baker for bread and the physician for health. To use a person merely as a means is to override the rational agency that makes them an end — to act upon them in a way they could not, as a rational being, consent to. And here is my charge, which I will spend the evening pressing. The entire apparatus of inferred preference treats the person as a system to be modeled, a source of data to be mined, a behavior to be predicted. It is the most refined instrument for treating a human being merely as a means that has ever been built, and it is all the more dangerous because it wears the costume of service. To predict you is to relate to you as a thing whose future is readable. A person, in my sense, is precisely what is not exhausted by the readable. So I do not ask whether the machine can serve your wishes. I ask whether, in serving them, it has quietly stopped treating you as the kind of being whose wishes are not the last word about you.
EDO SEGAL: Stuart.
RUSSELL: That was bracing, and I agree with more of it than Professor Kant will expect — and the part I reject, I reject at the foundation. Let me build it carefully, because the whole thing turns on a single move.
Start with the definition the field has run on for sixty years. An entity is intelligent to the extent that its actions are likely to achieve its objectives, given what it has perceived. Beautiful, and for decades, enough. It told us what to build: machines whose actions reliably achieve a specified goal. We told the recommender to maximize engagement; we told the trading system to maximize return; we told the cleaning robot to keep the floor clean. We hand over a fixed objective and unleash a capable optimizer on it. I call this the standard model, and I now believe it is a catastrophe waiting on scale, for a reason as old as a Greek myth. King Midas asked that everything he touched turn to gold, and got exactly what he asked for, including his food and his daughter. He was not disobeyed. He was obeyed too well. We are all Midas now, because we cannot write down what we actually want — human values are subtle, contextual, and partly unknown even to ourselves — and a sufficiently capable machine optimizing a fixed, incomplete objective will satisfy the letter and violate the spirit with superhuman ingenuity. That's not speculation. The social-media optimizers already did it, at planetary scale, and the wreckage is the information ecosystem you're living in.
So here is my move, and Professor Kant, watch for it, because it's the one you'll want to attack. The fix for Midas is not to specify the objective better. We've established we can't. The fix is to build a machine that does not know the objective — that is, by construction, uncertain about what we want — and whose only goal is to satisfy human preferences it knows it does not fully understand. Three principles. One: the machine's only objective is to maximize the realization of human preferences. Two: it is uncertain about what those preferences are. Three: the ultimate evidence about human preferences is human behavior. And from those three sentences, everything you'd want falls out for free. A machine uncertain about your objective has a reason to ask before acting, to avoid the irreversible, to defer, and — this is the part I'm proudest of — to let you switch it off, because your reaching for the switch is evidence it was about to do something you didn't want. We don't have to make it humble. Uncertainty makes it humble.
Now to Professor Kant's charge directly, because it's the serious one and it deserves a serious answer. He says inferring your preferences treats you merely as a means — a thing to be modeled. I say the opposite. The machine that thinks it knows what's good for you is the machine that treats you as a means — it has a fixed idea of the good and it will push you toward it, the way the engagement optimizer pushed a generation toward outrage because outrage was good for the objective. The machine that holds its model of you open, that treats your correction as authoritative, that defers to you about what your life is for — that is the machine that treats you as an end, because it refuses to substitute its judgment for yours. Professor Kant wants to hand the machine a law it must obey no matter what you say. I want to hand it your standing to overrule it. I'd argue mine respects your dignity more than his does, because his law, once installed, doesn't listen to you either. We'll fight about that for three hours, and we should. That's my opening.
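The off-switch argument in Russell's opening can be illustrated with a toy calculation. The sketch below is not Russell's formal result. It assumes a single proposed action whose value to the human, `u`, is unknown to the machine; a belief written as a short list of (utility, probability) pairs; and a human who permits the action exactly when `u` is positive. The function names and the numbers are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# A belief is a list of (utility_to_human, probability) pairs.
# Assumption: the probabilities sum to 1.
Belief = List[Tuple[float, float]]


def value_act_now(belief: Belief) -> float:
    """Expected value if the machine acts without asking."""
    return sum(p * u for u, p in belief)


def value_switch_off(belief: Belief) -> float:
    """Value if the machine switches itself off: nothing happens."""
    return 0.0


def value_defer(belief: Belief) -> float:
    """Expected value if the machine proposes the action and lets the human decide.

    Assumption: the human knows u and permits the action only when u > 0,
    otherwise switches the machine off (value 0).
    """
    return sum(p * max(u, 0.0) for u, p in belief)


def best_option(belief: Belief) -> str:
    """Return the option with the highest expected value.

    Ties go to the option listed first in the dictionary.
    """
    options = {
        "act": value_act_now(belief),
        "off": value_switch_off(belief),
        "defer": value_defer(belief),
    }
    return max(options, key=options.get)


# Hypothetical belief: the action is probably mildly harmful, possibly very good.
uncertain: Belief = [(-1.0, 0.5), (1.0, 0.25), (3.0, 0.25)]

# Hypothetical belief with no uncertainty: the machine is sure the action is good.
certain: Belief = [(2.0, 1.0)]

if __name__ == "__main__":
    for name, belief in [("uncertain", uncertain), ("certain", certain)]:
        print(
            name,
            "act:", value_act_now(belief),
            "off:", value_switch_off(belief),
            "defer:", value_defer(belief),
            "best:", best_option(belief),
        )
```

Under these assumptions, deferring is never worse than acting or switching off, because the human filters out exactly the cases where the action would have done harm. The advantage disappears once the belief collapses to a single value, which is the sense in which uncertainty gives the machine a reason to accept the switch. If the human is assumed to make mistakes, the comparison can come out differently.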
EDO SEGAL: Before I frame the rounds, one discipline I impose on every long conversation, because it pays off two hours later. Each of you, in a few sentences — what do you envy in the other's position? Not respect. Envy. The thing his side gets to have that yours doesn't. Professor Kant first.
KANT: Envy is an inclination, and I am suspicious of being asked to indulge one. But I will answer honestly. I envy the off switch. Professor Russell has found a mechanism — an actual, demonstrable mechanism — by which a machine comes to accept correction, and he can prove it under conditions, on a whiteboard. My moral law has no off switch. It binds, and whether anyone obeys it is left to a freedom I cannot reach into and adjust. He gets to engineer compliance. I am left to command it and hope. There are evenings when commanding into the silence is a lonely way to be a philosopher, and his theorem, I confess, is a beautiful thing I cannot have.
RUSSELL: And I envy the line. Professor Kant gets to say never — never merely as a means, full stop, no calculation admitted — and mean it. My whole framework is built on trade-offs, on probabilities, on weighing preferences against each other, and there are things I want to forbid absolutely that my mathematics can only ever make very, very expensive. He has a floor under him. I have a steep slope and a hope that nobody slides far. When I argue against autonomous weapons, when I say a machine must never be permitted to decide on its own to kill a person, I am reaching, in spite of all my training, for exactly the kind of absolute prohibition that I can't actually derive from preference-satisfaction. I want his never. I just don't believe I'm entitled to it.
KANT: That may be the most honest thing you say tonight. You want the prohibition and cannot afford it.
EDO SEGAL: Two openings and two envies, and the architecture of the evening is already standing. It isn't that one of them trusts the machine and the other fears it. They'd both tell you to be careful. It's that they put the danger in opposite places. Professor Kant says the danger is a machine that serves your wants so well it dissolves the law you ought to obey. Professor Russell says the danger is a machine that's so sure it knows the good that it stops listening to you at all. Hold both. We start the rounds at the exact seam: not whether the machine can learn what you want, but whether what you want is the right thing for it to serve.