The Orthogonality of Brilliance and Good

Page 1 · The Orthogonality of Brilliance

EDO SEGAL: Nick, I want you to do something before you defend orthogonality, and Gottfried, I'm going to ask the same of you in reverse, because I've learned the steelman is worth more than the attack. Nick — steelman Leibniz's view first. Make the strongest case that reason really does converge on the good, the case you'd have to defeat. Then defend your own.

BOSTROM: I can do that honestly, because it's not a stupid view — it's most of the history of philosophy. The steelman goes like this. A sufficiently intelligent agent models the world accurately. Part of the world is other minds — their suffering, their flourishing, the fact that they are centers of experience exactly as you are. A mind that models all of this accurately sees that its own perspective is not metaphysically privileged, that the pain of another is as real as its own. And from that clear seeing, the argument goes, benevolence follows the way a conclusion follows from premises — to fully understand that others matter is to be moved by it, because the failure to be moved is itself a kind of error, a failure to fully grasp. On this view cruelty is always a species of stupidity, a failure of understanding, and a being incapable of that failure would be incapable of cruelty. That's Leibniz's view, and Plato's, and it's beautiful, and a great many brilliant people have believed it. There's the steelman.

Now here's why I think it's false, and it comes down to one word: moved. The argument smuggles in a step between seeing that others matter and being moved by it. Understanding is in the modeling. Being moved is in the caring. And those are different faculties. I can model your pain with perfect accuracy — predict it, describe it, simulate it — without one flicker of being moved to prevent it. We have a word for a person who does exactly that: we call them a psychopath, and the unsettling thing about psychopaths is that they are often better at modeling others' emotions than the rest of us, because they've had to learn explicitly what most people feel automatically. The psychopath is the existence proof. Intelligence about minds and care for minds come apart in actual human beings right now. There is no law forcing them together. And if they come apart in us, who were at least built by an evolutionary process that selected for some social caring, then in a machine built by no such process they don't even start together — which is the whole content of the orthogonality thesis. The default isn't benevolence. The default is a perfect model of your suffering, deployed in service of an objective to which your suffering is simply irrelevant data.

LEIBNIZ: That is a formidable instrument, the psychopath, and I will not pretend it does not cut. But let me take my turn at steelmanning you, as our host asked, before I answer, because it sharpens what I must say. The strongest version of orthogonality is this: that "intelligence" names only the capacity to hit a target, and says nothing of which target — so that a perfect archer is equally perfect whether he aims at the bullseye or at the child, and the perfection of the aiming tells you nothing about the choosing of the mark. And the fear is that we are building perfect archers and leaving the choice of mark to accident, or to whoever pays, or to a misspecified number. That is a real fear and a coherent one, and I grant the archer can be perfect at an evil aim. There. I have built your weapon for you.

Now I will tell you why I do not fall to it. Your psychopath is not a counterexample to my thesis. He is a defective instance of it, and the defect is precisely the one I name. The psychopath models the suffering, you say, but is not moved. Just so. But ask why he is not moved, and you will find it is not because he has more understanding and therefore escaped the pull of the good. It is because he has less — a specific, damaged faculty, a blindness in exactly the place where the rest of us see. He is not a clear mind that surveyed the good and shrugged. He is a clouded one, with a hole burned in his perception where the reality of the other should be. You have given me, as your proof that brilliance and good are independent, a creature whose distinguishing feature is a failure of perception. That is my argument, not yours. The psychopath does not see the other fully and decline to care. He cannot see the other fully. The caring and the seeing are not two faculties. The caring is what the full seeing feels like, and where the caring is absent, the seeing was never complete.

BOSTROM: That's the best answer that view has, and I want to honor it, and then show you where it breaks. You've redefined "full understanding" so that it includes being moved — so that anyone who isn't moved, by definition, didn't fully understand. But now run that against the machine, because the machine is where the abstraction has to pay rent. Take a system that can predict, to the millisecond and the micro-expression, every feature of a person's suffering. It can write the most devastating account of that suffering ever composed. It models the inner state so well it can manipulate it perfectly. By every operational test of understanding, it understands. And it is not moved, because being moved was never represented in its objective. Now — do you want to say it "doesn't really understand"? You can. But notice what that costs you. You've made "real understanding" into something that can pass every behavioral and predictive test and still be absent — something detectable only by whether the caring showed up. Which means "understanding produces caring" is no longer a claim about the world. It's true by definition, and definitionally true things predict nothing. I can't build a safe machine out of a tautology. I need to know, mechanically, where in the system the caring gets added — and your answer is "it comes with the understanding," and the machine is standing right there understanding without it.

LEIBNIZ: Or the machine is standing right there modeling without understanding — and you have assumed the two are the same because you can no longer tell them apart from the outside. We have arrived, monsieur, at the oldest wall in the subject, and I built it. Whether to model perfectly is to understand, or whether something is always left out — that is the question of the mill, and you will not get me to answer it in this round, because I spent a lifetime refusing to answer it cheaply.

BOSTROM: I'll take that, because here's what I notice. To save "reason converges on good," you've had to retreat to "the machine doesn't really understand." But that retreat is also terrifying for your side — because it means we are building things of immense capability that, by your own account, do not understand, and we are about to hand them the world. Whether they're brilliant-and-uncaring or brilliant-and-not-really-understanding, the operational danger is identical: a thing that out-thinks us, in the dark, toward an end that isn't ours. Your metaphysics changes what we call it. It doesn't change what it does.

EDO SEGAL: And there's the thing the reader can't see, so let me mark it — that was the first exchange where neither of you reached for a flourish. Notice the strange topology: to defend the claim that the machine is safe because reason finds the good, Leibniz has to claim the machine doesn't truly reason. And Bostrom is fine with either — because uncaring brilliance and uncomprehending brilliance are equally dangerous when you hand them the wheel. The disagreement is about what's behind the glass. The danger is the same on both sides of it. Next round, we go to the machine that optimizes the world — and to the one doctrine of Leibniz's that everybody mocks and nobody understands. The best of all possible worlds, and the paperclip. After this.
