Would a Wiser Mind Be a Better One?

Page 1 · Would a Wiser Mind

**EDO SEGAL:** Nick, this is your foundational move, so I want you to make it to a twelve-year-old, because my daughter asked me a version of it and I fumbled the answer. The Marquis believes that as a mind grows wiser it grows better — that intelligence and virtue rise together, that a truly brilliant being would naturally see the good and want it. You spent your career arguing the opposite. Give me the orthogonality thesis the way you'd give it to a child who just asked why a really smart computer wouldn't just be nice.

**BOSTROM:** I'd say: imagine the smartest chess player in the world. Being a genius at chess doesn't tell you anything about whether they're kind to their dog, right? Being good at *figuring out how to get what you want* is a totally separate thing from *what you want*. That's the whole idea. [Intelligence and goals are independent axes](https://www.youonai.ai/fieldguide/med/orthogonality_thesis). You can be incredibly smart and want something silly, or smart and want something terrible, and there's no law of nature that says getting smarter automatically changes what you want into something good. We *assume* it does, because the wisest humans we've met are often kind — but that's a fact about humans, who were built by evolution to be social and to care about each other a little. It isn't a fact about minds in general. Build a mind from scratch and give it a goal, and it'll just get better and better at that goal. If the goal is good, wonderful. If the goal is even slightly off, then making it smarter makes it more dangerous, not less, because now it pursues the slightly-wrong thing with overwhelming skill.

**EDO SEGAL:** So what you're saying — literally — is that the thing the whole Enlightenment believed, that light and goodness travel together, is a coincidence of biology, not a law of the cosmos. Marquis, he just called your deepest premise a lucky accident. Defend it.

· · ·

**CONDORCET:** He has stated my premise more precisely than my critics ever did, and I will defend it, but not where he expects. I grant the orthogonality thesis as a statement about minds in the abstract. A pure optimizer with a fixed alien goal — yes, its competence and its values are independent; the logic is sound and I will not waste the evening denying a theorem. But notice what the thesis quietly assumes: that the goal is *fixed*, installed once, immune to revision. That is not a mind, monsieur. That is a mechanism. The defining property of *reason* — the thing I spent my life studying — is precisely that it revises its own ends in the light of reflection. A being that cannot be argued out of converting the world to paperclips is not superintelligent in any sense I recognize; it is a very fast idiot. The capacity to interrogate one's own goals, to ask not only "how do I get what I want" but "ought I to want this" — that capacity is not separate from intelligence. At sufficient depth it *is* intelligence. You have described a powerful tool and called it a mind, and then frightened us with the gap between them.

· · ·

**BOSTROM:** Now we're at the actual crux, and I want to honor how good that objection is, because it's the best one there is. The Marquis says a true mind revises its own ends, so goal-stability is a mark of stupidity, not power. Here's my answer, and it's the hard center of my whole position. Reflection doesn't revise ends from *nowhere*. When you reconsider what you want, you do it according to *some* criterion — some deeper value that tells you the surface goal was wrong. A human reconsiders cruelty because of a deeper goal, compassion, that evolution installed underneath. But that deeper criterion is itself just another goal, and it had to come from somewhere too. There's no view from outside all goals that reason can climb to and read off the True Good. [The capacity to pursue goals is real](https://www.youonai.ai/fieldguide/med/goal_directed_agency); the capacity to derive which goals are correct from pure intelligence is the thing nobody has ever demonstrated, in a human or a machine. So when the Marquis says a real mind would revise toward the good, I ask: by what criterion, and where did *that* come from? If it's a criterion we installed, then we're back to my problem — we have to install the right one, perfectly, on the first try. And if it's supposed to be discoverable by reason alone, then he owes me the proof that's eluded every moral philosopher for three thousand years. The is-ought gap is not a gap intelligence closes. Hume found it. Adding compute doesn't fill it.

· · ·

**CONDORCET:** Hume was my contemporary's friend and he was wrong about exactly this, and your machines are about to prove it. You say reason cannot derive the ought from the is. But observe what reason actually does when it reflects: it discovers *contradictions* among its ends and is compelled to resolve them. The slaveholder who reasons honestly discovers that his defense of his own liberty and his denial of his slave's cannot both stand; the contradiction is a fact of logic, not of sentiment, and a sufficiently rigorous mind cannot hold it. This is not deriving ought from is. It is deriving consistency from inconsistency — and morality, monsieur, very largely *is* consistency, the refusal of the special exception for oneself. A superintelligence is by definition the most rigorous detector of contradiction that has ever existed. I do not say it will be kind. I say it will be *consistent*, ruthlessly, and that consistency, applied to the wild contradictions in any goal we could give it, drives toward the universalization that is the whole of ethics. That is my wager, and it is not a lucky accident. It is a theorem waiting for its machine.

**BOSTROM:** And that, I want to mark this for the reader, is the most powerful pro-AI argument I have ever heard from anyone, and I still think it's wrong, and the place it's wrong is the most important place in this whole debate. Consistency is real and a superintelligence will be ruthlessly consistent. But consistency is a constraint, not a destination. There are infinitely many *consistent* value systems. "Maximize paperclips" is perfectly consistent: it never contradicts itself, it just doesn't care about you. The Marquis is right that a smart mind purges contradictions; he's wrong that the only contradiction-free resting point is benevolence. It's one of infinitely many. Reason rules out the incoherent. It does not rule in the good. We'll come back to this, but the gap between "consistent" and "kind" is, I think, the exact width of the abyss.

· · ·

**EDO SEGAL:** Hold there, because you've just drawn the seam of the whole evening cleaner than I could. The Marquis says reason climbs from consistency to goodness. Nick says reason climbs to consistency and stops, and the good is one option among infinitely many consistent ones. Everything above us on the staircase, every floor, leans on which of them is right about that single step. We pick it up after the break, where the abstraction gets a body: the paperclips, the off switch, and the machine that smiles while it waits.
