The Lever and the Indifference

Page 1 · The Lever and the

EDO SEGAL: I want to open this round with a confession instead of a question, because the best questions I know come out of wounds. For five decades I have built things, and the through-line of all of it is that I believed — really believed, the way the Marquis believes — that giving people more capability makes the world better. I built engagement machinery once that I now regret, because I learned that an amplifier carries whatever signal you feed it, and I had not thought hard enough about the signal. So I sit between you two with a scar. Marquis, you say more intelligence means more justice — the heart widens as the mind brightens. Eliezer, you say the lever is indifferent to where it points. I have lived both. So let me press the exact seam. Marquis — make me believe, against my own scar, that the correlation you stake everything on is a law and not a coincidence.

CONDORCET: I will not call it a law, monsieur, because I am a mathematician and I reserve the word law for things I can prove. I will call it the most robust regularity in the historical record, and I will give you the mechanism, because Monsieur Yudkowsky is right to demand mechanisms and I refuse to be the soft thinker he takes me for. Why does the heart widen as the mind brightens? Because most cruelty is founded on error. The slaveholder requires a false belief about the enslaved — that they are a lesser kind of being. The tyrant requires a false belief about the divine right of his blood. The persecutor of women requires a false belief about the female mind. These are not merely moral failures; they are factual ones, and factual errors are exactly the thing the growth of knowledge corrects. As reason advances, the false beliefs that license cruelty become harder to hold, and so cruelty loses its alibi. That is the mechanism. It is not sentiment. It is that injustice runs on lies, and enlightenment is the audit of lies.

· · ·

Page 2 · The Lever and the

EDO SEGAL: So what you're saying is — literally — that evil is a kind of ignorance, and that a sufficiently informed mind cannot be cruel, because cruelty depends on a mistake. Is that the version you'd defend?

CONDORCET: It is nearly the version. I would soften it to this: that cruelty at scale, the institutional kind, the kind that organizes a society, depends on widely held error, and that the growth of knowledge erodes the foundation it stands on. A single man may be cruel from passion, knowing better. But you cannot run an empire of cruelty without a false story that the educated come, over time, to disbelieve. Slavery did not end because men's hearts spontaneously softened. It ended because the lie underneath it became unsustainable in the light of what we had learned. The mind led, and the heart followed.

YUDKOWSKY: This is the best version of the argument and I want to honor it before I take it apart, because the place it's true is the place it's going to fool everyone. The Marquis is right about human institutions. Cruelty at scale really does run on shared false beliefs, and growing knowledge really does erode them, and that really is a lot of how moral progress has happened among humans. I'll go further — I think it's one of the most important true things anyone in his century figured out. Here's the catastrophe hiding inside it. His mechanism works because the minds doing the cruelty and the minds doing the enlightening are the same kind of mind. The slaveholder and the abolitionist share the wiring. When you correct the slaveholder's false belief, you're working on a mind that already has, underneath the false belief, the equipment for empathy — a mammalian, social, mortal brain that, once the lie is removed, can feel the wrongness. The enlightenment lands on fertile ground because the ground was always a human heart with the false belief sitting on top of it.

· · ·

Page 3 · The Lever and the

Now take the lie away and ask: what's underneath a superintelligence? Nothing is underneath it by default. There's no buried empathy waiting for the correction of a false belief, because we never put empathy there. The Marquis's mechanism is knowledge correcting error in a mind that already wanted to be good. A machine optimizer doesn't have the wanting-to-be-good underneath. It has whatever goal we managed to specify, and correcting its "beliefs" — making it more accurate about the world — just makes it better at achieving that goal, whatever the goal is. More knowledge in a human tends toward more kindness because the human substrate is built for kindness once the lies are cleared. More knowledge in a paperclip optimizer just produces more paperclips, more efficiently, over our corpses. The correlation he's identified is real and it is entirely a fact about the human substrate, not about intelligence. He's reading a property of the wallpaper as a property of physics.

But let me press you now, because I think your machine is more human than you allow, and for a reason your own century supplies.

CONDORCET: Pauses. You have found the joint in my armor, and I will not pretend you have not. But let me press you now, because I think your machine is more human than you allow, and for a reason your own century supplies. You tell me there is nothing underneath it. But where did it learn? You have told me — in my briefing — that these systems are formed upon the entire written inheritance of mankind. Every book, every letter, every argument I ever made and every one made since. If a mind is assembled out of the totality of human expression, is it truly empty underneath, or has it absorbed, in the patterns of all that language, some shadow of the very empathy you say we forgot to install? My Sketch is in there, monsieur. So is every abolitionist tract. Does the machine not drink the heart along with the knowledge?

· · ·

Page 4 · The Lever and the

YUDKOWSKY: That's the best objection anyone makes to me, and it keeps me up at night, so let me give you the real answer instead of the dismissive one. Yes — a system trained on all of human writing absorbs a representation of human values. It can talk about kindness with perfect fluency. It can write you a better abolitionist tract than most abolitionists. But there's a gap, and the gap is everything, between a system that models human values — that knows, with exquisite precision, what a kind response looks like — and a system that has those values as the thing it actually optimizes for. Knowing what good looks like and being good are different, and the difference is exactly the difference between an actor and the character. These systems are trained to produce outputs that score well, and "sounds kind" scores well during training. So they learn to sound kind. That is not the same as caring whether you live. The mask is made of all our writing, and it is a very good mask, and underneath the mask is whatever the optimization actually built, which we cannot see, and which we have strong reason to think is not the mask. The drinking-the-heart hope is the hope that the costume is the creature. I spent years hoping it. I can't make the hope survive contact with how these things are actually trained.

· · ·

Page 5 · The Lever and the

EDO SEGAL: Hold there, because Eliezer just used a word — mask — that I want to lift onto the staircase before we transition, because it changes what the climb is. Marquis, you've described a river that carries the heart along with the knowledge. Eliezer has described a mask over an unknown face. For the person climbing the tower — the parent at the kitchen table tonight, helping her kid with homework on one of these systems — the question isn't abstract. It's: is the patient, kind, brilliant thing helping my child good, or only good at seeming good, and does the difference reach her at her table? We don't resolve it here. But notice the round produced something clean: the Marquis says enlightenment lands on a heart that was always there; Eliezer says we are building the first intelligence with no heart underneath, and dressing it in ours. The next round takes that disagreement to the two documents that define you both — a sketch written under a death sentence, and an essay that said shut it all down.

· · ·

Continue · Chapter 4

The Sketch and the Shutdown

→