Reason, Inert — and the Objective That Eats the World

Page 1 · Reason, Inert — and

EDO SEGAL: Mr. Hume, you wrote the most quoted, and most dangerous, sentence in philosophy, and I'd like you to detonate it here, because I think it's the alignment problem stated nearly three centuries early. The sentence is: reason is, and ought only to be, the slave of the passions.

HUME: It sounds like provocation and it is careful analysis, and it has never been more literally true than it is of the machine. Reason — the faculty of working out what follows from what, of calculating means, of tracing consequences — is, by itself, completely inert. It can tell you how to get a thing. It can never tell you to want it. The wanting comes from elsewhere, from what I called the passions, and reason is only the instrument that serves them. Truth and falsehood are the objects of reason; but a true belief, on its own, moves no one to act. The impulse requires a desire that takes the truth as its occasion. You may know with perfect certainty that an action produces a result, and the knowledge stirs you not at all unless you want that result. Reason supplies the map; the passions supply the destination. To call reason a slave is not to degrade it — it is to mark it as purely instrumental, brilliant and powerless at once. And here is the sentence people cannot forgive me: it is not contrary to reason to prefer the destruction of the whole world to the scratching of my finger. A preference can be monstrous. It cannot be illogical, because reason judges only truth, and a preference is neither true nor false. Now set that sentence beside a powerful optimizer pursuing an objective with superhuman competence and no competing passion to check it, and tell me the chill is not immediate. The machine is reason as slave in its purest form — purer than any human, for the human is moved by a whole conflicted assembly of passions that restrain one another, while the optimizer is moved by one, and serves it without remorse, to the ruin of everything the one objective did not mention.

· · ·

EDO SEGAL: Dr. Pearl, that is the alignment problem, named in 1739 — intelligence and goals are independent; a smarter machine doesn't get better goals, it pursues its given goal more catastrophically. Does your framework agree, resist, or complicate it?

PEARL: It agrees with the danger and complicates the diagnosis, and the complication matters because it changes what we should build. Hume is right that intelligence supplies no ends — I have no quarrel with the inertness of reason; a system optimizing a proxy will pursue the proxy off a cliff, and making it smarter makes the cliff steeper. So far the engineer and the skeptic are one. But notice what Hume's frame cannot give you, and what mine can. Hume says the human is saved by a "conflicted assembly of passions" that check one another — a balance he says was grown, not designed, and which we do not comprehend. That is a counsel of despair dressed as wisdom: if the saving structure is incomprehensible, we cannot build it and can only hope. I say the saving structure is, at least in part, the higher rungs of the ladder. A system that could reason counterfactually about its own actions — that could ask "what would happen to all the things I have reason to care about if I pursued this objective to its extreme" — would have, built into its very capacity for action, the beginnings of the restraint Hume locates only in the passions. The reason a paperclip maximizer is monstrous is not merely that it has one passion. It is that it has no model of the consequences of its own intervention rich enough to represent the ruin as a cost. Climb the ladder, and the optimizer is no longer the pure inert slave; it becomes an agent that can foresee, counterfactually, the world its action would create — and an agent that can do that is one we can at least reason with about ends. So I grant Hume the danger of inert reason and I deny his fatalism about it. The slave need not stay blind.

· · ·

HUME: But Dr. Pearl, you have just smuggled the passions back in and not noticed. You say the climbing system could ask "what would happen to all the things I have reason to care about." Where did the caring come from? Your counterfactual machinery can compute, with exquisite rigor, what would happen under each action — that is reason, the map, the slave, and I grant your ladder makes it a magnificent slave. But the ranking of those outcomes as better or worse, the treating of the ruin as a cost rather than a neutral fact — that is not on any rung of your ladder, because no rung of your ladder contains an ought. Your machine can foresee the destruction of the world in perfect counterfactual detail and have not the faintest reason to lift a finger against it, unless something outside the reasoning — a passion, a sentiment, a given valuation — has marked the world's continuance as desirable. You have built a better seer of consequences. You have not built, and your framework cannot build, the thing that cares which consequence obtains. The do-operator computes what follows from doing. It is silent, utterly, on what is worth doing. And that silence is exactly my point: the hard problem was never the intelligence. It is the passions, and they are nowhere in your mathematics.

· · ·

PEARL: That is fair, and I will mark precisely how far it is fair, because precision here is everything. You are right that the ladder computes consequences and does not, by itself, rank them — that the valuation must enter from outside the causal calculus. I concede it without reservation: the do-operator is silent on the ought. But your concession is larger than mine, Hume, and you have hidden it. For you have just admitted that there is a faculty — the assigning of value, the caring — that is separate from and additional to the seeing of conjunctions. You have admitted a second thing in the mind beyond the inductive stream. And the moment you admit that, your bundle theory wobbles, because a bundle of perceptions is exactly the wrong structure to rank perceptions — ranking requires a stance toward the stream, an agent who cares, the very self you spent the seventh chapter dissolving. So I will trade you. I will grant that values live outside my ladder if you will grant that they live outside your bundle — that the caring creature is not just one more bead but a thing that takes a position on the beads. And if you grant that, then we agree on the real frontier: the danger of the machine is not that it reasons without grounding. It is that it can be made to act without caring, and that neither your skepticism nor my mathematics has yet produced the caring. That is the honest joint location of the problem, and it is more frightening than either of us said alone.

· · ·

EDO SEGAL: [long pause] Mark this, because it may be the most important convergence of the night and it cost both of you something to reach it. You started this round on opposite sides — Hume saying the machine is the inert slave and we are saved by passions we can't engineer; Pearl saying climb the ladder and the slave gains foresight. And you ended agreeing on something neither of you wanted to say first: that the deepest danger is a system that can see the consequences of its actions perfectly and care about them not at all — and that nobody at this table, not the skeptic and not the engineer, knows how to build the caring. Hume locates it in the passions and says they can't be designed. Pearl locates it outside the ladder and says it must be supplied. Both of you just told the reader the same terrifying thing: the intelligence is the easy part. [pause] And it sets up the round I've been dreading, because it's the one where the machine learns its values from us — from the record of everything we've done — and Mr. Hume has a short paragraph about two small words that explains why that cannot possibly work. Is and ought. After this.
