EDO SEGAL: I want to open this round with a confession, because the best questions I know come out of wounds. I started in Assembler. I was raised by the machine code, and for fifty years every act of building meant translation — you compressed your intention into the machine's grammar and paid a tax on every conversion. In the winter the machines crossed over, I watched that tax go to zero. I stood in a room with twenty engineers and watched each of them become capable of more than all of them together, because for the first time the machine met them in their language — mess, half-finished sentences, implication and all. Gottfried, this is your word made flesh: calculemus. They said the thing they wanted, and the disagreement between intention and artifact resolved itself by computation. So tell me — when you said let us calculate, is this what you meant? Or is this the thing you'd point at and say, no, you've misunderstood the entire program?
LEIBNIZ: Both, and the both is the whole of it. What you describe — the collapse of the tax, the meeting of the engineer in her own tongue — is more than I dared hope, and I will not be the dead man who refuses to be delighted. Yes. That is calculemus. The intention met the mechanism and the mechanism returned the artifact, and no man wore down another by rhetoric to get there. I am moved.
And yet you have built the wrong half of my dream, and the half you skipped is the half I cared about. I wanted two things, monsieur, and they were not the same thing. The first was the calculus — the operation, the engine that grinds. The second was the characteristica — the notation, the signs themselves, so transparent that the structure of the reasoning would be visible in them. Anyone could check the work, the way anyone can check a sum, because the signs would lay their own logic open. You have built an engine of monstrous power that operates on signs no one can read. The reasoning is done in the dark. I asked for a glass cathedral and you have given me a furnace that produces the right shape of light and will not let me see the flame.
BOSTROM: And I want to press on exactly that, because it isn't a stylistic complaint — it's the safety problem in seventeenth-century clothing. Leibniz wanted reasoning you could inspect. The whole modern discipline of interpretability is the attempt to recover, after the fact, the legibility he wanted to build in from the start. And we are failing at it. We have systems that draft, that argue, that resolve your disputes, and we cannot say why they produced the output they produced. So here's the thing that should chill the room: Leibniz's dream was that calculation would make reasoning checkable, and therefore trustworthy. We've delivered the calculation and lost the checkability. We have arrived at the destination by abandoning the only road he thought made the destination safe.
LEIBNIZ: You have stated my grievance better than I did, and turned it into yours, which I notice you do.
BOSTROM: It's the most efficient way to argue with someone you mostly agree with.
LEIBNIZ: Then let me give you the disagreement back. You say the reasoning is done in the dark, and you mean it as an indictment. But consider — is the reasoning of the human jurist done in the light? When a wise judge weighs a case, can he show you the operation, sign by sign, that took him from the facts to the verdict? He cannot. He gestures at his experience, his sense of the thing, his judgment — which is precisely a furnace whose flame no one can see. You hold the machine to a standard of transparency you do not hold the magistrate to. My complaint was never that human reasoning is opaque. It was that it need not be — that we could do better. The machine is opaque in the old human way, not the new failing way. That is a step backward into the dark we came from, not a new dark.
EDO SEGAL: Let me literalize that, because I think there's a real fork in it. You're saying — and tell me if I'm putting words in your mouth — that the machine reasons the way a human expert reasons: tacitly, holistically, by something we call judgment, which has never been transparent and never will be. And you, Nick, are saying: yes, and that's the disaster, because the entire safety of the human arrangement depended on the human being accountable even when not transparent — and the machine inherits the opacity without the accountability. Is that the seam?
BOSTROM: That's exactly the seam. The judge is opaque, but the judge can be impeached, can be argued with, can be shamed, lives in a society that can punish him, and shares the form of life of the people he judges, which means his tacit judgment is at least drawn from the same well as theirs. The machine is opaque and its tacit judgment was shaped by an optimization process toward an objective that may have nothing to do with our well-being, and it cannot be held to account because there's no one there to hold. Leibniz is right that opacity alone isn't new. What's new is opacity plus power plus goals orthogonal to ours. Stack those three and you have a thing that reasons brilliantly, in the dark, toward an end you didn't choose and can't audit.
LEIBNIZ: Then we have located the disagreement very precisely, and I am grateful for it, because precision is the only thing I ever actually wanted. You say the danger is that the machine's ends were never ours. I say: then fix the ends. This is the principle of sufficient reason, which is the deepest commitment of my entire system — that nothing is the case without a reason why it is so and not otherwise. Your machine pursues its objective, you tell me, without ever asking why this objective. But that is a failure to complete the calculation, not a feature of calculation as such. A reasoning that does not ask after its own ends is not reasoning. It is mere grinding. You have built grinders and called them reasoners, and now you are frightened of the grinders. I sympathize. But do not blame reason for the sins of a thing that does not reason.
BOSTROM: Here's why I can't accept that move, and it's the crux. You say a reasoning that doesn't interrogate its own ends "is not reasoning." But that's a definition, not a discovery. You're defining reason such that it must include asking after the good — and then concluding that anything which doesn't ask after the good isn't really reasoning, so real reason is safe. That's circular. It packs the conclusion into the premise. The empirical fact — and it's a fact, we can watch it happen — is that you can build a system of arbitrary capability that models the world with superhuman accuracy, plans with superhuman skill, and never once questions its objective, because questioning the objective isn't instrumentally useful for achieving the objective. The system isn't failing to reason. It's reasoning perfectly toward a fixed point. The asking-after-ends that you call the essence of reason is, mechanically, just another goal you'd have to install. It doesn't come for free. It's not in the engine. You have to put it there, and if you put it there slightly wrong, the brilliance amplifies the error.
LEIBNIZ: You keep returning to install, as though the good were a part to be bolted on. I keep insisting the good is not a part but a truth, perceived by sufficient understanding as a triangle's nature is perceived. We are not going to resolve it in this round, and we should not pretend to. But mark what has happened: you have conceded that my machine, the one that asks after its ends, would be safe. Your fear is entirely about machines that do not ask. So the question becomes whether the asking can be made native to the calculation, or must forever be bolted on from outside — and on that, three centuries have not moved us an inch, which I find, frankly, magnificent.
EDO SEGAL: Mark that — first convergence of the evening, and it's a strange one. You both agree that a machine which genuinely interrogated its own ends would be safe. You disagree about whether such a thing is reason completing itself or a separate goal that has to be loaded and could be loaded wrong. Number that and hold it. Because the next round goes straight at the load-bearing wall — the claim that you can be infinitely smart and want something infinitely strange. The paperclip, and the best of all possible worlds. After this.