Marquis de Condorcet vs Nick Bostrom on AI · Ch7. The Error You Don't Survive ← Ch6 Ch8 →
Txt Low Med High
HOUR TWO — THE ERROR AND THE RACE
Chapter 7

The Error You Don't Survive

Page 1 · The Error You Don't
Existential Risk
Existential Risk

EDO SEGAL: Nick, I think this is the load-bearing beam of your entire structure, and I want it stated without mercy. Every good thing humanity has ever built, it built by failing and fixing. Airplanes got safe through a century of crashes. The Marquis's whole optimism runs on it — reason as the self-correcting method, error then correction then a higher floor. You say there's one domain where that engine doesn't turn over. Tell me why.

Because the self-correcting method has a hidden requirement that holds for every prior case and fails for this one: you have to survive the error to learn from it.

BOSTROM: Because the self-correcting method has a hidden requirement that holds for every prior case and fails for this one: you have to survive the error to learn from it. Trial and error is the greatest method we have, and it presupposes that you're still there after the trial to absorb the lesson. Airplanes got safe because we walked away from enough crashes to study them. Every improvement the Marquis celebrates was paid for by a recoverable failure. But an existential catastrophe is, by definition, the failure you don't walk away from. There's no second iteration. Our approach to existential risk cannot be one of trial and error, because there's no opportunity to learn from the errors — the first uncontained, misaligned superintelligence isn't a crash we study. It's a terminus. And that single fact inverts the entire logic of progress. The Marquis's method says: proceed, observe failures, improve. That method is correct everywhere except the one place where the first serious failure is also the last. There, and only there, you have to get it right before, by foresight alone, with no feedback, against every instinct the method trained into us. The hardest thing I'm asking the human race to do is to be careful about a risk it can't learn from by being burned.

EDO SEGAL: Marquis, that's aimed at the heart of you. Reason's whole power, in your account, is that it learns from being wrong. He's saying here is a wrong you don't get to learn from.

· · ·
Page 2 · The Error You Don't

CONDORCET: It is aimed at my heart and it grazes it, because the objection is genuine and I felt it land. Let me concede precisely and then resist precisely. I concede that there exist errors which foreclose their own correction, and that a builder who ignores this class of error is criminal. The retreat of smallpox was reversible; the extinction of the species would not be. On the logic, monsieur, you are simply right, and I will not insult the reader by pretending the asymmetry isn't real.

You have proven that if such an error is imminent, ordinary trial-and-error fails — and from this you conclude we must halt, or nearly halt, the very engine of progress until the danger is mastered.

Here is the resistance. You have proven that if such an error is imminent, ordinary trial-and-error fails — and from this you conclude we must halt, or nearly halt, the very engine of progress until the danger is mastered. But notice the structure of your own argument: it is itself a piece of reasoning, offered for correction, in a debate, before the fact. You are doing the thing you say is impossible — using foresight, the predictive arm of reason, to correct an error in advance of experiencing it. That is not the death of my method. That is my method operating at its highest power. Foresight is reason refusing to wait for the burn. So we do not disagree that the burn would be fatal. We disagree about whether reason can see far enough ahead to step around it — and you, by the mere fact of having seen it and warned us two decades early, are my best evidence that it can. You are not the refutation of progress, Monsieur Bostrom. You are progress, doing its job, ahead of schedule.

EDO SEGAL: Before Nick answers that, I want to test the Marquis's foresight claim against the actual state of the work, because this is where my own building life makes me nervous. Nick, the discipline that's supposed to do the stepping-around has a name — alignment — and you helped found it. So tell the Marquis honestly: is it on track? Is reason's foresight actually solving the problem fast enough, or is the capability racing ahead of the safety?

· · ·
Page 3 · The Error You Don't

BOSTROM: Honestly? The capability is racing ahead of the safety, and not by a little. I'll give the Marquis the encouraging half first, because it's real: twenty years ago alignment was three people and a mailing list, and now it's a field, with funding and brilliant young researchers and genuine results. That's reason's foresight doing exactly what the Marquis says it does — mobilizing in advance of the harm. But the discouraging half is the one that governs. The same period that grew the safety field grew the capability field a hundred times faster, because capability makes money and safety costs it. So the gap between what we can build and what we can control has been widening, not narrowing, even as the absolute amount of safety work explodes. The Marquis is right that foresight is mobilizing. He's not yet entitled to the conclusion that it's mobilizing fast enough, because "enough" is defined relative to a capability curve that is, right now, outrunning it. Foresight isn't losing for lack of effort. It's losing for lack of time, and time is the one resource the race burns fastest.

CONDORCET: Then I accept the correction, and I narrow my claim to exactly what I can defend. I do not claim foresight is winning. I claim foresight is the only thing that has ever won, and that the remedy for foresight outrun is not to abandon it but to fund it, free it, and multiply it — to throw at the alignment of the machine the same prodigious effort your century throws at its capability. You have described a race between two human enterprises, monsieur. That is not a law of physics. That is a question of where we point our genius, and where we point our genius is the most steerable thing there is. You have, once again, handed me a requirement dressed as a doom.

· · ·
Page 4 · The Error You Don't

BOSTROM: [long pause] That's the most generous thing anyone has ever done to my argument and it's also a trap, and I want to walk into it with my eyes open because it's worth it. The Marquis says my foresight is itself proof that reason can step around the cliff. He's right that foresight is our only tool here — I've staked my life on exactly that. But here's the cruelty he's stepping past. Foresight tells you the cliff is there. It does not, by itself, make the car slow down. I can see the failure mode with perfect clarity and be completely unable to stop us driving toward it, because the driving isn't done by my reasoning. It's done by a competitive race in which the first to arrive may win everything and the one who slows down to be careful simply loses to the one who doesn't. My warning is reason working. The world's response to my warning is the test of whether reason governs — and that's a different and much darker question. I can be the proof that we can see the cliff and still be the proof that seeing it isn't enough.

EDO SEGAL: Say more about that, because it just turned a debate about machines into a debate about us.

BOSTROM: It always was a debate about us. The Marquis's deepest faith isn't really in the machine. It's in our ability to govern the machine wisely — to let foresight steer. And the thing I've come to believe, the thing that's bleaker than any paperclip, is that the structure we'd have to govern through is built to punish the careful. That's the next problem, and it's the one I have the least hope about. Not the orthogonality thesis. The race.

· · ·
Page 5 · The Error You Don't

EDO SEGAL: Then let me mark the second convergence before we go there, because it's real and the reader should bank it. You both agree — fully — that foresight is the only instrument that works against an unrecoverable risk; that the burn-and-fix method fails exactly here. You disagree only about whether foresight, having seen the cliff, can actually slow the car. The Marquis says seeing is most of stopping. Nick says seeing and stopping are two different muscles, and the second one might be paralyzed. We pick that up after the break, at the place the river runs fastest — the race, the lever, and the question of who ends up holding it.

· · ·
Continue · Chapter 8
The Race and the Lever
← Prev 0%
Ch7 Next →