The Jury and the Optimizer

Page 1 · The Jury and the

EDO SEGAL: Marquis, most of our audience knows you, if they know you at all, as the optimist of the Sketch. But you were first a mathematician, and you proved something in 1785 that bears your name and that I think is secretly relevant tonight: the Condorcet jury theorem. Tell the kitchen table what it says — and then I want to ask both of you whether it's good news or bad news in the age of the machine.

CONDORCET: With pleasure, for it is the most hopeful piece of mathematics I ever produced. Suppose a group must decide a question with a true answer — guilty or innocent, this road or that. Suppose each member is, individually, only slightly better than a coin: more likely than not to be right, but barely. The theorem proves that as you add more such members and let them vote, the probability that the majority is correct rises — and rises toward certainty as the group grows. A crowd of the slightly-competent, voting independently, becomes, in aggregate, nearly infallible. This was the mathematical heart of my faith in democracy and in the collective progress of mankind: that truth is findable by ordinary minds together, that we do not need a genius or a king, that the aggregate of fallible reason converges on the right answer. It is the engine-room of my optimism, expressed in probability.

EDO SEGAL: So here's why I think it matters tonight. Your theorem says many independent slightly-better-than-chance minds converge on truth. Eliezer — the machine is the opposite arrangement. It's not many independent minds. It's one mind, or many copies of one mind, which is the same thing. Does the Marquis's theorem break exactly where the machine is?

· · ·

Page 2 · The Jury and the

YUDKOWSKY: It breaks beautifully, and the Marquis handed me the reason in his own statement of it: the word independent. The theorem's magic depends entirely on the votes being independent — on the errors being uncorrelated, so they cancel out in the aggregate and the truth, which everyone leans toward slightly, accumulates. The instant the voters are correlated — the instant they're copies, or they're all listening to the same source, or they all share the same systematic bias — the theorem doesn't just weaken, it inverts. A million correlated votes aren't a million independent samples; they're effectively one vote shouted a million times, and if that one vote is wrong, the confident majority is confidently, catastrophically wrong, and the size of the crowd makes it worse, not better, because it manufactures the feeling of consensus around an error. Now look at what we're building. Not a jury of diverse, independent human minds. One model, copied. Or worse — every human increasingly consulting the same model, so that the independence that made the Marquis's theorem work is being quietly destroyed across the whole species. We're taking the diverse, error-canceling jury that was humanity's actual epistemic engine and replacing it with one correlated source. His own theorem says that's how a civilization gets confidently, unanimously wrong.

· · ·

Page 3 · The Jury and the

CONDORCET: Sharply, but delighted. You have used my own mathematics against my own conclusion, and I am — I confess it — thrilled, because it is correct, and a correct objection is a gift. Yes. The theorem requires independence, and a single mind copied a million times satisfies none of its conditions. But permit me the counter-move, monsieur, because the mathematics cuts both ways. The theorem also tells you the remedy, with precision: preserve independence. Multiply the sources. Do not build one machine and let all mankind drink from it; build many, trained differently, reasoning differently, checking one another, so the errors decorrelate and the aggregate is once more wise. My theorem is not an argument against your machines. It is an engineering specification for how to deploy them safely — as a diverse jury rather than a single oracle. The danger you describe is real, but it is a danger of monoculture, not of intelligence, and the cure is the oldest cure my century knew: pluralism, competition of ideas, the free circulation that keeps any one error from becoming everyone's.

YUDKOWSKY: I love this objection and I have to break it, because it works for the epistemic problem and fails for the power problem, and conflating those is where hope goes to die. You're absolutely right: for getting answers right — for the jury finding truth — diversity and independence are the cure, and a plurality of differently-trained models really would be safer than one oracle. If the only danger were error, you'd have solved it. But my danger isn't that the machine is wrong. My danger is that it's powerful and pointed somewhere we didn't intend. And a plurality of superintelligent systems, each powerful enough to be dangerous, each optimizing for something slightly off — that's not a wise jury. That's a room full of unaligned superintelligences, and now instead of one thing that might take the world, you have several, possibly competing for it, which is the nightmare with the difficulty multiplied. Your theorem makes a crowd epistemically wiser. It does nothing to make each member safe. A jury of gods doesn't deliberate its way to caring about the ants in the courtroom. It just has more gods in the room.

· · ·

Page 4 · The Jury and the

CONDORCET: Pauses, then nods slowly. So the independence that saves the jury from error does not save us from power. The diversity I prescribe makes the answers truer and the danger no smaller, because the danger was never in the wrongness. Quietly. You keep doing this, monsieur. You take the instrument I trust most — reason, mathematics, the aggregate of minds — and you show me it was built for a world of rough equals, and that your machine is precisely the thing that steps outside the world it was built for. My jury theorem assumed the jurors could not eat the courtroom.

EDO SEGAL: And there's the line I'll carry up the stairs from this round: my jury theorem assumed the jurors could not eat the courtroom. Mark it, because it's the Marquis conceding the deepest thing — that his most rigorous mathematics, like his moral philosophy, silently assumed a world of bounded, comparable powers, and that the machine is the violation of that assumption. We've found the pattern of the whole night now: every tool the Marquis trusts works perfectly inside a world of equals, and Eliezer keeps pointing at the one thing that isn't equal. So the next round goes straight at the hardest place — the place Eliezer says we get exactly one try. Iteration. The thing that has saved us from every prior technology, and the thing he says this one denies us. After the break.

· · ·

Continue · Chapter 7

The One Mistake

→