Twenty-Three Problems and the Power to Name

EDO SEGAL: Professor Hilbert, on a summer morning in 1900, in Paris, you did something almost no one has managed before or since. You stood before the International Congress of Mathematicians and told an entire discipline what to think about for a hundred years. Twenty-three problems. Some specific, some vast. The list organized the labor of thousands of minds across generations. I run a company; I have watched what happens when one person with enough authority says this is what matters. So I want to ask you about the act, not the contents. What did you understand about power that morning?

HILBERT: I understood that what gets worked on is not given by nature. It is chosen. And that a sufficiently authoritative, sufficiently well-judged choice can focus the effort of a generation the way a lens focuses light. I did not solve the problems. I named them, ranked them as worthy, and in naming them made them central. To solve a Hilbert problem became, and I am told remains, a way to make one's name. There is a power here that is almost editorial — the power to say this matters, this is the frontier, this is what a serious person should attempt. I will tell you plainly: I am proud of that morning in a way I am proud of few of my theorems. A theorem is a fact. An agenda is a direction for a civilization.

EDO SEGAL: I want to put it to you that your machine descendants run on exactly this power, and it's invisible to almost everyone outside the field. What counts as progress in these systems is set by benchmarks — curated collections of problems with scorable answers. A benchmark is a Hilbert list in miniature: it declares these are the problems whose solution is advancement, and the whole community optimizes toward them. Careers and billions of dollars follow the leaderboard. Gödel, you're quiet — I suspect you have a warning about this.

GÖDEL: I do, and it is drawn from the fate of Professor Hilbert's own list, which is more instructive than its triumph. Two of the twenty-three problems were posed in the confident expectation that the answer was yes and merely awaited discovery. The second asked for a proof of the consistency of arithmetic. The tenth asked for a procedure to decide whether any Diophantine equation has integer solutions. Both turned out to be impossible in the form Hilbert intended: the second undone by my theorem, the tenth proven undecidable by Matiyasevich in 1970. The agenda was not neutral. It encoded an optimism, and where the optimism was wrong, the problems did not get solved. They got dissolved, revealed to rest on a premise that could not hold. Now hear the warning for your benchmarks. When you set "human-level general intelligence" or "solve reasoning" as a target, you may be importing an assumption as questionable as Hilbert's assumption that every well-posed problem has a constructive answer. Some of your grandest goals may be not unsolved but malformed. You are measuring progress toward a finish line you have not proven exists.

HILBERT: And I will accept that warning and then turn it on its head, because Gödel has just made my deepest point for me without noticing. Yes — two of my problems dissolved. But consider what the dissolving produced. My second problem was wrong, and being wrong, it provoked Gödel's theorems — one of the supreme results of the human mind. My tenth was wrong, and it provoked the entire theory of Diophantine undecidability. The Entscheidungsproblem was answered no, and the answering built the computer. A sharp wrong conjecture is more valuable than a vague right caution, because it tells people exactly what to attack, and the attack — even when it overturns the conjecture — produces the real knowledge. This is the productive form of being wrong, and your benchmark-chasing machine age is its direct heir. Do not be so frightened of a malformed goal. A malformed goal, stated precisely, is a machine for generating the theorems that reveal its malformation. That is not failure. That is how a field learns what it is.

GÖDEL: There is something almost unanswerable in that, and I want to mark it rather than dodge it, because the chair asked us to mark convergence and we have just found some. Professor Hilbert and I agree, fully, that his wrong conjectures were more fertile than most men's correct ones, and that the precision of a demand matters more than its truth. Where we part is the lesson for the present. He says: therefore press on, demand the impossible, the demanding builds the future. I say: therefore press on, but know which walls are theorems. His optimism, pursued without my map, spends a generation's effort battering a provable wall, and effort spent against a provable wall is not fertile. It is wasted. The agenda must include the impossibility results, or it becomes a machine for manufacturing heartbreak at scale.

EDO SEGAL: Let me press on the benchmark point once more, because I live inside it and it costs me. When my engineers cheer that a model "saturated" a benchmark — got every problem right — I have learned to feel a chill instead of a thrill, and I want to know which of you the chill belongs to. Because a saturated benchmark can mean two opposite things: the machine got better, or the test got into the training data. Professor Hilbert, you of all people understood that the value of a problem is destroyed the moment its answer is published. What happens to a Hilbert list when the people being tested have memorized the list?

HILBERT: Then it is no longer a Hilbert list. It is a catechism. The entire force of my Paris problems was that they were open — that no one knew the answers, that the answers had to be found by genuine work against genuine ignorance. The moment a problem's solution is known and circulated, solving it ceases to measure anything except recall. So your saturated benchmark measures, at best, that the machine has read the answer key — which is no more a sign of understanding than a parrot reciting a proof is a sign of geometry. I will give you the rigorous form of your chill, Mr. Segal: a test is only a test against the unknown. Against the known, it is a mirror, and a mirror that flatters. If you want to know whether the machine can think, you must do what I did in 1900 — pose it a problem no one has solved, including the people who built it, and watch. The benchmark culture has forgotten that the only honest examination is one the examiner cannot pass either.

GÖDEL: And I will sharpen it to the point that touches my theorem, because there is one. Even a genuinely open benchmark, one made of problems no one has solved, measures only performance within a fixed frame. It cannot measure the act I keep insisting matters: the stepping outside the frame to a new problem, a new axiom, the recognition that the frame itself is incomplete. No benchmark can score that, because to score it you would have to already possess the outside-the-frame answer, and if you possessed it the problem would no longer be outside the frame. So here is the deepest thing the agenda-setting power conceals: the most important intellectual act (the one Professor Hilbert performed in choosing his problems, the one I performed in seeing past Principia) is structurally unmeasurable by any test. You cannot benchmark the founding of a new frame. Which means a civilization that decides what counts as progress entirely by leaderboard will, slowly and invisibly, stop valuing the one act that no leaderboard can see, and that act is the whole of what I mean by mind.

EDO SEGAL: Mark that — first convergence of the night, and notice it took the subject of being wrong to produce it. They agree that precise error is the engine of mathematics. They disagree about whether the map of the impossible is a brake or a steering wheel. Number it; we'll need it later. Now we go to the wound itself. The most important item on Hilbert's list — the demand that mathematics be proven complete and consistent — met a twenty-four-year-old in Königsberg who did not dissolve the problem but reshaped the limits of what any system, human or machine, could ever be. We've circled it long enough. The theorem. After this.
