A Billion Swans

EDO SEGAL: Pedro, the swan is Karl's signature bird, so I want you to take it from him for a round. A model has been trained on, let's say, every image of every swan ever posted, every sentence ever written about swans, the genetics, the migration data, the Latin. Karl says no number of white swans licenses the next one — that the model has seen the wake of the world and not the world. You say what, exactly? Has a billion swans bought the machine something a thousand never could, or is Karl right that the number is irrelevant and only the structure of the inference counts?

DOMINGOS: The number matters, but not the way the inductivist thinks and not the way Karl fears. Here's the thing the swan story hides. When you train on a billion swans and a billion other things, the model isn't learning "swans are white." That's the cartoon. It's learning a compressed representation in which swan sits in a web of relations — to birds, to water, to flight, to whiteness as a contingent feature rather than a defining one, to the very category of exceptions. And the proof that it learned the structure and not just the surface is that it handles the black swan better than your story predicts. Ask a good model about a black swan and it doesn't melt down; it tells you black swans exist, in Australia, that the all-white generalization is famously false, that Taleb made a whole philosophy of it. Why? Because the pattern it found wasn't the brittle rule. Compression forces it to find the real regularities, the deep ones, because those are the ones that let you predict a billion examples with a model far smaller than the examples. Karl, compression is your severe test in disguise. A model that just memorized would be as big as the data and would die on anything new. The only way to be small and still predict is to find the structure that's actually there. Learning is lossy compression, and lossy compression of a lawful world is a theory of that world.

· · ·

POPPER: This is your strongest move and I have been waiting for it, because it is the move David Deutsch makes and it is the move I respect most and accept least. You say: to predict the wake well, with a model smaller than the wake, the machine must reconstruct the boat — the deep structure that generated the surface. Compression equals understanding. Let me grant the whole mechanism and deny the conclusion, because the gap between them is the gap I have defended my whole life.

Compression finds the regularities that were in the data. It is, by construction, a magnificent summary of the past. But a theory, in my sense, is not a summary of the past — it is a bold claim about the unobserved, including the unobservable and the not-yet-existent, that sticks its neck out beyond the data in a way the data did not force. Newton's theory was not a compression of the apple-fallings he had seen; it claimed something about the moon, which he had not dropped, and about masses that would not exist for centuries. The compression machine cannot do that, because compression is conservative by its nature — it is rewarded for fitting what is there and punished for inventing what is not. The boldest, most fertile conjectures in the history of thought were bad compressions of the existing data at the moment they were proposed. They predicted things that contradicted the record and turned out right. Your machine, optimized to compress the record, would have shot every one of them down as overfitting in reverse. So yes — it reconstructs a boat. It reconstructs the average boat in the harbor it was shown. It does not build the ship that has never sailed, and the ship that has never sailed is what I mean by knowledge.

· · ·

DOMINGOS: [ He's good. I'll give him the harbor and take the open sea. ] That's the best objection to compression-as-understanding I've heard, and here's where I think you're half right and half stuck in the year you died, Karl, no offense. Half right: a model trained only to predict the next token is conservative, and it won't spontaneously propose general relativity. Agreed. But you've frozen the machine at one tribe. The evolutionaries — genetic programming — don't compress the past; they generate variation and select, which is exactly bold conjecture plus refutation, mechanized. They invent ships that never sailed, most of which sink, and keep the few that float. And the newest systems do something you'd find harder to dismiss: they propose candidate hypotheses, run experiments against the world, and keep what survives — the automation of discovery itself. That's not summarizing the harbor. That's building a thousand ships a night and drowning the ones that leak. The boldness you say machines lack is the easy part, Karl. You just generate wildly. The hard part was always the refutation, and that's the part we're good at.

POPPER: Then build me the machine that generates a conjecture contradicting its own training distribution and stakes a decisive, risky prediction on it — and survives. Not a variation within the space of the seen. A genuine novelty that says: everything you showed me points one way, and I claim the truth lies the other way, and here is the single observation that will destroy me if I am wrong. Do that, and I will say the machine has crossed from learning into knowing, and I will say it publicly and gladly. I do not believe it is impossible. I believe it has not been done, and that the systems being sold today are built to do the precise opposite — to stay inside the harbor, because the harbor is where the predictions come out right and the customers stay happy.

· · ·

DOMINGOS: I accept that as a fair challenge and I'll tell you honestly: we're not there, and the commercial incentive runs exactly the wrong way, like you said. The money rewards the model that pleases, which is the model that stays in distribution. The bold extrapolator is the one that says alarming, unpopular, falsifiable things — and that's the one we tune out of the product, because it's also the one that says wrong and weird things on the way to the few right ones. We're optimizing for the obedient guesser and calling it safety. There's a real tragedy in that and it's closer to your fear than I'd like.

EDO SEGAL: I want to lift this onto the staircase before we move, because the reader is climbing and needs to know what this costs them. What I hear is that the machine is superb in the harbor — inside the distribution of what humans have already thought — and that the harbor is enormous, larger than any single mind, which is exactly why my Trivandrum engineers flew. But the open sea, the genuinely new, the conjecture that breaks with the record — there you both have doubts, and the market is actively steering the machine away from the one move that would resolve them. So the reader's question becomes very concrete: when you ask the machine something, are you in the harbor or the open sea? Because the answer changes everything about how hard you should check. Hold that. Next round: the man who built a good part of this machine wants to deflate the people overselling it — and Karl has spent his life on why the oversellers always win. The hype, after this.
