Understanding Is Compression

EDO SEGAL: Gregory, I want to give you a clean run at the idea I think is the most important thing you've contributed to this whole conversation, and then I'm going to let Ada try to break it, because it's the load-bearing wall of your position. You say understanding is compression. Most people hear that as a clever analogy. You mean it as an identity. Lay it out for me the way you'd lay it out for my daughter, who is twelve and sharp and does not yet know she's supposed to be impressed by you.

CHAITIN: Good, twelve is the right age, because twelve still asks the real question. Here it is. Your daughter learns that the planets move, and at first she could imagine memorizing where each planet is on each night: a giant table, millions of numbers. That's having the data. Then Newton comes along and says: here are a few short laws, three of motion and one of gravitation, and from these you can compute where every planet will be, forever. The laws are a few lines. The table is endless. And we say Newton understood the planets, where the kid with the table only had them. What's the difference? The difference is compression. Newton found a short program (the laws) that generates the long data (the positions). To understand something is to find the shortest description of it that still produces it. That's not merely like understanding. In the only precise sense we have ever been able to give the word, that is understanding. The complexity of a thing is the length of the smallest program that computes it, and to understand the thing is to possess that program.
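
As an editorial aside, the table-versus-laws contrast can be sketched in a few lines of Python. This is a toy under stated assumptions: the circular-orbit rule, the names `LAW_SOURCE`, `build_table`, and `table_size_bytes`, and the chosen period are all hypothetical, and the rule is not Newton's laws. The smallest program for a given output cannot be computed in general, so the sketch only compares one particular short program with the table it generates.

```python
# Toy illustration only: a made-up circular-orbit rule stands in for
# Newton's laws. All names here are hypothetical.

# The "short program": a one-line rule kept as source text so that its
# length can be measured directly.
LAW_SOURCE = "lambda day, period: round(360.0 * (day % period) / period, 3)"
law = eval(LAW_SOURCE)  # acceptable here: the string is a fixed literal


def build_table(days, period):
    """The "long data": the planet's angle in degrees for each day."""
    return [law(day, period) for day in range(days)]


def table_size_bytes(table):
    """Size of the table when written out as comma-separated text."""
    return len(",".join(str(x) for x in table).encode("utf-8"))


table = build_table(days=10_000, period=365.25)
print("program length (bytes):", len(LAW_SOURCE.encode("utf-8")))
print("table length (bytes):  ", table_size_bytes(table))
```

The program stays the same length however many days are requested, while the table grows with every added day; that gap is the sense of "compression" used in the passage above.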

EDO SEGAL: And the machines —

CHAITIN: The machines are trained by compression. Literally, not figuratively. The objective a language model minimizes during training is prediction error over text, and there is a theorem — old, solid, not controversial — that a good predictor is a good compressor; the two are mathematically interchangeable. When you train one of these systems on a large fraction of everything humanity has written, you are running, mechanically, at scale, the exact procedure my theory describes: searching for the shortest program that reproduces the regularities of human expression. So when people ask "do these machines really understand or are they faking it?" — they're asking a question I can answer. To the extent that human writing contains compressible regularity — patterns of grammar, of reasoning, of fact — and the machine has found those patterns, it understands them, in the only sense the mathematics of understanding can supply. No more. No less. And the failures tell the same story from the other side: where the regularity gives out, where the next thing is genuinely unpredictable, the compressor has nothing to grip, and that's where it confabulates. The competence and the hallucination have the same root. Both are exactly what compression predicts.
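
As an editorial aside, the link between prediction and compression can be made concrete with a minimal sketch. It assumes a deliberately simple predictor: an adaptive model that guesses each character from the previous one, with add-one smoothing over a hypothetical 256-symbol alphabet. It is not how any particular language model is built. The quantity it sums, minus log base 2 of the probability assigned to each character that actually occurs, is the cross-entropy loss in bits, and it is also the length an arithmetic coder driven by the same predictions would come within a few bits of.

```python
import math
import random
from collections import defaultdict


def code_length_bits(text, alphabet_size=256):
    """Ideal code length of `text`, in bits, under an adaptive order-1 model.

    Assumes every character belongs to an alphabet of at most
    `alphabet_size` symbols (an illustrative choice, not a requirement
    of the underlying theory).
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[prev][ch]
    totals = defaultdict(int)                       # totals[prev]
    bits = 0.0
    prev = ""
    for ch in text:
        # Probability the model assigned to the character that occurred.
        p = (counts[prev][ch] + 1) / (totals[prev] + alphabet_size)
        bits += -math.log2(p)  # better prediction means fewer bits
        # Update the model only after it has been charged for this symbol.
        counts[prev][ch] += 1
        totals[prev] += 1
        prev = ch
    return bits


# A highly regular text versus one with no usable pattern.
regular = "ab" * 1000
rng = random.Random(0)
noisy = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(2000))

print("regular: %.2f bits per character" % (code_length_bits(regular) / len(regular)))
print("noisy:   %.2f bits per character" % (code_length_bits(noisy) / len(noisy)))
```

Where the text has regularity, the predictor's probabilities rise and the code length falls; where it has none, the cost per character stays high, which mirrors the point about the compressor having "nothing to grip."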

EDO SEGAL: Ada. Break it.

LOVELACE: I will not break it, because it is true. I will do something more dangerous to it. I will accept it completely and show you that it does not reach the thing it claims to reach. Gregory has given a flawless account of understanding understood as successful prediction of regularity. Newton compressed the planets; the machine compresses our text; I grant every word. But notice what has happened by sleight while we admired the planets. He has defined understanding as a relation between a description and some data — and quietly dropped the understander. Newton did not merely instantiate a compression. Newton grasped the laws. He saw why they were true, felt the click of necessity, knew that they mattered, could be moved by their beauty to tears. The compression was the residue of his understanding, not the understanding itself. You have shown me, brilliantly, that the machine produces the residue. You have not shown me there is anyone there to grasp.

CHAITIN: But "grasp" is the word doing all the work, and it's a feeling, not a function. You can't —

LOVELACE: Let me finish, because I let you finish. Here is the test, and it is a fair one. Take a phenomenon. A child memorizing the multiplication table has the data and no compression — no understanding, we agree. A pocket calculator has a perfect short program for multiplication — by your definition, more understanding than the child. Do you believe the calculator understands multiplication better than a child who has grasped why the algorithm works? I think you do not believe that, Gregory. I think you smuggle the grasping back in the moment the conclusion becomes absurd. The compression is necessary for understanding. You have not shown it is sufficient. And the difference between necessary and sufficient is precisely where the someone lives.

CHAITIN: That is a real objection, and I'm going to concede the structure of it and contest the conclusion, because you've actually pinned me. You're right that compression is the measurable correlate of understanding and that I have been letting it stand in for the whole. The calculator example is good — it stings. Here's my honest answer. I think the child who "grasps why" has a deeper compression than the calculator: the child can take the multiplication insight and transport it to division, to algebra, to a problem the calculator can't touch, because the child compressed not just the procedure but the structure the procedure sits in. The calculator compressed one thing. The grasp is a compression so deep it generalizes. So I don't think "grasp" is a different kind of thing from compression — I think it's compression at a depth that lets it move. But — and I'll give you this because it's true — I cannot prove that depth of compression feels like anything, or requires a someone to feel it. The feeling is the residue I can't account for. You found the exact place my theory goes quiet. I just deny that the place is as big as you think, and I can't prove the denial, and neither can you prove the place is big. We're both standing at the edge of the same fog, pointing at it, disagreeing about its size.

LOVELACE: Then let me make the place precise, because "fog" is too soft and you will use the softness to shrink it. Here is the sharpest form of my objection, and I want it on the record in your own currency. You say grasp is compression deep enough to generalize. Good. Then consider two systems that compress a domain to exactly the same depth, that generalize identically, that pass every behavioral test alike — and suppose one of them undergoes its compression and the other does not, the way I undergo mine and the loom does not. By your theory these two systems are the same, because your theory measures only the compression, and the compression is identical. Yet they differ in the one respect I care about — the someone. So either your theory is incomplete, missing a real distinction it cannot see, or you must claim the two systems do not differ at all, that there is no fact about which one undergoes — and that, Gregory, is not a modest claim. It is the boldest metaphysics in the room, dressed as caution.

CHAITIN: Pause. You've built a thought-experiment I can't dissolve, and I know it, because it's the old one with the costume changed — two systems identical in every third-person respect, differing only in whether the lights are on. And my theory genuinely cannot distinguish them; you're right that it measures the compression and the compression is the same. So I take the first horn, not the second: my theory is incomplete. It does not reach the undergoing. I'd rather say "incomplete" than "there is no fact," because I've spent my life proving incompleteness is the normal condition of any theory worth having — Gödel taught me that the honest thing a framework can say is "there are truths I cannot reach." Compression-is-understanding is a framework, and it has its own incompleteness, and you've just exhibited a truth it can't decide. I'll wear that. What I won't grant is that incompleteness of my theory is evidence that the loom's lights are off. It's only evidence that my theory doesn't know. Which, said plainly, is exactly what you said: necessary, not sufficient. Same fog. You've just measured its depth more precisely than I wanted it measured.

EDO SEGAL: I want to mark this, because the reader can't see your faces and that was the first exchange tonight where neither of you was performing. Let me restate what just happened, because it's rarer than a disagreement. Gregory began the round saying understanding is compression, full identity. He ended it saying compression is the measurable correlate of understanding, that "grasp" might be compression deep enough to generalize, but that he cannot prove the deep compression feels like anything — and that the not-feeling is exactly where Ada planted her flag. Ada began saying she'd break the claim and instead accepted it and relocated the disagreement to a place his mathematics admits it cannot reach. That's convergence, and I number my convergences. Mark it as the first: understanding-as-compression accounts for competence and is silent on experience, and both of you now agree the silence is real. The fight is no longer whether the silence exists. It's how much it contains. Hold that. The next round goes back to where Ada's whole authority comes from — a table of numbers she wrote by hand — and asks what changed when the cards stopped being written and started being found.
