
The cycle that began with [YOU] on AI asks what it means to see the machine clearly. Kolmogorov complexity supplies one of the sharpest lenses available: a learning machine is, formally, a compression machine, and its understanding of a domain is exactly its ability to represent that domain compactly. This makes the success of large language models legible in a way that the vocabulary of “emergent capabilities” does not: the models are very good compressors of the statistical regularities of human language, and their generalisation follows from that compression. It also makes the specific failure modes legible: the models are only compressors, and the things that cannot be compressed—the singular, the irreproducible, the genuinely novel—are precisely what the compressor misses.
The concept also anchors the cycle’s account of why there are hard limits to AI capability that scale cannot cross. Algorithmic randomness—the objects whose shortest description is themselves, objects that are maximally incompressible—is the formal boundary of what any learner can extract, regardless of scale or architecture. Knowing this limit exists, and knowing that its location cannot be algorithmically determined, is the kind of clear-eyed understanding the cycle asks its readers to develop: not fear of the machines, not dismissal of them, but accurate measurement of what they can and cannot do.
Kolmogorov introduced the concept in 1965 in “Three approaches to the quantitative definition of information”; Ray Solomonoff had arrived at closely related ideas slightly earlier from the direction of universal prediction, and Gregory Chaitin developed the same framework independently and simultaneously from the direction of randomness. The measure is sometimes called Kolmogorov-Chaitin complexity or algorithmic information theory to honour all three contributors. The essential mathematical object—the length of the shortest program producing a string on a universal Turing machine—is Kolmogorov’s formulation.
The invariance theorem, the result that gives the measure its claim to objectivity, states that the complexity of a string measured on any two universal computers differs by at most a constant—the translation cost of simulating one machine on the other. For long strings, as the strings grow, this constant becomes negligible and the measure converges to an absolute value independent of the choice of computer. For finite objects, the constant remains, and the measure retains a residual machine-dependence that resurfaces whenever the concept is applied to actual learning systems.
Compression is comprehension. A scientific law compresses: Newton’s three lines replace an unbounded catalogue of planetary positions. A concept compresses: the word “tree” stands in for an enormous structured class of organisms. A model that has genuinely learned a dataset has found a description shorter than the data itself—and the short description is exactly what generalises to unseen data, because the structure captured is the same structure that generates new examples. Memorisation is storage; learning is compression; and the line between them is the line between failing and succeeding at finding the short program.
The incompressible particular. The concept draws a boundary with a precise and uncomfortable implication: some objects resist compression not because they are empty but because they are full. The maximally random string and the irreplaceable individual both register as incompressible to any compressor. The machine cannot distinguish between noise and the singular. Whatever the deepest sources of human meaning are—the particular face, the unrepeatable event, the singular utterance—they may be, in Kolmogorov’s terms, beyond reach of any compression-based form of understanding. The compressor excels at the typical and misses the particular that matters.
The uncomputability. No algorithm can compute Kolmogorov complexity in general. For any string, you can find shorter and shorter programs and confirm that each is a valid producer—but you can never verify that you have found the shortest, because the space of programs is unbounded and undecidable. The measure is the best-defined ideal in the theory of learning and the one most completely out of reach. Every real compressor is an approximation to it: a bounded, computable, architecture-specific attempt to find descriptions that are short without knowing how short “shortest” would be. The incomputability is a structural feature of the theory, not an engineering limitation.
The deepest debate that Kolmogorov complexity generates for AI is whether the identification of learning with compression holds at the frontier. Proponents argue that the empirical evidence is striking: the best language models are also the best general compressors, capable of compressing not only text but images and audio better than domain-specific algorithms, despite never being trained for the task. This cross-domain compression suggests that the short descriptions found during training genuinely capture deep structure. Sceptics argue that the compression metaphor flatters the machines by associating them with an ideal—Kolmogorov’s ideal—that they cannot reach and cannot even approximate in a principled sense, because they are not universal computers. They are specific, bounded architectures whose inductive biases determine what counts as “short,” and those biases are chosen by their designers, not derived from any theory. A third line of debate concerns the incompressible particular: whether the concepts and singular things that resist compression are precisely the locus of human meaning, and whether a civilisation that increasingly delegates understanding to compression machines is thereby delegating away the capacity to attend to what cannot be compressed.