CONCEPT

Kolmogorov Complexity

The amount of information in an object is the length of the shortest program that produces it—Kolmogorov’s 1965 definition that made information intrinsic to things rather than relational to ensembles, and that formally identifies compression with comprehension: to understand is to compress, and the depth of understanding is measured in bits saved.

Before Kolmogorov, information was a statistical concept: it described the average surprise in a stream of messages drawn from a known distribution. Claude Shannon’s magnificent apparatus could tell you the entropy of a source but could not say anything about a single object considered alone, because a lone message has no distribution. Andrey Kolmogorov’s 1965 paper made the measure intrinsic. The complexity of a string, on his definition, is the length in bits of the shortest program that outputs it on a universal computer. A string of a million zeros has low complexity—a short program produces it. A genuinely random string of a million bits has complexity approaching a million—no program shorter than itself generates it. This relocates information from channels to things, from distributions to objects, from the statistical ensemble to the individual instance. Every learning machine is, in the most literal sense available, an approximation of this measure: it searches for the short program—the compact representation—that explains the training data, and its success at generalisation is exactly its success at compression. The concept is beautiful, exact, and permanently uncomputable: the ideal that defines the goal of learning and cannot itself be computed. No algorithm can always find the shortest program; the problem is as undecidable as the halting problem. The incomputability is not a limitation to be engineered away. It is the permanent mark of the gap between the mathematical ideal and any real learning machine that runs on finite resources.

In the [YOU] on AI Field Guide

The cycle that began with [YOU] on AI asks what it means to see the machine clearly. Kolmogorov complexity supplies one of the sharpest lenses available: a learning machine is, formally, a compression machine, and its understanding of a domain is exactly its ability to represent that domain compactly. This makes the success of large language models legible in a way that the vocabulary of “emergent capabilities” does not: the models are very good compressors of the statistical regularities of human language, and their generalisation follows from that compression. It also makes the specific failure modes legible: the models are only compressors, and the things that cannot be compressed—the singular, the irreproducible, the genuinely novel—are precisely what the compressor misses.

The concept also anchors the cycle’s account of why there are hard limits to AI capability that scale cannot cross. Algorithmic randomness—the objects whose shortest description is themselves, objects that are maximally incompressible—is the formal boundary of what any learner can extract, regardless of scale or architecture. Knowing this limit exists, and knowing that its location cannot be algorithmically determined, is the kind of clear-eyed understanding the cycle asks its readers to develop: not fear of the machines, not dismissal of them, but accurate measurement of what they can and cannot do.

Origin

Kolmogorov introduced the concept in 1965 in “Three approaches to the quantitative definition of information”; Ray Solomonoff had arrived at closely related ideas slightly earlier from the direction of universal prediction, and Gregory Chaitin developed the same framework independently and simultaneously from the direction of randomness. The measure is sometimes called Kolmogorov-Chaitin complexity or algorithmic information theory to honour all three contributors. The essential mathematical object—the length of the shortest program producing a string on a universal Turing machine—is Kolmogorov’s formulation.

The invariance theorem, the result that gives the measure its claim to objectivity, states that the complexity of a string measured on any two universal computers differs by at most a constant—the translation cost of simulating one machine on the other. For long strings, as the strings grow, this constant becomes negligible and the measure converges to an absolute value independent of the choice of computer. For finite objects, the constant remains, and the measure retains a residual machine-dependence that resurfaces whenever the concept is applied to actual learning systems.

Key Ideas

Compression is comprehension. A scientific law compresses: Newton’s three lines replace an unbounded catalogue of planetary positions. A concept compresses: the word “tree” stands in for an enormous structured class of organisms. A model that has genuinely learned a dataset has found a description shorter than the data itself—and the short description is exactly what generalises to unseen data, because the structure captured is the same structure that generates new examples. Memorisation is storage; learning is compression; and the line between them is the line between failing and succeeding at finding the short program.

The incompressible particular. The concept draws a boundary with a precise and uncomfortable implication: some objects resist compression not because they are empty but because they are full. The maximally random string and the irreplaceable individual both register as incompressible to any compressor. The machine cannot distinguish between noise and the singular. Whatever the deepest sources of human meaning are—the particular face, the unrepeatable event, the singular utterance—they may be, in Kolmogorov’s terms, beyond reach of any compression-based form of understanding. The compressor excels at the typical and misses the particular that matters.

The uncomputability. No algorithm can compute Kolmogorov complexity in general. For any string, you can find shorter and shorter programs and confirm that each is a valid producer—but you can never verify that you have found the shortest, because the space of programs is unbounded and undecidable. The measure is the best-defined ideal in the theory of learning and the one most completely out of reach. Every real compressor is an approximation to it: a bounded, computable, architecture-specific attempt to find descriptions that are short without knowing how short “shortest” would be. The incomputability is a structural feature of the theory, not an engineering limitation.

Debates & Critiques

The deepest debate that Kolmogorov complexity generates for AI is whether the identification of learning with compression holds at the frontier. Proponents argue that the empirical evidence is striking: the best language models are also the best general compressors, capable of compressing not only text but images and audio better than domain-specific algorithms, despite never being trained for the task. This cross-domain compression suggests that the short descriptions found during training genuinely capture deep structure. Sceptics argue that the compression metaphor flatters the machines by associating them with an ideal—Kolmogorov’s ideal—that they cannot reach and cannot even approximate in a principled sense, because they are not universal computers. They are specific, bounded architectures whose inductive biases determine what counts as “short,” and those biases are chosen by their designers, not derived from any theory. A third line of debate concerns the incompressible particular: whether the concepts and singular things that resist compression are precisely the locus of human meaning, and whether a civilisation that increasingly delegates understanding to compression machines is thereby delegating away the capacity to attend to what cannot be compressed.

In the [YOU] on AI Field Guide

Origin

Key Ideas

Debates & Critiques

Related Entries

Further Reading