CONCEPT

Compositionality

The principle, foundational in Frege's logic, that the meaning of a complex expression is fully determined by the meanings of its parts and the rules of their combination. It is also among the sharpest available tests for whether a system has genuinely learned structure or only approximated it.

You can understand a sentence you have never heard before. That capacity, to grasp an unbounded number of novel expressions from a finite stock of words and rules, is a defining feature of human language, and the principle behind it is Frege's: the meaning of a complex expression is determined by the meanings of its parts and the rules by which they are combined. Compositionality is not merely a fact about language; it is the structural guarantee behind two properties any account of mind must explain. The first is productivity: finitely many elements yield infinitely many meanings, so that language is not a memorized list but a generative engine. The second is systematicity: the abilities come bundled. Anyone who understands “the lawyer admired the doctor” can thereby understand “the doctor admired the lawyer,” because both draw on the same compositional grasp.

Systematicity is the fingerprint of genuine composition, and it is precisely what neural networks struggle to exhibit past the edges of their training distribution. When researchers probe large models with sufficiently novel combinations (deeply nested clauses, unusual role bindings, arrangements far from what training rehearsed), performance degrades in ways a truly compositional system should not permit. The systems appear to have learned something powerfully like composition without having learned composition itself, in the clean, rule-governed, fully systematic sense Frege's logic embodies.

In the [YOU] on AI Field Guide

The [YOU] on AI cycle asks what these systems can be trusted to do. Compositionality answers by predicting precisely where trust is warranted and where it is not. On well-trodden combinations, close to the training distribution, a system that has learned approximate compositionality will generalize well. On genuinely novel combinations—the structural rearrangements that a fully compositional system would handle with the same reliability as familiar cases—the approximation frays. This is not an incidental failure mode; it follows from the nature of statistical learning versus rule-governed composition. Frege's principle is, in effect, a theory of where these systems will break.

The stakes are practical. A system that only approximates composition will handle a thousand routine variations of a task and then break, sometimes silently, on the thousand-and-first, which differs only in a structural arrangement its training distribution did not rehearse. The model that drafts a competent legal brief will sometimes confabulate a case citation, because the sentence pattern that typically accompanies a citation is better represented in the training data than the habit of checking whether the cited case exists. Genuinely compositional reference-checking would generalize from examples to rule; approximate composition interpolates and fails at the novel edge.

Frege's own work also supplies a defense of the statistical approach, though he could not have intended one. His context principle, the dictum that a word has meaning only in the context of a sentence, can be read as a forerunner of the distributional hypothesis that underlies modern large language models. To learn meaning from the company words keep across vast corpora is, arguably, what the context principle recommends. So Frege's legacy is genuinely double-edged: his compositionality is the probe that finds where models break, and his contextualism is part of the rationale for building them.

Origin

The principle emerges throughout Frege's work from his analysis of statements into function and argument. If the predicate is a function and its terms are arguments, then the meaning of the whole is the value obtained by applying the meaning of the predicate to the meanings of the terms: a structure, not a list, from which novel combinations follow by rule. Frege did not coin the modern slogan (“the meaning of a whole is built from the meanings of its parts”), but the idea is woven through the Begriffsschrift of 1879, the Grundgesetze, and his later writings. The philosopher Jerry Fodor and the cognitive scientist Zenon Pylyshyn crystallized its role in the AI debate in their 1988 paper arguing that systematicity is the litmus test for genuine composition. Written long before large language models existed, the paper has become more rather than less relevant since their arrival.
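The function-and-argument analysis can be made concrete with a small sketch. The Python below is an illustration only, not Frege's notation: the toy world `ADMIRES`, the lookup tables `TERMS` and `PREDICATES`, and the `meaning` function are all hypothetical names introduced here. It treats the predicate as a function and the noun phrases as its arguments, so the value of the whole is computed by application.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical toy world: the set of (admirer, admired) pairs that hold.
ADMIRES: Set[Tuple[str, str]] = {("lawyer", "doctor")}

# Meanings of the terms: each noun phrase denotes an individual.
TERMS: Dict[str, str] = {"the lawyer": "lawyer", "the doctor": "doctor"}

# Meaning of the predicate: a function from two individuals to a truth value.
PREDICATES: Dict[str, Callable[[str, str], bool]] = {
    "admired": lambda agent, patient: (agent, patient) in ADMIRES,
}


def meaning(subject: str, verb: str, obj: str) -> bool:
    """Apply the predicate's meaning to the meanings of its terms."""
    return PREDICATES[verb](TERMS[subject], TERMS[obj])


print(meaning("the lawyer", "admired", "the doctor"))  # True
print(meaning("the doctor", "admired", "the lawyer"))  # False
```

Nothing in `meaning` mentions either sentence specifically. The role-swapped sentence is handled by the same rule, which is the sense in which novel combinations follow from structure rather than from a stored list.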

The debate the 1988 paper opened—whether a network of statistical associations can be systematically compositional—has since moved from philosophy into the laboratory, as benchmark designers construct tests specifically probing systematic generalization: teach the model the meaning of primitives and then test combination in unseen arrangements. The results remain mixed: better than the most pessimistic predictions, worse than the standard pure compositionality requires.
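The shape of such a test can be sketched in a few lines. What follows is a toy construction under stated assumptions, not any published benchmark: the command vocabulary, the `interpret` reference function, and the `memorizer` baseline are hypothetical. Every primitive appears in training, one combination is held out, and a pure lookup table is compared against the compositional rule.

```python
from itertools import product

# Hypothetical primitives and modifiers with their meanings.
PRIMITIVES = {"jump": "JUMP", "walk": "WALK", "run": "RUN"}
MODIFIERS = {"twice": 2, "thrice": 3}


def interpret(command: str) -> str:
    """Reference semantics: repeat the action as the modifier dictates."""
    verb, modifier = command.split()
    return " ".join([PRIMITIVES[verb]] * MODIFIERS[modifier])


all_commands = [f"{v} {m}" for v, m in product(PRIMITIVES, MODIFIERS)]

# Hold out one arrangement; "jump" and "thrice" each still occur in training.
held_out = {"jump thrice"}
train = [c for c in all_commands if c not in held_out]
test = [c for c in all_commands if c in held_out]

# A memorizing baseline: stores training pairs, returns None for anything unseen.
memorizer = {c: interpret(c) for c in train}.get


def accuracy(model, commands) -> float:
    """Fraction of commands on which the model matches the reference semantics."""
    return sum(model(c) == interpret(c) for c in commands) / len(commands)


print(accuracy(memorizer, train))  # 1.0
print(accuracy(memorizer, test))   # 0.0
print(accuracy(interpret, test))   # 1.0
```

The lookup table and the rule are indistinguishable on the training set and come apart only on the held-out arrangement. Real models fall somewhere between these two extremes, which is why the results are mixed.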

Key Ideas

Productivity. A compositional system can understand and produce an unbounded number of novel sentences from a finite vocabulary and a finite set of rules. This is the property that makes language a generative engine rather than a memorized list, and that predicts a statistical system trained on a finite corpus will face a horizon where it must compose rather than recall.

Systematicity as the fingerprint. If you have genuinely grasped the compositional rule, you handle every instance it covers, not merely the instances you have seen. Systematicity, the bundling of related abilities, is the diagnostic. A system that handles “the lawyer admired the doctor” but fails on “the doctor admired the lawyer” has not grasped the rule, only the pattern. Researchers use this criterion to probe whether neural networks truly compose or only approximate composition.

Partial compositionality. The honest verdict from current evidence is that large language models have achieved a partial compositionality—real enough to be useful, broad enough to cover most of what we ask, and yet measurably short of the exact, systematic, rule-bound composition that Frege's logic embodies. They compose the way an immensely well-read non-native speaker composes: fluently, idiomatically, almost always right, with hidden discontinuities exactly where the absorbed patterns thin out and pure structure would have to carry the weight.

The context principle as defense. Frege's own context principle, that meaning is always meaning in a sentence, lends support to the distributional hypothesis that words acquire meaning from context. If meaning is fundamentally a matter of how expressions function in larger wholes, learning from the distribution of words across large corpora is not a category error. Frege arms both sides: compositionality shows where statistical models break, and contextualism explains why they work as well as they do.
