CONCEPT

Form and Meaning

Emily M. Bender’s foundational distinction between form, the observable structure of language (marks, sounds, sequences of characters), and meaning, the relation between that structure and something outside it: the world, the intentions of a speaker, the situation being described.

Form is the observable substance of language: the marks on a page, the sounds in the air, the sequences of characters in a dataset. Meaning is what those forms are connected to: the world they describe, the intentions they carry, the understanding they create between people. Emily M. Bender’s foundational claim, argued most carefully in her 2020 paper with Alexander Koller, is that these two things are genuinely different and that the difference does not dissolve no matter how much form you accumulate.

A system trained on text and only text has been trained on form and only form. The staggering quantities of human expression that entered its training were stripped of the situations in which that language was originally used. The intentions, the knowledge, the listeners in mind: none of that accompanied the words into the dataset. What entered the machine was the residue, the patterns, the form. And from form the machine learns to produce more form.

Meaning, in Bender’s precise phrase, is a relation to something outside the form. To understand the word water is not merely to know which other words it tends to appear near, but to connect the word to the substance, to thirst, to rivers, to the experience of drinking. A system that has only ever seen the word in textual contexts has the first kind of knowledge and not the second, and no quantity of additional textual company adds up to aboutness, because aboutness was never in the text to begin with. This is the analytical core that makes Bender’s critique more than a mood: it is an argument with a definite shape. Scaling improves large language models along the dimension of form, and that is the wrong dimension for the destination people imagine.

In the [YOU] on AI Field Guide

The cycle that began with [YOU] on AI insists that meaning is not a property the machine delivers but something the human brings. The form-and-meaning distinction is the linguistic-scientific ground for that insistence. It explains, from first principles, why fluency decorrelates from authority: a system producing impeccable form can produce nothing about the world because it has no contact with the world, only with how the world tends to be described. The reader, encountering the output, supplies the connection automatically—because that is what competent language users do—and then credits the output with the understanding the reader has contributed.

The distinction also clarifies the symbol grounding problem from the user’s side. The grounding problem asks how symbols acquire meaning. Bender’s answer is that they acquire it through a relation to the world that text alone cannot carry. When we read fluent output as wise or authoritative, we are doing the grounding ourselves—connecting the symbols to our prior experience of a world the system has never touched. Knowing this is the first step toward not being deceived by one’s own gift for making meaning.

Origin

Bender and Alexander Koller introduced the form-meaning distinction as a precise philosophical claim in “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data” (ACL 2020). Their vehicle was the octopus parable: a hyper-intelligent octopus tapping an underwater cable between two islanders, learning the statistical patterns of their messages so well that it can produce plausible replies. The trap springs when one islander faces a genuine novelty: she is attacked by a bear and asks how to defend herself with the sticks she has at hand. The octopus, having only ever seen the forms of words and how they tend to follow one another, has no way to answer. It has never encountered a bear or a stick as anything other than a string of characters appearing in certain contexts. It cannot reason about the physical world those words point to, because it never had access to that world. It had access only to the patterns of the messages, not to the things the messages were about.

The point of the parable was precise: Bender and Koller were not claiming the octopus is stupid or that pattern learning is worthless. They were drawing a line between two things that are easy to conflate. Form is the observable structure of how words combine. Meaning is the relationship between that form and something outside of language. A system exposed only to form has, in their phrase, “a priori no way to learn meaning,” because there is simply nothing in the statistics of word co-occurrence that bridges the gap to what the words are about. Cleverness at form is not partial credit toward meaning. It is a different kind of thing.
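The claim about word co-occurrence statistics can be made concrete with a deliberately tiny sketch. It assumes a hypothetical four-message corpus and a bigram model, which is far simpler than the octopus or any large language model; the names MESSAGES, bigram_counts, and continue_text are invented for illustration and come from neither the paper nor any library. What the sketch shows is only what a form-trained learner holds after training: strings and counts of which string followed which.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the messages passing along the cable.
MESSAGES = [
    "the water is cold today",
    "the water is calm today",
    "the river is cold today",
    "i drank the water",
]

def bigram_counts(messages):
    """Count how often each word is immediately followed by each other word."""
    counts = defaultdict(Counter)
    for message in messages:
        words = message.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def continue_text(bigrams, start, max_words=5):
    """Greedily extend `start` with the most frequent follower of the last word."""
    words = [start]
    for _ in range(max_words):
        followers = bigrams.get(words[-1])
        if not followers:
            break  # no statistics for this word, so nothing to say
        # Highest count wins; ties are broken alphabetically for determinism.
        words.append(min(followers, key=lambda w: (-followers[w], w)))
    return " ".join(words)

bigrams = bigram_counts(MESSAGES)
print(continue_text(bigrams, "the"))   # a fluent continuation built from counts
print(continue_text(bigrams, "bear"))  # a word with no recorded followers
```

Everything the model "knows" about water is the table entry recording which strings followed it. Starting from "the", it reproduces a plausible sentence; starting from "bear", a word absent from the toy corpus, it has nothing to draw on. A real language model generalizes vastly better over form than this, so the sketch illustrates the kind of signal involved, not the limits of such systems.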

Key Ideas

Meaning requires grounding. To understand a word is to connect it to the world it names (to have encountered water, felt thirst, watched rivers) and to attach the words we read to that prior web of grounded experience. Humans use language to extend grounded understanding. A machine trained only on text has no grounded understanding to extend. Having the words, and nothing the words refer to, is the permanent condition of a system that has touched only form. This is not a temporary engineering gap but a consequence of what the training signal contains.

Fluency triggers involuntary meaning-making. Human beings are, by design, makers of meaning. When presented with fluent language, we cannot help but interpret it: we reconstruct an intention behind it, a state of the world it describes, a mind that produced it. The trouble begins when this machinery meets text produced by a system that has no meaning to convey. The reader’s meaning-making apparatus runs as it always does, producing the experience of understanding something. But the understanding has been manufactured on the reading side. The text was an arrangement of forms; the meaning was supplied by the reader.

Accumulating form does not add up to meaning. The objection naturally arises that humans too learn from language, and that much of what anyone knows came through words rather than direct experience. Bender acknowledges this but identifies the difference: humans bring grounding that the machine lacks. We are embodied creatures with prior world-contact, and we attach the words we read to that grounded experience and to a rich model of the people communicating with us. The machine has no such anchor. No quantity of additional words supplies the missing reference, because the reference was never in the text to begin with. Scale therefore improves the system along the wrong dimension for the destination people imagine.

The practical force of the distinction. When a stochastic parrot produces a confident medical explanation, the question is not whether the words pattern correctly but whether they connect to medical reality. The system has no access to that reality, only to how such explanations are usually phrased. The form can be impeccable while the meaning is absent or wrong—and because we are built to read meaning into fluent form, we are poorly equipped to notice the absence. Bender’s distinction is a tool for noticing: it teaches us to ask, of any fluent output, not how good it sounds but what, if anything, it is actually connected to.

Debates & Critiques

The principal debate about the form-meaning distinction is whether it is too clean. Some critics argue that human language acquisition itself is substantially statistical: that meaning emerges from distributional patterns, that word meanings are constituted by their relations to other words rather than by direct world-contact, and that this collapses the distinction between what the machine does and what the child does. Bender’s response, developed across multiple papers, is that children bring embodied world-contact to language, that their distributional learning is anchored in a prior sensorimotor experience the machine entirely lacks, and that this prior grounding is exactly what distinguishes extending semantic understanding from learning a statistical model of form.

A second debate concerns whether very large models trained on multimodal data (images, audio, video alongside text) close the gap by providing a richer form of world-contact. Bender does not deny that grounding through perception could in principle supply what text-only training lacks; her argument is about text-only training specifically, and she treats the multimodal case as a genuinely different question.

The deepest disagreement concerns what “grounding” requires: whether any computational relation to the world counts, or whether something about the lived, embodied, intentional character of human world-contact is irreducible. That question reaches into philosophy of mind and remains genuinely open.
