PERSON

Noam Chomsky

The linguist whose critique helped end behaviorism’s hold on the study of language by arguing that language cannot be learned from data alone, and whose seventy-year insistence on the distinction between explaining a phenomenon and merely reproducing it is the most rigorous standing challenge to the theoretical foundations of modern AI.

Noam Chomsky is the inconvenient witness at every celebration of artificial intelligence’s achievements. This is not because he denies the achievements, whose engineering value is not in dispute, but because he insists on the distinction the field most wants to blur: between a system that predicts the surface of a phenomenon and a theory that explains why the phenomenon is as it is and not otherwise. In 1959 he reviewed B. F. Skinner’s Verbal Behavior and, in a single long paper, dismantled the program of explaining language as conditioned habit and accumulated regularity. Large language models, trained on text alone by statistical adjustment across vast corpora, are in their deepest commitments heirs of the position he refuted. That they work as engineering does not, in his view, make them right as science. The whole point of a career spanning seven decades and two quite different intellectual projects, the science of language and the anatomy of political persuasion, has been to insist on exactly this distinction between what a system does and what it tells us about the phenomenon it imitates. In an age when fluent output is mistaken for understanding, Chomsky is the clearest voice asking what understanding would actually require.

In the [YOU] on AI Field Guide

The cycle that began with [YOU] on AI asks what it means to see the machine clearly, without the narcotic of hype or the paralysis of fear. Chomsky illuminates the machine from the angle of the prosecution. Long counted among the most cited living scholars, he is a founder of the cognitive science that modern AI claims as intellectual kin, and he has looked at that kin and pronounced it a way of not learning anything about language. To read him against the machines is to be forced to ask the hardest question about them: not whether they are useful, which is obvious, but whether they are explanatory, that is, whether they tell us anything about the thing they imitate. That question does not go away because the demos are impressive.

His lens reframes what the cycle calls the fluency problem. The same system that drafts a competent legal brief will invent a case that does not exist and cite it with perfect confidence. Read through Chomsky’s framework, this is not a bug but a structural consequence. A system built on statistical association, behaviorism made real at scale, produces outputs that fit the patterns of its training data. Where those data cover the question, the patterns mostly track real cases, because real cases vastly outnumber invented ones in legal text. Asked about a case the data do not cover, the model generates one that is statistically plausible but factually nonexistent, with the same fluency it brings to everything else, because nothing in the system distinguishes a plausible citation from a true one. There is no competence behind the performance. There is no knowledge being expressed, so there is nothing to keep the expression honest.

Where Judea Pearl places the machines on his three-rung Ladder of Causation and argues they occupy only the first, Chomsky locates the diagnostic elsewhere: he argues that the machines can learn impossible human languages, systems no child could acquire, as readily as possible ones. A device that learns the impossible as readily as the possible has not discovered what makes language humanly possible; it has erased the very distinction a theory of language must draw. Pearl’s instrument and Chomsky’s measure reach the same conclusion from different foundations: what the machines lack is not more data or more parameters but a different kind of knowledge altogether.

His political work enters the cycle’s concerns as well. The co-author of Manufacturing Consent has lived long enough to ask what a persuasion system with no human writers might do. Language models generate text at a scale no human newsroom could match, with no author who could be held responsible and no conscience that could object. Chomsky’s propaganda model described human beings working within structural constraints to produce a narrowed discourse; the machines open the possibility of removing the human writers while keeping the constraint structure, and with it removing the last friction of human scruple.

Origin

Born in Philadelphia in 1928, Chomsky received his doctorate from the University of Pennsylvania, where he studied under Zellig Harris, and joined the MIT faculty in 1955. He taught there for more than six decades before moving to the University of Arizona in 2017. His early work in transformational grammar, which proposed that the sentences speakers understand and produce are generated by abstract rules operating on hierarchical structures rather than by association and habit, transformed linguistics from a taxonomic discipline into a cognitive science. The 1957 book Syntactic Structures and the 1965 Aspects of the Theory of Syntax established the theoretical foundations. But the decisive intellectual event of his career was the 1959 review of Skinner’s Verbal Behavior, which demonstrated, through close analysis of the behaviorist’s own categories, that those categories could explain verbal behavior only by becoming vacuous: applicable to any outcome and therefore explanatory of none.

The demonstration cleared the space for the nativist alternative: the proposal that children acquire language not by conditioning but by growing a faculty, much as they grow organs, with a specific internal structure that experience triggers rather than creates. The poverty of the stimulus (the gap between the fragmentary, error-ridden input available to a child and the rich, rule-governed competence the child nonetheless achieves) is the empirical anchor. The child hears only positive evidence, never systematic correction, and yet converges on essentially the same grammar as every other child in its speech community, including rules that the evidence available to any individual child could never single out. The grammar, Chomsky concludes, must be partly given in advance by a dedicated faculty that is part of the human biological endowment and common to the species.

Alongside the linguistic project, and always in tension with it in the culture’s reception, Chomsky has pursued the anatomy of political power. American Power and the New Mandarins (1969), Manufacturing Consent (with Edward Herman, 1988), and dozens of subsequent works apply a structural analysis to the production of public opinion—showing how consent is manufactured not by conspiracy but by the architecture of the information environment, through filters that narrow the expressible without appearing to constrain it. The two projects are less separate than they appear: both concern the relationship between structure and freedom, between what a system enables and what it forecloses.

Key Ideas

The poverty of the stimulus. Children acquire linguistic competence far richer than the available evidence could justify. The input is finite, fragmentary, and riddled with error. The output is an infinite, rule-governed system, acquired without explicit instruction, within a narrow developmental window. The gap between input and output is where innate structure must live: structure the environment did not and could not supply, because the data underdetermine the grammar, as the child’s reliable convergence on the right result reveals. This is the empirical argument that behaviorism could not answer in 1959 and that, on Chomsky’s view, large language models do not answer now: they sidestep the problem by being given what no child is given, a corpus far larger than any human could process, so their success says nothing about the child’s achievement.

Competence versus performance. Competence is the speaker’s underlying knowledge of the language: the grammatical system, internalized and largely unconscious, that determines which strings are sentences and what they mean. Performance is what the speaker actually produces under the messy conditions of real use, shaped by competence but distorted by memory limits, fatigue, distraction, and false starts. On Chomsky’s view, the language model is pure performance: it produces strings, often fluent and grammatical, with no underlying competence in this sense. There is no grammar it possesses and consults; there is only a trained disposition to continue text in ways that resemble its training. Where the human speaker’s performance is the imperfect expression of an underlying competence, the model’s output is performance with nothing behind it, surface all the way down.

Engineering is not science. The most characteristic and most easily misread of Chomsky’s charges against modern AI is that it contributes nothing to the science of language: its engineering triumphs, however impressive, are not scientific achievements. A system that predicts linguistic outputs with extraordinary accuracy, without representing the structure that explains why those outputs take the forms they do, is not a theory of language. It is an extraordinarily sophisticated predictor, and prediction is not explanation. The distinction matters because it determines what we have learned: a field that mistakes fluent production for genuine understanding has abandoned the question it was supposed to answer.

The propaganda model. With Edward Herman, Chomsky proposed a structural account of how public discourse is narrowed in formally free societies through a set of filters: concentrated ownership, advertising dependence, reliance on official sources, organized criticism (“flak”), and prevailing ideology. The filters produce consistent results without conspiracy, because they are properties of the information environment rather than of any individual’s intent. AI’s ability to generate persuasive text at unlimited scale, with no human author who could be held responsible and no conscience that could object, would extend the propaganda model into a new register: not by changing its logic but by removing its last friction.

The mystery of creative use. The deepest concept in Chomsky’s philosophy of language is the one he approaches with the most humility: the creative aspect of language use, the fact that humans deploy language appropriately to situations without being compelled by them. Stimulus-free but situation-appropriate: we are incited to speak by circumstances but not determined by them. The language model’s output is the opposite: fixed by its inputs, its weights, and its sampling procedure, it produces text that appears situation-appropriate by being statistically determined. Chomsky does not claim the machine lacks a soul; he claims it lacks the specific, characterizable freedom that makes human language use what it is, and that this difference is an observable property of language use, not a metaphysical claim.
