WordNet is a large-scale computational lexical database developed at Princeton University under George Miller's leadership beginning in 1985. Its core innovation was organizing English vocabulary not as a traditional alphabetical dictionary but as a network of synonym sets (synsets) linked by semantic relations — hyponymy, meronymy, antonymy, and others. Each synset represents a distinct concept; the links between synsets represent the conceptual structure of the lexicon. WordNet grew into a resource of over 117,000 synsets covering nouns, verbs, adjectives, and adverbs, and became foundational infrastructure for natural language processing research across decades. Its design reflected Miller's lifelong interest in how humans organize meaning — the same interest that had drawn him from memory research to language research in the 1960s. WordNet was, in a sense, the computational instantiation of the chunking hierarchy Miller had been theorizing for three decades: a systematic mapping of the compressed categories through which human minds organize linguistic meaning.
WordNet's development occupied Miller and his Princeton colleagues for nearly three decades. The project involved dozens of lexicographers, linguists, and computer scientists, and its releases (from version 1.0 in 1990 through subsequent expansions) became standard reference resources in computational linguistics. The database was made freely available, which contributed enormously to its adoption across academic and industrial NLP research.
The theoretical significance of WordNet extends beyond its utility as a resource. Miller's design choices embedded specific commitments about the structure of meaning — that concepts cluster into synsets, that relations between concepts are finite and enumerable, that semantic structure can be captured through explicit hierarchies. These commitments shaped how a generation of NLP researchers thought about language, and they carried Miller's chunking framework into computational practice.
The connection to large language models is both ironic and illuminating. Contemporary LLMs do not use WordNet directly; they learn semantic structure implicitly from massive text corpora using statistical methods rather than the explicit relational structure WordNet encodes. But WordNet was instrumental in the decades of NLP research that preceded LLMs, providing ground-truth semantic structure against which algorithms could be trained and evaluated. Miller's project contributed, indirectly, to the infrastructure of the technology whose cognitive implications his earlier framework now illuminates.
The final irony is that WordNet embodied the opposite design philosophy from the statistical models that now dominate NLP. WordNet made semantic structure explicit, decomposable, and human-interpretable. LLMs make semantic structure implicit, dense, and opaque. The transition from WordNet's architecture to LLM architecture is, in Miller's own terms, a transition from transparent chunks to dense but opaque compressions — the very distinction this book argues is decisive for the cognitive consequences of AI.
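The distinction can be made concrete. In WordNet, the generalization chain behind a concept is explicit and readable; in an LLM, the same concept lives in a dense vector whose individual dimensions mean nothing on their own. A minimal sketch of the WordNet side, using NLTK's Python interface (assumes nltk is installed and the WordNet corpus has been fetched with nltk.download('wordnet'); the vector at the end is illustrative, not output from any real model):

```python
from nltk.corpus import wordnet as wn

# Transparent chunk: the generalization chain for one sense of "dog"
# is an explicit, human-readable path through the hypernym hierarchy.
dog = wn.synset("dog.n.01")
chain = dog.hypernym_paths()[0]
print(" -> ".join(s.name() for s in chain))
# entity.n.01 -> ... -> animal.n.01 -> ... -> canine.n.02 -> dog.n.01

# Opaque compression: an LLM represents the same concept as a dense
# vector of floats (illustrative values only; real embeddings run to
# hundreds or thousands of dimensions). No dimension is interpretable.
dog_embedding = [0.031, -0.442, 0.178, 0.960, -0.205]
```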
Begun at Princeton in 1985 with funding from the Office of Naval Research, the WordNet project grew to involve a large team of researchers. Miller led it as co-director with Christiane Fellbaum, who later succeeded him as principal investigator.
The theoretical foundations drew on Miller's earlier work on semantic memory (with Philip Johnson-Laird in Language and Perception, 1976) and on contemporary linguistic theory, particularly lexical semantics and structuralist traditions.
Synsets as concept units. The basic building blocks are groups of words that share a common meaning, not individual words.
Explicit semantic relations. Hypernym, hyponym, meronym, antonym, and other relations between synsets create a navigable graph of conceptual structure (see the code sketch after this list).
Human-interpretable architecture. The database's structure was designed to be inspectable by researchers, enabling the study of how humans might organize semantic information.
Foundational for pre-LLM NLP. For roughly two decades, WordNet was indispensable infrastructure for computational linguistics, supporting tasks from word sense disambiguation to machine translation.
Superseded by statistical methods. Modern LLMs learn semantic structure implicitly, embedding it in billions of parameters rather than in explicit relations — a shift from transparent to opaque compression.
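As referenced above, here is a minimal sketch of what that navigable graph looks like in practice, again using NLTK's WordNet interface (relation names follow NLTK's method names; note that antonymy in WordNet is defined between lemmas, i.e. word forms, rather than between synsets):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

# Synsets: every concept unit (word sense) in which "dog" participates.
for synset in wn.synsets("dog"):
    print(synset.name(), "-", synset.definition())

# Explicit relations: step through the graph from a single synset.
dog = wn.synset("dog.n.01")
print("hypernyms:", dog.hypernyms())     # more general concepts, e.g. canine.n.02
print("hyponyms:", dog.hyponyms()[:3])   # more specific concepts, e.g. corgi.n.01
print("meronyms:", dog.part_meronyms())  # named parts, e.g. flag.n.07 (a dog's tail)

# Antonymy: a lemma-level relation.
print("antonyms:", wn.lemma("good.a.01.good").antonyms())  # [Lemma('bad.a.01.bad')]
```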