PERSON

Stephen Jay Gould

The paleontologist who spent his career arguing that evolution is a copiously branching bush, not a ladder, and who thereby supplied the most rigorous available argument against the myth that the AI trajectory is inevitable, predetermined, and bound to arrive at its announced destination.

Stephen Jay Gould (1941–2002) was a paleontologist, evolutionary biologist, and essayist who did more than almost anyone in the twentieth century to clarify what evolution actually is, and what it is not. It is not a ladder, not a march of progress, not a predetermined ascent from primitive to advanced. It is a copiously branching bush, continually pruned by extinction, with no main trunk and no predetermined summit. The organisms that dominate today are not the best organisms; they are the organisms that happened to survive the specific contingent events of their moment: asteroid impacts, climate shifts, the accidental survival of one predator over another. Replay the tape of life, Gould argued, and you get a different world: humans would almost certainly not evolve, because the specific sequence of contingent events that produced us is unrepeatable. He spent his career elaborating this thesis against adaptationist colleagues who saw natural selection as an optimizer marching toward perfection, and it transfers to the AI moment with a force Gould could not have anticipated. The AI discourse has constructed its own ladder with remarkable speed: vacuum tubes to transistors to neural networks to AGI, each step leading naturally to the next, the summit inevitable. Gould’s four great frameworks (punctuated equilibrium, spandrels, contingency, and the critique of measurement in The Mismeasure of Man) are instruments for seeing through the ladder narrative and asking, with Gouldian precision: what is actually being selected, by what specific contingent pressures, and what has been pruned away in the story we tell about how we arrived here?

In the [YOU] on AI Field Guide

The cycle that begins with [YOU] on AI asks what it would mean to see the machine clearly: to refuse both the narcotic of hype and the paralysis of fear. Gould is the cycle’s primary instrument for seeing through a third distortion, the myth of inevitability. The ladder narrative of technological progress tells the builder to relax, because the technology knows where it is going. Gould’s bush tells the builder to wake up, because where it goes depends on what is done next. The Cambrian explosion produced dozens of viable body plans; most went extinct not because they were inferior but because contingent events favored certain lineages over others. The AI moment is another such explosion: a rapid diversification of architectural forms and capabilities, most of which will go extinct not because they are objectively worse but because the ecological conditions of this moment (regulatory, economic, institutional, cultural), shaped by the choices of the particular people now alive, will favor some and eliminate others.

His concept of spandrels, structural byproducts of architecture rather than designed features, is the most diagnostically useful of his frameworks for the AI present. Popular discourse treats hallucinations in language models as bugs to be fixed. Gould’s framework recasts them as spandrels: the necessary byproduct of the same architectural property that produces fluency. The statistical machinery that makes coherent text possible also makes confident falsehood possible, because training selects for the pattern of confident assertion, which correlates with factual accuracy across most of the training data but is not identical to it. Treating hallucinations as bugs implies they can be eliminated without altering the architecture; treating them as spandrels explains why they can be reduced but not eliminated.

The Mismeasure of Man, his most politically charged work, applies directly to AI benchmarks. Samuel George Morton’s skull measurements, as Gould read them, were not deliberate fraud; they were careful measurements whose sample selection, methodology, and interpretation were all unconsciously shaped by the racial hierarchy the measurements were supposed to test. AI benchmark scores are, in exactly this structural sense, careful measurements whose design, sample, and interpretation are all shaped by the conception of ‘intelligence’ or ‘capability’ the benchmark was meant to capture. The reification (treating the benchmark score as a measure of a substance called intelligence rather than as a measure of performance on a specific test under specific conditions) reproduces Morton’s error at scale. The cycle’s emphasis on seeing the machine clearly includes seeing clearly what the benchmarks measure and what they do not.

Gould stands in the cycle’s gallery alongside Judea Pearl, who argues from the logic of causation that what current systems can do falls precisely short of causal reasoning; alongside Stephen Wolfram, who argues from the structure of computation that the opacity and irreducibility of powerful systems are not deficiencies but structural consequences; and alongside Stephen Hawking, who argued from physics that thresholds of no return are real, invisible, and approach without warning. Gould’s contribution is the argument about contingency: not what the machine is, but that what it is was not inevitable, that alternative trajectories existed and were suppressed by specific choices, and that future trajectories are still being chosen right now.

Origin

Gould was born in New York City in 1941, educated at Antioch College and Columbia University, and spent his career at Harvard, where he held the Alexander Agassiz Professorship of Zoology and became one of the most widely read science essayists of his century. His research focus was the fossil record of land snails in the Bahamas, Caribbean, and Bermuda: meticulous, empirical, small-scale work that gave him the patience for fine-grained observation and the skepticism toward grand narratives that his theoretical contributions required. Three hundred essays in Natural History magazine over twenty-seven years made him the dominant voice of science popularization in the late twentieth century.

His theoretical contributions cluster around four interlocking arguments. With Niles Eldredge in 1972, he proposed punctuated equilibrium: the fossil record’s pattern of long stasis interrupted by rapid bursts of change is not an artifact of gaps in the data but the actual pattern of evolution, with stasis maintained by the stabilizing constraints of large populations and change concentrated in small, peripherally isolated ones. With Richard Lewontin in 1979, he argued for the reality of spandrels (features arising as structural byproducts of other adaptations rather than as selected adaptations themselves) against the adaptationist programme that treated every feature of every organism as an optimized solution to a design problem. In The Mismeasure of Man (1981), he excavated the history of intelligence measurement and showed how reifying intelligence, an abstraction, into a measurable substance had repeatedly served to confirm existing social hierarchies rather than reveal natural ones. And in Wonderful Life (1989), he developed the full argument for contingency: the history of life is shaped by unrepeatable accidents, replaying the tape from any prior point would produce a fundamentally different world, and the ladder narrative of progress toward humanity is a flattering fiction projected onto a bush.

He died in 2002, before the deep learning revolution extended the reach of his framework into territory he could not have anticipated. The neural network winters, the extinction of the LISP machines, the contingent emergence and survival of the transformer architecture in a specific institutional context at a specific moment: each is a case study in Gouldian dynamics that he would likely have recognized immediately and analyzed with characteristic precision.

Key Ideas

The bush, not the ladder. Evolution is not a march toward complexity or intelligence; it is a copiously branching bush with no main trunk. The human lineage is one twig on one branch. The lineages that actually dominate life on Earth (the bacteria, by number, longevity, and ecological range) are not the ones the ladder narrative celebrates. The AI discourse has reproduced the ladder with disturbing fidelity: an ascending sequence from early computation to AGI, each step presented as the natural preparation for the next. Gould’s framework asks what the ladder is concealing: specifically, which branches have been pruned and which body plans went extinct not because they were inferior but because specific contingent conditions selected against them.

Punctuated equilibrium and the AI moment. The long periods of apparent stasis in AI development (the neural network winters, the decades of incremental symbolic AI before the deep learning and transformer breakthroughs) are not failures. They are the condition under which variation accumulates inside developmental constraints, waiting for a disruption sufficient to release it. The rapid transitions, the punctuations, express variation that had been accumulating unnoticed. The collapse of the imagination-to-artifact ratio that the cycle describes is punctuated equilibrium in the history of technology: decades of constraint released in months. The question the concept forces is whether the punctuation is consuming the conditions that produced the variation it is now releasing.

Spandrels and hallucinations. Gould and Lewontin’s concept of the spandrel (a structural byproduct of other architecture, present not because it serves a purpose but because a system built for other purposes necessarily produces it) is the most precise available description of large language model hallucinations. The fluency that makes models useful and the confident falsehood that makes them dangerous are the same architectural property viewed from two angles. Eliminating hallucination entirely would require altering the architecture that produces the fluency, because both are consequences of the same statistical mechanism. The spandrel framework says: understand where a feature comes from structurally before you decide whether it is a bug to be eliminated or a byproduct to be managed.

The Mismeasure of Man and benchmark reification. Gould’s exposure of how unconscious hierarchy-confirming assumptions shape apparently objective measurement—from Morton’s skull sizes to IQ scores—applies directly to AI benchmarks. A score on MMLU or HumanEval is a real measurement of real performance on specific tasks under specific conditions. It is not a measurement of a substance called intelligence or capability. The conversion of the score into a ranking on a single axis of intelligence reproduces Morton’s method in silicon: a multidimensional, context-dependent phenomenon compressed into a number, the number reified as a substance, the substance used to construct a hierarchy that confirms the assumptions embedded in the original measurement design.

Replay the tape. Gould’s most famous thought experiment: wind evolution back to the Cambrian and let it run forward again. The result would be a fundamentally different world. Applied to AI: wind back to 2017 and imagine that the Vaswani team’s paper is never written, or is written differently, or that GPU economics are slightly different; or wind back further, to the 1970s, and imagine that DARPA had kept even minimal funding flowing to connectionist research. Each of these is a contingent branch point, and a different path from any of them would produce systems with different capabilities, different limitations, and different social consequences. The specific AI that exists today is not the inevitable product of technological destiny. It is the contingent survivor of a specific sequence of accidents, and the alternatives that went extinct remain alternatives that future choices might recover.
