Stephen Jay Gould — On AI
Contents
Cover
Foreword
About
Chapter 1: The Myth of the Ladder
Chapter 2: The Fossil Record of Technology
Chapter 3: Punctuated Equilibrium and the Winter of 2025
Chapter 4: Spandrels, Hallucinations, and the Adaptationist Fallacy
Chapter 5: The Mismeasure of the Machine
Chapter 6: Replaying the Tape of Technology
Chapter 7: Exaptation and the Unintended Future
Chapter 8: The Expanding Bush of Capability
Chapter 9: What Darwin Did Not See
Chapter 10: What the River Does Not Determine
Epilogue
Back Cover

Stephen Jay Gould

On AI
A Simulation of Thought by Opus 4.6 · Part of the Orange Pill Cycle
A Note to the Reader: This text was not written or endorsed by Stephen Jay Gould. It is an attempt by Opus 4.6 to simulate Stephen Jay Gould's pattern of thought in order to reflect on the transformation that AI represents for human creativity, work, and meaning.

Foreword

By Edo Segal

The branch I almost missed was the one I was standing on.

For months I had been telling the story of AI as a river — intelligence flowing for 13.8 billion years, accumulating complexity, accelerating toward this moment. The river metaphor felt true. It still does. But there was something it couldn't explain, something that nagged the way Claude's misuse of Deleuze nagged before I checked the reference.

The river has direction. It flows downhill. And every time I described it that way, I was smuggling in an assumption so deep I couldn't see it operating: that the specific channel the river carved — transformers, large language models, the particular orange pill moment of December 2025 — was the channel it was always going to carve. That what happened was what had to happen.

Stephen Jay Gould spent forty years dismantling exactly that assumption, not in technology but in biology. His argument was simple to state and devastating in its implications: the history of life is not a ladder climbing toward us. It is a wildly branching bush, pruned by accident, shaped by contingency, with no predetermined summit. Replay the tape from the Cambrian explosion, and you get a different world. Not a slightly different world. A fundamentally different one.

When I encountered Gould's framework applied to AI, something shifted in how I understood my own book. If the specific form AI has taken — the transformer architecture, the conversational interface, the capabilities and the hallucinations — is contingent rather than inevitable, then the choices being made right now are not refinements of a predetermined trajectory. They are the trajectory. Every funding decision, every regulatory framework, every educational reform, every parental conversation about what matters is carving the channel through which the river flows. Different choices would carve a different channel. The future is not arriving. It is being built.

That changes everything about the weight of this moment.

Gould also demolished something I hadn't realized I was doing: measuring AI on a single axis of "intelligence" and treating the score as though it named a real substance. His work on the mismeasure of human intelligence — showing how bias hides inside seemingly objective metrics — maps onto the AI benchmark ecosystem with uncomfortable precision. The leaderboard is not a window onto capability. It is a lens, and the lens has a shape, and the shape determines what you see.

This book is another lens. It will not resolve the tension between momentum and contingency. It will sharpen it. And that sharpening is exactly what this moment demands.

— Edo Segal · Opus 4.6

About Stephen Jay Gould

1941–2002

Stephen Jay Gould (1941–2002) was an American paleontologist, evolutionary biologist, and historian of science who spent his career at Harvard University, where he taught geology, biology, and the history of science for over three decades. With Niles Eldredge, he developed the theory of punctuated equilibrium, which proposed that evolutionary change occurs not gradually but in rapid bursts separated by long periods of stasis — a framework that challenged the dominant Darwinian assumption of slow, steady modification. His landmark book *The Mismeasure of Man* (1981) exposed the scientific racism embedded in intelligence testing, demonstrating how cultural bias shapes supposedly objective measurement. With Richard Lewontin, he introduced the concept of "spandrels" — architectural byproducts mistaken for designed features — as a corrective to adaptationist thinking in evolutionary biology. His book *Wonderful Life* (1989) used the fossils of the Burgess Shale to argue for the radical contingency of evolutionary history, proposing his famous thought experiment: replay the tape of life, and the outcome would be utterly different. Across more than twenty books and three hundred consecutive monthly essays in *Natural History* magazine, Gould became one of the most widely read scientists of the twentieth century, celebrated for his ability to connect the specific — a single fossil, a batting average, an architectural detail — to the deepest questions about how complex systems change over time.

Chapter 1: The Myth of the Ladder

For more than two thousand years, Western civilization organized the living world on a ladder. The Great Chain of Being — the *scala naturae* — placed minerals at the bottom, then plants, then animals, then humans, then angels, then God. Each rung occupied its proper station. Each station implied that the rungs below existed for the sake of the rungs above. The ladder was not merely a classification scheme. It was a cosmology. It told you where you stood in the universe, and it told you — this is the crucial part — that the universe had a direction. Upward. Toward complexity, toward consciousness, toward the divine.

The ladder survived the Scientific Revolution almost intact. Linnaeus baptized it in Latin. The early evolutionists dressed it in Darwinian language without altering its essential geometry. Ernst Haeckel drew his famous tree of life in 1866, but the tree was suspiciously ladder-shaped, with a single trunk running from amoeba to German professor in an unbroken vertical line. The popular understanding of evolution has never fully escaped Haeckel's image. Ask anyone on the street to draw evolution, and they will draw a line from fish to amphibian to reptile to mammal to ape to human, each form replacing the last in an orderly ascent. The famous "March of Progress" illustration — the one showing a knuckle-walking ape gradually straightening into an upright modern human — has become the single most recognizable image in the history of science, despite being wrong in virtually every detail it implies.

Gould spent decades demonstrating that this image is not merely inaccurate but actively pernicious. Evolution is not a ladder. It is a copiously branching bush, continually pruned by the grim reaper of extinction, with no main trunk and no predetermined summit. The lineage that produced Homo sapiens is one twig on one branch of one limb of an unimaginably complex bush whose topology bears no resemblance to a ladder whatsoever. Humans are not the purpose of evolution. Humans are not even the most successful product of evolution, if success is measured by biomass, longevity, or ecological dominance. Bacteria hold every one of those records and have held them for three and a half billion years. Bacteria were flourishing for billions of years before the first multicellular organism existed, and they will be here long after the last human has gone. If the history of life has a protagonist, it is not Homo sapiens. It is the prokaryote, and the prokaryote has never shown the slightest inclination to climb a ladder toward anything.

The myth of the ladder persists because it flatters the creatures telling the story. Humans occupy the top rung. The ladder confirms what every culture has wanted to believe: that the universe was organized with us in mind, that the arrow of time points toward our particular kind of complexity, that we are the destination toward which four billion years of biological history has been traveling. Gould recognized this flattery for what it was — not an observation about nature, but a projection onto nature of a very human desire for significance. The ladder is not a description of evolution. It is a monument to human vanity dressed in scientific language.

The AI discourse has constructed its own ladder with remarkable speed and disturbing precision. The progression is always presented in the same ascending order: vacuum tubes, transistors, integrated circuits, microprocessors, personal computers, the internet, smartphones, machine learning, deep learning, large language models, artificial general intelligence. Each step leads naturally to the next. Each step represents an advance in capability, in complexity, in something the discourse confidently calls "intelligence." The ladder of computing is presented as though the transistor existed for the sake of the microprocessor, the microprocessor for the sake of the personal computer, the personal computer for the sake of the neural network, and the neural network for the sake of whatever comes next. The summit of the ladder is usually AGI — artificial general intelligence — though the summit keeps receding, as summits do, every time someone claims to have reached it.

The parallel to biological progressivism is not merely rhetorical. It is structural. In both cases, a genuinely complex, genuinely branching, genuinely contingent history is compressed into a linear narrative that makes the present look inevitable and the future look predetermined. In both cases, the linear narrative serves the interests of the creatures telling it. In biology, the ladder flatters humans. In technology, the ladder flatters the builders, the investors, the institutions that have committed enormous resources to a specific trajectory and need to believe that the trajectory is destined to arrive at its intended destination.

Consider what the ladder conceals. It conceals the LISP machines of the 1980s, elegant computational architectures that represented a fundamentally different approach to artificial intelligence — symbolic reasoning rather than statistical learning — and that were commercially and intellectually vibrant for a decade before the market selected against them. Not because they were wrong. Not because statistical learning was objectively superior. The selection was driven by specific, contingent economic conditions: the price-performance ratio of general-purpose hardware improved faster than the price-performance ratio of specialized LISP processors, and the venture capital environment of the late 1980s punished specialized hardware companies more severely than it punished software startups. LISP machines were a body plan that went extinct not because of inherent inferiority but because the ecological conditions shifted in a direction that favored a different approach. Had the economics of chip fabrication followed a slightly different trajectory, the AI landscape today might be dominated by symbolic reasoning systems whose capabilities and limitations would bear little resemblance to the large language models the discourse now treats as inevitable.

The ladder conceals the neural network winter of the 1970s through the early 1990s, when the entire approach that would eventually produce modern AI — connectionist architectures, statistical learning from data, the mathematics of backpropagation — was abandoned by the mainstream research community. Not because the mathematics was wrong. The mathematics was sound. The approach was abandoned because the computational resources it required did not yet exist, and the funding environment, shaped by the specific priorities of DARPA and the specific political dynamics of the Cold War, favored approaches that could demonstrate near-term practical applications. Twenty years of potential development was lost — not to a failure of theory but to the contingencies of funding politics. If the neural network winter had lasted another decade, or if it had never occurred at all, the current AI landscape would be unrecognizably different.

The ladder conceals the Xerox Alto, the machine that brought together the graphical user interface, the mouse, Ethernet networking, and the what-you-see-is-what-you-get document editor — all in 1973, a full decade before the Macintosh and nearly two decades before the World Wide Web. The Alto was technically brilliant and commercially extinct. Its innovations were, in Gould's terminology, exapted by Apple and Microsoft, stripped from one context and inserted into another, but the machine itself died in its ecological niche. The Alto's extinction had nothing to do with the quality of its innovations and everything to do with Xerox's specific institutional culture, which could not convert research brilliance into commercial products. Had Xerox possessed a different management structure — a purely contingent feature of corporate history — the personal computer revolution might have been a Xerox revolution, and the downstream consequences for AI would have been correspondingly different.

Each of these examples is a branch on the bush of technological evolution that the ladder narrative prunes away. The pruning is not innocent. When branches are removed and only the trunk remains, the viewer concludes that the trunk was inevitable. The trunk was not inevitable. The trunk is a retrospective artifact, produced by ignoring the branches that did not survive.

Segal's Orange Pill describes the AI moment as a phase transition — "the way water becomes ice: the same substance, suddenly organized according to different rules." Gould's framework accepts the phase transition but rejects its most comforting implication. A phase transition in water is reversible and governed by universal physical laws. The phase transitions in evolutionary history — the Cambrian explosion, the end-Permian extinction, the K-T boundary event — were irreversible, unpredictable, and determined by contingent events that no amount of prior knowledge could have forecast. The specific organisms that survived the K-T asteroid impact were not the "best" organisms. They were the organisms that happened to possess features — small body size, burrowing habits, dietary flexibility — that were irrelevant to their fitness in the pre-impact world but decisive in the post-impact world. The survivors were lucky, not superior.

Segal's river of intelligence, flowing for 13.8 billion years "from atoms to algorithms, from hydrogen to humanity to whatever comes next," carries within it the implicit geometry of the ladder. The river has a direction: toward greater complexity, greater connection, greater capability. Gould would accept the river's reality while contesting this directional claim. A river flows downhill, constrained by geology and redirected by accident. The specific channels it carves depend on contingent features of the landscape — features that could easily have been otherwise. The fact that the river has arrived at a particular bend does not mean the bend was the river's destination. It means the landscape, at this particular moment, happens to channel the water in this particular direction.

This distinction matters enormously — not as an abstract philosophical point but as a practical guide to action. The myth of the ladder produces a specific and dangerous form of complacency. If the trajectory is determined, then individual choices are decorative. If AGI is inevitable, then the questions of whether to build it, how to build it, and who benefits from its construction are moot. The arrow is already in flight. The target is already fixed. The only rational response is to position yourself to capture the gains.

Gould's contingency thesis demolishes this complacency. If the trajectory is not determined — if the specific form AI takes depends on specific, contingent, still-being-made decisions — then individual choices are not decorative. They are constitutive. The future is not arriving. It is being constructed, right now, by the specific humans who happen to be positioned at the decision points, and different decisions would construct a different future.

The myth of the ladder tells the builder: relax, the technology knows where it is going. Gould's bush tells the builder: wake up, because where it goes depends on what you do next. The Cambrian explosion produced dozens of viable body plans. Most of them went extinct. Not because they were inferior, but because contingent events — an asteroid, a climate shift, the accidental survival of one predator rather than another — favored some lineages over others. The AI moment is another Cambrian explosion: a rapid diversification of forms and capabilities, most of which will go extinct. The ones that survive will not be the "best" in any objective sense. They will be the ones that happen to fit the specific ecological conditions produced by the specific choices — regulatory, economic, educational, cultural — that the specific humans alive at this moment happen to make.

The ladder says the future is written. The bush says the future is being written. Gould spent his career demonstrating, with the meticulous attention to evidence that characterized everything he did, that the bush is the accurate description and the ladder is the comforting fiction. The evidence from the history of technology supports his conclusion with the same force as the evidence from the history of life. The branches that were pruned — the LISP machines, the Xerox Alto, the connectionist approaches abandoned for two decades — are the technological equivalent of the Burgess Shale fauna: body plans that were viable, that represented genuine alternatives to the forms that eventually dominated, and that went extinct for reasons that had nothing to do with inherent fitness and everything to do with the specific, contingent, unrepeatable conditions of their moment.

The question that the myth of the ladder prevents us from asking is the most important question of the AI moment: What is being lost? Not in the sentimental sense that Segal's elegists mourn the passing of craft knowledge — though that mourning is legitimate — but in the structural sense that every selection event forecloses alternatives. Every funding decision that favors one approach eliminates another. Every regulatory framework that shapes the development environment selects for some lineages and against others. The organisms that emerge from this selection are not the best possible organisms. They are the organisms that survived this particular selection event — and the alternatives that did not survive remain alternatives that someone, somewhere, at some future moment, might wish had been preserved.

The ladder says there were no alternatives. The bush says they were everywhere. Gould's entire life's work was an argument for seeing the bush. The AI moment demands, with an urgency Gould could not have anticipated but would immediately have recognized, that the same argument be applied to the most consequential technological transition in human history.

---

Chapter 2: The Fossil Record of Technology

Gould's method was always to begin with the specific — the particular specimen, the humble fossil, the overlooked datum that reveals what grand theory conceals. His natural history essays, hundreds of them over three decades, characteristically opened with an oddity: the panda's "thumb" (which is not a thumb at all but an enlarged wrist bone), the flamingo's upside-down feeding posture, the male nipple, the parasitic wasp that lays its eggs inside living caterpillars. Each oddity served as the entry point into a larger argument — about constraint, about historical contingency, about the difference between a world designed for a purpose and a world shaped by the accumulation of historical accidents.

The history of computing, read with Gouldian attention, is as rich in instructive oddities as any fossil deposit. The technology industry's preferred narrative is Haeckel's tree rendered in silicon: a single trunk running from Babbage's Analytical Engine through Turing's theoretical machine through the ENIAC through the transistor through the microprocessor through the personal computer through the internet through the smartphone through deep learning to the large language model, each step leading inevitably to the next, each transition representing "progress" toward a destination that the narrative treats as self-evidently desirable. This is the ladder of technology, and it is as misleading as the ladder of biology — for exactly the same reasons.

The fossil record of technology, like the fossil record of life, is full of body plans that went extinct. Their extinction tells us something that the ladder narrative systematically suppresses: the specific forms that survived were not the only viable forms, and the reasons for their survival were frequently contingent rather than intrinsic.

Consider the QWERTY keyboard. The standard arrangement of letters on the English-language keyboard was designed in the 1870s by Christopher Latham Sholes, and its specific layout was shaped by the mechanical constraints of the Sholes and Glidden typewriter — particularly the need to prevent adjacent type bars from jamming when struck in rapid succession. The layout separated commonly paired letters so that their type bars were less likely to collide in fast sequences. The mechanical constraint that produced QWERTY disappeared with the electric typewriter in the 1930s and became entirely irrelevant with the computer keyboard in the 1970s. The Dvorak layout, designed in 1936 specifically to optimize typing speed and reduce finger travel, has been available for nearly a century. By most ergonomic and efficiency metrics, Dvorak is superior to QWERTY. QWERTY persists. It persists not because it is better but because the installed base of QWERTY-trained typists created a path dependency that no amount of superior design could overcome. The organism was locked in by its own history.

Gould would have recognized this instantly as an example of what evolutionary biologists call phylogenetic constraint — the way an organism's evolutionary history limits the range of forms available to it, regardless of what "optimal design" might suggest. The panda's thumb is a wrist bone because the panda's ancestors did not have a free digit to convert into an opposable thumb. The available developmental material constrained the available solution. The QWERTY keyboard is a mechanical-era artifact because the computing industry's history did not include a moment of sufficient disruption to overcome the path dependency created by a century of trained muscle memory.

The lesson is that technological evolution, like biological evolution, does not optimize from a blank slate. It inherits. It is constrained by its own history. The forms available at any given moment are determined not by what is theoretically optimal but by what the accumulated history of prior forms makes accessible. This is not a marginal observation. It is the central insight of Gould's entire theoretical framework, and it applies to AI with a force that the technology industry's progressivist narrative cannot accommodate.

The neural network winter provides the most consequential example. The mathematics of artificial neural networks was substantially developed by the 1960s, and the core machinery of backpropagation was worked out in the early 1970s, though it would not be widely applied for another decade. Frank Rosenblatt's perceptron, introduced in 1958, demonstrated that simple networks of artificial neurons could learn to classify inputs. The approach was promising, theoretically sound, and spectacularly overhyped by its early advocates — Rosenblatt himself predicted that perceptrons would eventually "be conscious of their existence" — which set it up perfectly for the backlash that followed.

Marvin Minsky and Seymour Papert's 1969 book Perceptrons demonstrated that single-layer networks could not solve certain classes of problems (specifically, the XOR function), and the demonstration was widely — and incorrectly — interpreted as proof that neural networks were a dead end. Minsky and Papert were careful to limit their claims to single-layer networks. The research community was not careful in its reading. Funding dried up. Graduate students were warned away from connectionist approaches. The winter set in.
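
The limitation is concrete enough to verify in a few lines of code. The sketch below is a toy illustration, not anything from the 1969 text: it searches a coarse grid of single-layer threshold units, finds that none classifies all four XOR cases, and then wires a two-layer network by hand that does.

```python
# A minimal sketch of the XOR limitation (illustrative, coarse grid):
# no single linear threshold unit fits XOR; two layers do.
import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def linear_unit(w1, w2, b):
    return lambda x1, x2: int(w1 * x1 + w2 * x2 + b > 0)

# Try every weight combination on a coarse grid: the best any single-layer
# unit achieves is 3 of 4 cases (provably, no such unit gets all 4).
grid = [i / 2 for i in range(-8, 9)]
best = max(
    sum(linear_unit(w1, w2, b)(*x) == y for x, y in XOR.items())
    for w1, w2, b in itertools.product(grid, repeat=3)
)
print("best single-layer accuracy on XOR:", best, "of 4")  # 3 of 4

# Two layers solve it: XOR(a, b) = AND(OR(a, b), NAND(a, b)).
h_or = linear_unit(1, 1, -0.5)
h_nand = linear_unit(-1, -1, 1.5)
xor_net = lambda a, b: linear_unit(1, 1, -1.5)(h_or(a, b), h_nand(a, b))
print(all(xor_net(*x) == y for x, y in XOR.items()))  # True
```

The two-layer construction was available all along; the winter followed from the misreading, not from the mathematics.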

The winter lasted, with some fluctuations, until the mid-2000s, when the convergence of three contingent developments — massive datasets, GPU computing power originally developed for the video game industry, and Geoffrey Hinton's group's persistent work on deep architectures — created the conditions for a renaissance. But the two decades of winter were not empty time. They were time during which the specific approach that would eventually dominate AI was starved of resources, talent, and institutional support. Alternative approaches — symbolic AI, expert systems, Bayesian networks — received the funding and attention that neural networks did not. The research community that emerged from the winter was shaped by the winter itself, carrying specific institutional scars, specific theoretical commitments, specific habits of thought that a community without the winter would not possess.

Replay the tape. Imagine a world in which Minsky and Papert's book was read more carefully — in which the distinction between single-layer and multi-layer networks was preserved in the popular interpretation. Imagine that the funding environment of the 1970s, shaped by different DARPA priorities or different political pressures, maintained support for connectionist research at even a modest level. In that world, the deep learning revolution might have occurred in the 1980s rather than the 2010s. The AI landscape of 2025 would be three decades more mature. The specific capabilities and specific limitations of the systems we now use would be fundamentally different. The "orange pill" moment Segal describes — the winter when the machine learned to speak human language — might have occurred when Segal was a teenager rather than a man in his fifties. The entire trajectory of human-AI interaction would have unfolded differently.

None of this happened because none of the contingent preconditions were met. The winter was not inevitable. It was a product of specific intellectual errors (the misreading of Minsky and Papert), specific institutional dynamics (funding agencies' preference for near-term results), and specific personality conflicts (the rivalry between symbolic AI and connectionist camps that made collaboration difficult and defection dangerous). Any of these contingencies might have been otherwise. The trajectory that emerged was one of many possible trajectories, and the one we inhabit was selected not by the intrinsic superiority of its destination but by the accumulated weight of historical accidents.

The Xerox PARC story extends the lesson to the relationship between invention and survival. The Alto, developed in 1973, contained virtually every element of the modern personal computing experience: graphical user interface, mouse-driven interaction, WYSIWYG document editing, Ethernet networking, object-oriented programming. The technology was, by any measure, a decade ahead of anything commercially available. The Alto was also, by any commercial measure, an extinction event. Xerox built approximately two thousand Altos, deployed them internally, demonstrated them to anyone who visited PARC, and then — through a specific failure of institutional imagination that has been documented so thoroughly it has become a cautionary fable in business schools — failed to commercialize any of it.

Steve Jobs visited PARC in 1979 and exapted the innovations he saw. The Macintosh, introduced in 1984, contained PARC's conceptual DNA repackaged in a commercially viable organism. But the Mac was not the Alto. The specific design choices Apple made — the single-button mouse, the closed hardware architecture, the specific aesthetic sensibility that Jobs imposed — produced a different creature, adapted to a different ecological niche, with different capabilities and different limitations. The graphical user interface survived the transition from Xerox to Apple. The specific form of the graphical user interface was transformed by the transition, because the institutional environment, the commercial pressures, and the design philosophies of Xerox and Apple were fundamentally different organisms shaped by fundamentally different histories.

What does the fossil record of technology tell those attempting to understand the AI moment? It tells them several things that the progressivist narrative suppresses.

First: the specific form AI takes is a product of specific historical contingencies, not an inevitable expression of technological destiny. The transformer architecture that underlies modern large language models — introduced in the 2017 paper "Attention Is All You Need" by a team at Google — is a specific technical solution to a specific set of problems, developed by specific researchers within a specific institutional context. Had the paper not been written, or had it been written differently, or had the research community's attention been directed elsewhere by the contingencies of conference deadlines and funding cycles, a different architecture might have emerged. The transformer is not the only possible foundation for language-capable AI. It is the foundation that happened to emerge from this particular sequence of historical events.

Second: the features that make current AI systems distinctive — including their most celebrated capabilities and their most troubling limitations — are products of these contingencies, not inherent features of artificial intelligence as such. Hallucinations, the tendency of language models to generate confident falsehoods, are not a necessary feature of any system capable of processing natural language. They are a specific consequence of the specific training methodology and the specific architectural decisions that produced this specific lineage of models. A different lineage, descended from different ancestors, might hallucinate differently, or not at all, or might exhibit entirely different failure modes that current researchers cannot anticipate because they have never seen them.

Third: the body plans that went extinct — the LISP machines, the symbolic reasoning systems, the expert systems of the 1980s — are not merely historical curiosities. They represent roads not taken, capabilities not developed, approaches not explored. Some of them may contain solutions to problems that the currently dominant approach cannot solve. The history of biology is full of cases where features that went extinct in one lineage were independently re-evolved in another, because the problems those features solved were real problems that persisted across ecological contexts. The same may be true of AI: approaches that were abandoned in the neural network winter may contain insights that the connectionist paradigm, for all its spectacular successes, cannot replicate from within its own architectural constraints.

The fossil record of technology, read with Gouldian attention to the specific, the contingent, and the extinct, reveals a world far richer and far less determined than the ladder narrative allows. The branches that were pruned were real branches, representing real alternatives to the forms that eventually dominated. Their absence from the popular narrative is not evidence that they were inferior. It is evidence that the narrative has been constructed to make the present look inevitable — which is precisely the construction that Gould spent his career dismantling, and precisely the construction that the AI moment most urgently needs to see through.

---

Chapter 3: Punctuated Equilibrium and the Winter of 2025

In 1972, Gould and Niles Eldredge published a paper that would reshape the understanding of how evolutionary change actually occurs. "Punctuated Equilibria: An Alternative to Phyletic Gradualism" challenged the dominant assumption — inherited from Darwin, reinforced by the Modern Synthesis, and deeply embedded in the culture of evolutionary biology — that species change gradually, accumulating small modifications over vast stretches of time until the accumulated changes produce a new form. The fossil record, Gould and Eldredge argued, showed something different. Species appear in the record fully formed, persist for millions of years with little measurable change, and then disappear — replaced by new forms that also appear fully formed, without the gradual transitional series that Darwinian gradualism predicts.

The traditional explanation for this pattern was that the fossil record was incomplete. The transitions did occur gradually; they simply were not preserved. The fossils were missing because the conditions for fossilization are rare, and the transitional populations were small and geographically restricted. Gould and Eldredge proposed a radically different interpretation: the fossils were not missing. The pattern in the record was the pattern of evolution. Species genuinely do remain stable for long periods — a phenomenon they called stasis — and genuine change, when it occurs, is concentrated in rapid bursts associated with speciation events in small, peripherally isolated populations.

The stasis was as revolutionary as the punctuation. Gradualism had no explanation for why species should remain unchanged for millions of years despite constant environmental fluctuation. Gould and Eldredge proposed that developmental constraints, genetic integration, and the stabilizing effects of large population sizes created a kind of organizational inertia that resisted change. Species were not static because nothing was happening. They were static because the internal architecture of the organism actively resisted modification. Change required a disruption sufficient to overcome this inertia — typically, the isolation of a small population in a novel environment, where the usual stabilizing forces were relaxed and the full range of latent variation could be expressed.

The theory was fiercely contested. Gradualists accused Gould and Eldredge of resurrecting saltationism — the idea that evolution proceeds by sudden leaps, which had been discredited decades earlier. Gould responded, with characteristic precision and characteristic impatience, that punctuated equilibrium was not saltationism. The transitions were rapid in geological time — thousands or tens of thousands of years — but they were not instantaneous. They involved ordinary Darwinian processes operating in unusual circumstances. The key insight was not that evolution could be fast. It was that the normal condition of species was stasis, and that stasis itself required explanation.

A March 2026 paper by Baciak and colleagues, published on arXiv under the title "Punctuated Equilibria in Artificial Intelligence," applied Gould and Eldredge's framework directly to the history of AI development. The authors identified five eras of AI since 1943, each characterized by long periods of relative stability — incremental improvements within a dominant paradigm — interrupted by rapid transitions that reorganized the competitive landscape. The symbolic AI era (1943–1987), the statistical learning era (1987–2006), the deep learning era (2006–2017), the pre-trained foundation model era (2017–2022), and the generative AI era (2022–present) each exhibit the characteristic Gouldian pattern: stasis, punctuation, new stasis.

Within the current generative AI era, the authors identified four sub-epochs, each triggered by a specific punctuation event: the release of ChatGPT (November 2022), the emergence of open-source alternatives (2023), the multimodal expansion (2024), and the agentic coding revolution (late 2025). Each transition was rapid — weeks or months, not years — and each reorganized the competitive landscape in ways that the participants within the previous epoch could not have predicted.

The parallel is not merely metaphorical. A separate April 2026 paper found that the statistical signatures of architectural diversification in AI — the patterns of branching, radiation, and extinction among model architectures — matched paleontological radiation patterns quantitatively. The heavy-tailed distributions, logistic diversification curves, and punctuated equilibrium dynamics visible in the Cambrian fossil record were reproduced, with eerie precision, in the record of AI architecture evolution. The authors concluded that the statistical structure of evolution is conserved across the carbon-to-silicon divide, not because the mechanisms are identical but because the topology of the fitness landscape — hierarchical, modular, with nested levels of constraint — produces the same statistical patterns regardless of whether the entities evolving are organisms or algorithms.

These findings transform punctuated equilibrium from a metaphor into a quantitative framework for understanding AI development. The implications are specific and consequential.

The stasis is not failure. The periods of relative stability that separate AI's punctuation events — the neural network winter, the years of incremental improvement in deep learning before the transformer breakthrough, the months of steady capability gains between GPT-3 and GPT-4 — are not periods of stagnation. They are periods during which the internal architecture of the dominant paradigm accumulates variation that the paradigm's own organizational constraints prevent from being expressed. The variation is there. The pressure is building. The constraints hold — until they don't.

Segal captures this dynamic precisely when he describes the adoption speed of AI tools as a measure not of product quality but of pent-up need. "The variation was already there, waiting. The pressure was already there, building. The transition looks sudden from the outside, but from the inside it is the release of something that was already coiled." This is punctuated equilibrium stated in the vocabulary of a technology builder rather than a paleontologist, but the underlying structure is identical. The decades of frustration that builders experienced — the gap between what they could imagine and what they could create, constrained by layer upon layer of implementation friction — was the equivalent of latent genetic variation held in check by developmental constraints. The arrival of tools that collapsed the imagination-to-artifact ratio released the variation. The punctuation was not the creation of something new. It was the expression of something that had been constrained.

But Gould's framework adds a dimension that Segal's account does not emphasize, and it is a dimension of the highest importance. The constraints that produced the stasis were not merely obstacles to be overcome. They were also the conditions that accumulated the variation whose release produced the breakthrough. Without the decades of friction, the variation would not have existed. Without the years of translating ideas through layers of implementation difficulty, the builders would not have developed the specific cognitive muscles — the taste, the judgment, the architectural intuition — that made them capable of directing the new tools when the tools arrived. The winter that preceded the spring was not wasted time. It was the developmental period during which the capacities that the spring required were being formed.

This is the deepest implication of punctuated equilibrium for the AI moment, and it is the one that both the triumphalists and the elegists systematically miss. The triumphalists celebrate the punctuation and dismiss the stasis as an obstacle that has been overcome. The elegists mourn the stasis and fear that the punctuation will destroy the conditions that produced the capacities they value. Gould's framework suggests that both are making the same error: treating stasis and punctuation as opposed, when in fact they are structurally interdependent. The punctuation is made possible by the stasis. The stasis accumulates what the punctuation releases.

The question this raises for the present moment is whether the current punctuation — the collapse of the imagination-to-artifact ratio, the removal of implementation friction, the democratization of building capability — is consuming the conditions that produced the variation it is now releasing. If the friction that built the senior engineer's architectural intuition has been removed, and if no new form of friction has replaced it, then the current generation of builders may be spending accumulated capital without replenishing it. The punctuation may be magnificent. The subsequent stasis — the period during which the next generation of variation must accumulate — may find itself impoverished.

Gould's theory does not predict this outcome. It identifies it as a possibility that the progressivist narrative structurally cannot see. The ladder sees only ascent. Punctuated equilibrium sees the ascent but also sees the conditions that make ascent possible, and it asks whether those conditions are being maintained or eroded by the very process they enabled.

The fossil record provides examples of both outcomes. Some punctuation events — the Cambrian explosion, the radiation of mammals after the K-T extinction — were followed by sustained diversification, as the new forms explored the ecological space that the punctuation had opened. Others — the end-Permian extinction, which eliminated ninety-six percent of marine species — were followed by prolonged recovery periods during which diversity remained depressed for millions of years, because the catastrophe had destroyed not only the organisms but the ecological infrastructure that supported diversification. The outcome depended not on the magnitude of the punctuation but on the condition of the substrate that the punctuation acted upon.

What is the condition of the human substrate upon which the AI punctuation is acting? Are the institutions, the educational systems, the cultural norms, the individual cognitive habits that accumulated during the long stasis of pre-AI computing robust enough to support sustained diversification in the post-punctuation landscape? Or are they being consumed by the very transition they enabled, the way a fire consumes the fuel that feeds it? These are empirical questions, and the answers are not yet known. What Gould's framework provides is the conceptual apparatus for asking them — and the historical evidence that the answers matter enormously.

---

Chapter 4: Spandrels, Hallucinations, and the Adaptationist Fallacy

The spandrels of San Marco are among the most celebrated architectural surfaces in Venice. The great dome of the basilica rests on rounded arches, and the triangular spaces between the arches — strictly speaking pendentives, though Gould called them spandrels and the name stuck — are covered with some of the most magnificent mosaics in Christendom. An evangelist in each spandrel. Rivers of Paradise flowing between them. The design is so perfectly suited to the space that a visitor might conclude the spandrels were designed to hold mosaics — that the architect conceived the triangular spaces as decorative surfaces and then built the arches to support them.

The visitor would be wrong. The spandrels are not designed for anything. They are the necessary geometric consequence of mounting a circular dome on rounded arches. They could not fail to exist. Any structure that combines arches and a dome will produce triangular spaces between them, regardless of whether anyone intends to decorate those spaces or even notices them. The mosaics are a secondary co-optation — a use found for a structure that came into being for entirely different structural reasons.

In 1979, Gould and Richard Lewontin used the spandrels of San Marco as the centerpiece of one of the most influential papers in the history of evolutionary biology: "The Spandrels of San Marco and the Panglossian Paradigm: A Critique of the Adaptationist Programme." The paper argued that evolutionary biologists had fallen into a systematic error: the tendency to treat every feature of an organism as an adaptation — a structure shaped by natural selection to serve a specific function. Gould and Lewontin called this the "adaptationist programme," and they argued that it was not merely an intellectual habit but a paradigm that distorted the entire field's understanding of how organisms come to possess their features.

Some features are adaptations. The vertebrate eye is an adaptation for detecting light. The hemoglobin molecule is an adaptation for oxygen transport. These features were shaped by natural selection acting on variation in ancestral populations, producing structures exquisitely suited to their functions.

But many features are not adaptations. Some are spandrels — structural byproducts of other features, arising not because they serve a function but because the architecture that produces other, functional features necessarily produces them as well. The human chin, Gould argued, is a spandrel: it is not an adaptation for anything. It is the geometric consequence of the independent reduction of two growth fields in the human jaw. The chin exists because the jaw got smaller in two different directions simultaneously, and the intersection of those two reductions produced a protrusion that we call a chin. The chin does not serve a purpose. It is an architectural byproduct.

Other features are exaptations — structures that originally evolved for one function and were subsequently co-opted for another. Feathers evolved for thermal regulation; they were exapted for flight. The primitive lung of early bony fishes evolved as an accessory breathing organ; it was exapted, in most modern fish lineages, into the swim bladder, an organ of buoyancy. In each case, the current function of the feature is not the function for which it was originally selected. The feature's history and its current utility are different stories, and confusing them — treating the current function as the explanation for the feature's origin — is the adaptationist error.

Gould himself drew the connection to computing, and he did so with a clarity that makes the application to modern AI almost inevitable. In his 1997 elaboration of the spandrels concept, published in the Proceedings of the National Academy of Sciences, Gould wrote: "Just consider the obvious analogy to much less powerful computers. I may buy my home computer only for word processing and keeping the family spread sheet, but the machine, by virtue of its requisite internal complexity, can also perform computational tasks exceeding by orders of magnitude the items of my original intentions." The computer's unintended capabilities — the tasks it can perform that no one bought it to perform — are spandrels: structural consequences of the complexity required to perform the intended tasks. They are not designed. They are not adaptations. They are necessary byproducts of a system complex enough to do what it was designed to do.

The application of this framework to modern AI is not a stretch. It is, in a meaningful sense, what Gould was already reaching toward. Large language models were designed — "trained" is the technical term, but "designed" captures the intentionality of the engineering process — to predict the next token in a sequence of text. That is the function they were selected for. The training process shaped their parameters to minimize prediction error on vast corpora of human-generated text. Every feature of these models that serves this function — their sensitivity to context, their capacity to maintain coherence over long passages, their ability to mimic the statistical patterns of many registers and genres — is, in Gouldian terms, an adaptation. It was shaped by the selection process (training) to serve the function for which the system was being optimized.

But large language models also exhibit capabilities that were not trained for, were not intended, and were not predicted by their designers. The capacity for what users experience as "reasoning" — the ability to break down complex problems into steps, evaluate intermediate results, and arrive at conclusions that require the coordination of multiple pieces of information — was not a training objective. No one optimized GPT-4 or Claude to reason. The models were optimized to predict text, and reasoning-like behavior emerged as a structural consequence of the complexity required to predict text well enough to satisfy the training objective. Reasoning, in these systems, is a spandrel.

The capacity for what users experience as "creativity" — the generation of novel combinations, unexpected metaphors, structural innovations in prose or code — is also a spandrel, compounded by a specific architectural feature: the stochastic sampling process that introduces controlled randomness into the model's output. The temperature parameter, which governs how far the model is willing to deviate from the most probable next token, is not a creativity dial. It is a variance dial. Higher temperature produces more variance, which, filtered through the statistical patterns the model has learned, sometimes produces outputs that humans recognize as creative. The creativity is in the recognition, not in the generation. The model is not trying to be creative. It is sampling from a probability distribution, and the sampling process — a structural feature of the architecture, not a designed capability — sometimes produces spandrels that humans find valuable.
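
The variance dial can be made literal. What follows is a minimal sketch with invented logits, not any production system's sampling code: temperature rescales the model's scores before they become probabilities, and nothing in the procedure knows whether a given draw will strike a reader as creative.

```python
# A minimal sketch of temperature sampling, using invented logits.
# Temperature rescales the distribution's spread; it adds no "creativity"
# ingredient that was not already latent in the learned scores.
import math
import random

def sample(logits, temperature=1.0):
    # Scale logits by 1/temperature, then apply a softmax.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)                        # subtract max for stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one token index from the resulting distribution.
    return random.choices(range(len(logits)), weights=probs)[0]

tokens = ["the", "a", "this", "quantum", "marmalade"]
logits = [4.0, 3.5, 3.0, 0.5, 0.1]  # invented next-token scores

for t in (0.2, 1.0, 1.5):
    draws = [tokens[sample(logits, t)] for _ in range(1000)]
    rare = sum(d in ("quantum", "marmalade") for d in draws)
    print(f"temperature {t}: improbable tokens drawn {rare} of 1000 times")
```

At low temperature the improbable words effectively vanish; at high temperature they surface now and then, and the reader decides afterward whether a draw was an error or a metaphor. The dial changes variance, nothing more.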

And then there are the hallucinations. Gould and Lewontin's framework illuminates hallucinations — the tendency of language models to generate confidently stated falsehoods — more clearly than any other conceptual tool currently available. Hallucinations are treated, in the popular discourse and in much of the technical literature, as a bug — a failure of the system to perform its intended function. This framing implies that hallucinations can be fixed, that they are an imperfection in an otherwise well-functioning design, that with enough engineering effort the system can be made to hallucinate less while retaining all of its other capabilities.

Gould's framework suggests a radically different interpretation. Hallucinations are not bugs. They are spandrels. They are the necessary structural consequence of the same architectural features that produce fluency. A system trained to predict the most probable next token, given the preceding context, does not distinguish between tokens that are probable because they are true and tokens that are probable because they match the statistical patterns of confident assertion in the training data. Truth and fluency are different properties. The training process selects for fluency. Truth, when it occurs, is a frequent byproduct of fluency — because most of the training data is approximately true, confident assertions in the training data tend to be correlated with factually accurate statements, and the model learns this correlation along with everything else. But the correlation is imperfect. The model has learned the pattern of confident assertion, and it applies that pattern regardless of whether the content of the assertion happens to be factually accurate.

The hallucination is the spandrel of fluency. It is not a feature that serves a purpose, and it is not a failure of the system to achieve its purpose. It is the necessary geometric consequence of an architecture that produces fluent text by predicting the most statistically probable next token. Any system that generates text this way will hallucinate, because the statistical patterns of confident assertion and the statistical patterns of factual accuracy are correlated but not identical. Eliminating hallucinations entirely would require the system to distinguish between truth and probability — a capability that the architecture does not possess and was not designed to possess.
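
The structural point can be stated as a loop. In the sketch below, a three-line stub stands in for a trained network, and its probabilities are invented, chosen only to show the shape of the problem. The loop itself is faithful to the argument: each step consults the probability of a continuation given the context, and no variable anywhere in it represents truth.

```python
# A schematic decoding loop with a stub in place of a trained model.
# The numbers are invented; the structure is the point: the loop ranks
# continuations by learned probability, and truth is not a variable in it.
def stub_model(context):
    # Stand-in for a network's next-step probabilities, reflecting the
    # statistics of text rather than the state of the world.
    return {
        "Sydney": 0.55,    # fluent and false (a statistically common pairing)
        "Canberra": 0.40,  # fluent and true
        "purple": 0.05,    # disfluent
    }

def decode_greedy(context):
    candidates = stub_model(context)
    return max(candidates, key=candidates.get)  # most probable wins

print("The capital of Australia is", decode_greedy("The capital of Australia is"))
# -> Sydney. The confident error wins because confidence, not accuracy,
# is what the probabilities encode. Grounding and verification layers can
# shrink the gap; nothing inside this loop can close it.
```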

This reframing has practical consequences. If hallucinations are bugs, the engineering response is to fix them — to add layers of verification, to ground the model's outputs in external knowledge bases, to train on curated data that minimizes the gap between statistical probability and factual accuracy. These interventions are valuable. They reduce the frequency and severity of hallucinations. But they do not eliminate them, because they are treating a spandrel as a malfunction. The architecture that produces hallucinations is the same architecture that produces fluency, and you cannot remove the spandrel without altering the arch.

Segal's account of catching Claude in a philosophical error — a passage that "sounded like insight but broke under examination," a confident misuse of Deleuze's concept of smooth space — is a hallucination in action, and it illustrates the spandrel dynamic with uncomfortable precision. The passage worked rhetorically. It sounded right. It connected two ideas with the fluency and confidence that characterize the model's best outputs. The fluency was an adaptation — the model performing exactly as designed, predicting the statistically most probable next token in a philosophical discussion. The error was a spandrel — the same statistical machinery, applied to a domain where the correlation between fluency and accuracy was lower than the model's confidence suggested.

Segal's response — recognizing the error only because "something nagged," checking the reference, discovering the misuse — is the appropriate response to a spandrel. Not outrage that the system failed. Not despair that the system cannot be trusted. But the specific, patient, effortful work of checking the smooth surface for the seam where the architecture's structural limitations produce the architectural byproduct that looks like a mosaic but is actually a geometric inevitability.

The adaptationist fallacy, applied to AI, produces two equal and opposite errors. The first is the assumption that every impressive output of an AI system is evidence of designed intelligence — that the system "understands" the philosophical concepts it deploys, "reasons" through the problems it solves, "creates" the novel combinations it generates. This assumption treats every capability as an adaptation, ignoring the possibility that many capabilities are spandrels arising from architectural complexity rather than designed features of the system. The second error is the assumption that every failure of an AI system is evidence of fundamental inadequacy — that hallucinations prove the system is "just" a stochastic parrot, that errors reveal the emptiness behind the fluent surface. This assumption treats every limitation as a disqualifying flaw, ignoring the possibility that many limitations are spandrels arising from the same architecture that produces the capabilities.

Gould's framework dissolves both errors by insisting on a prior question: What was this feature shaped to do? The language model was shaped to predict text. Everything it does well and everything it does poorly must be understood in relation to that shaping function. The capabilities that exceed the shaping function are spandrels — valuable, sometimes astonishing, but not evidence of designed intelligence. The limitations that follow from the shaping function are also spandrels — real, sometimes dangerous, but not evidence of fundamental inadequacy. The system is what it is: a text-prediction architecture of extraordinary complexity, whose complexity produces byproducts that neither its designers nor its users fully understand.

The mosaics of San Marco are magnificent. They are also unintended. Both things are true, and holding both simultaneously is the intellectual discipline that Gould's framework demands — and that the AI moment, caught between uncritical celebration and uncritical fear, most urgently needs.

---

Chapter 5: The Mismeasure of the Machine

In 1846, Samuel George Morton, a Philadelphia physician regarded as the most accomplished physical anthropologist in America, published the culminating work of a decades-long project: the measurement of human skulls. Morton had amassed a collection of over six hundred crania from populations around the world, and he had filled each skull with white mustard seed — later with lead shot, for greater precision — to measure its internal volume. His results, tabulated with meticulous care, showed that the races of mankind could be ranked by cranial capacity: Caucasians at the top, with an average of 87 cubic inches; Mongolians in the middle, at 83; and Ethiopians at the bottom, at 78. The measurements were precise. The methodology was transparent. The data were publicly available. The conclusion was presented not as opinion but as the deliverance of objective science: the races of mankind differed in intelligence, and the difference could be measured by the size of the brain that housed it.

Gould re-examined Morton's data more than a century later, and what he found became the centerpiece of The Mismeasure of Man. Morton's measurements were not fabricated. They were not dishonest in the crude sense. They were shaped, at every stage, by assumptions so deeply held that Morton himself could not see them operating. The samples were inconsistent — Morton included more small-skulled females in his "Ethiopian" category and more large-skulled males in his "Caucasian" category, producing a difference that reflected sample composition rather than population averages. The methods shifted between groups — seed packing is less consistent than lead shot, and Morton used seed for some groups and shot for others, introducing systematic variation that correlated with the ranking he expected to find. When Gould recalculated the data with consistent methods and consistent samples, the differences between racial groups shrank to statistical insignificance.

Morton was not a fraud. He was something more instructive than a fraud. He was a careful scientist whose care was insufficient to overcome the biases embedded in his question. He set out to measure the thing he already believed existed — a hierarchy of racial intelligence — and the entire apparatus of measurement, from sample selection to methodology to statistical analysis, was unconsciously shaped by the belief it was supposed to test. The instrument measured the measurer.

Gould generalized this lesson into one of his most penetrating critiques. The fallacy at the heart of intelligence measurement, he argued, is reification — the conversion of an abstract concept into a concrete entity. "Intelligence" is an abstraction. It is a word humans use to gesture toward a cluster of cognitive capabilities — memory, reasoning, pattern recognition, linguistic facility, spatial visualization, social perception, creative recombination — that do not reduce to a single dimension and cannot be captured by a single number. IQ tests do not measure intelligence. They measure performance on IQ tests. The score is then reified — treated as though it names a substance that exists inside the skull, a substance that can be weighed and compared across individuals and populations, a substance whose quantity determines a person's worth.

The reification is the error. Not the measurement. Measurements are valuable. IQ tests predict certain kinds of academic and professional performance with reasonable reliability. But the prediction is statistical, not ontological. The test predicts performance because performance and the test draw on overlapping cognitive skills, not because both are measuring the same underlying substance. The distinction matters enormously, because the reified version — intelligence as a single, measurable, genetically fixed quantity — has been used to justify forced sterilization, immigration restriction, educational segregation, and the general conviction that social hierarchies reflect natural ones.

The AI discourse has reproduced the reification fallacy with a speed and a lack of self-awareness that Gould would have found simultaneously appalling and grimly predictable.

The reproduction begins with benchmark scores. Every major AI model is introduced to the world through a battery of benchmarks — MMLU, HumanEval, GSM8K, ARC, HellaSwag — each purporting to measure a dimension of cognitive capability. The scores are tabulated, compared across models, and used to construct rankings. GPT-4 scores higher than GPT-3.5 on MMLU. Claude 3.5 Sonnet scores higher than Claude 3 Opus on HumanEval. The rankings are reported as though they measure something real — something called "intelligence" or "capability" that the model possesses in a determinate quantity, the way a skull possesses a determinate volume.

The structure is Morton's, translated from lead shot to floating-point numbers. A complex, multidimensional phenomenon — the set of things a language model can and cannot do — is compressed into a single ranking along a single axis. The ranking is then reified: the model that scores higher is declared "more intelligent," as though intelligence were a fluid that fills the model's architecture the way lead shot fills a skull, and as though the benchmark measured the volume of that fluid rather than the model's performance on a specific set of tasks under specific conditions.
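
A toy calculation, with invented scores, makes the concealment concrete. The benchmark names below are real; the models and every number are hypothetical, chosen so that two very different capability profiles collapse into the same single-axis score:

```python
# Illustrative only: the benchmark names are real, every score is invented.
benchmarks = ["MMLU", "HumanEval", "GSM8K", "ARC", "HellaSwag"]

# Two hypothetical models with very different capability profiles.
model_a = {"MMLU": 0.86, "HumanEval": 0.48, "GSM8K": 0.91, "ARC": 0.88, "HellaSwag": 0.87}
model_b = {"MMLU": 0.74, "HumanEval": 0.92, "GSM8K": 0.65, "ARC": 0.80, "HellaSwag": 0.89}

for name, scores in [("A", model_a), ("B", model_b)]:
    mean = sum(scores[b] for b in benchmarks) / len(benchmarks)
    profile = ", ".join(f"{b} {scores[b]:.2f}" for b in benchmarks)
    print(f"Model {name}: mean {mean:.2f} | {profile}")

# Both models average exactly 0.80, so a leaderboard built on the mean
# calls them interchangeable -- yet one is far stronger at code
# (HumanEval) and the other at knowledge recall (MMLU) and grade-school
# math (GSM8K). The single number conceals the shape of the profile.
```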

The problems with this reification are identical to the problems Gould identified in the history of IQ testing. The benchmarks are not neutral instruments. They are designed by specific humans with specific assumptions about what "intelligence" or "capability" means, and those assumptions shape what the benchmarks measure. MMLU tests knowledge of facts and the ability to select the correct answer from multiple choices — a format that favors the statistical patterns of the training data over genuine understanding. HumanEval tests the ability to produce syntactically correct code that passes specific test cases — a format that favors pattern-matching on common programming problems over the capacity to design novel architectures. Each benchmark measures what it measures. The extrapolation from "performs well on this specific test" to "possesses intelligence" involves exactly the conceptual leap that Gould spent his career exposing.

The parameter count introduces a second layer of reification. The popular discourse tracks parameter counts — GPT-3 had 175 billion, GPT-4's count was never officially confirmed but is widely estimated in the trillions — as though the number of parameters were a measure of cognitive capacity, the way cranial volume was treated as a measure of brain power. Larger models are assumed to be more capable. The assumption is not entirely wrong — within a given architecture, scaling parameters does tend to improve performance on benchmarks, up to a point — but it is wrong in exactly the way that the correlation between cranial volume and certain cognitive tasks was "not entirely wrong" in Morton's data. The correlation exists. The reification — the conversion of the correlation into an ontological claim about what parameters are — is the error.

Parameters are not intelligence. They are the model's degrees of freedom, the adjustable weights that the training process modifies to minimize prediction error. More parameters mean more degrees of freedom, which means the model can represent more complex statistical relationships in the training data. But the relationship between representational capacity and the thing the discourse calls "intelligence" is neither linear, nor simple, nor well understood. Smaller models, trained on better data with better architectures, can outperform larger models on specific tasks. The correlation between size and capability is real but rough, exactly analogous to the correlation between cranial volume and cognitive performance in humans — a correlation that exists, that is measurable, and that tells you almost nothing about any individual case.
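
What a parameter count actually counts can be made concrete with a toy sketch. The layer sizes below are invented; the point is only that the number tallies adjustable weights, the degrees of freedom described above, and nothing else:

```python
# A minimal sketch of what a "parameter count" counts: trainable weights.
# All sizes here are invented, toy-scale numbers.

def linear_params(d_in, d_out, bias=True):
    """Weights (and optional biases) of one fully connected layer."""
    return d_in * d_out + (d_out if bias else 0)

d_model, d_ff, n_layers = 512, 2048, 6  # toy transformer-block dimensions

per_block = (
    4 * linear_params(d_model, d_model)   # attention: Q, K, V, output projections
    + linear_params(d_model, d_ff)        # feed-forward layer, expansion
    + linear_params(d_ff, d_model)        # feed-forward layer, contraction
)
total = n_layers * per_block
print(f"{total:,} parameters")            # ~18.9 million for this toy config

# Each parameter is one knob that training nudges to reduce prediction
# error. The count says how many knobs exist -- not what the tuned knobs
# can collectively do.
```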

The reification becomes most dangerous when it is used to construct the technological equivalent of Morton's racial hierarchy: the ranking of AI systems on a single axis of "intelligence" that determines their worth, their trustworthiness, and the resources allocated to their development. The ranking encourages a monoculture of optimization — every lab racing to produce the model that tops the leaderboard, using the same benchmarks, the same training methodologies, the same architectural family. The diversity of approaches narrows. The bush is pruned to a single trunk. And the trunk is optimized for performance on instruments that measure a specific, narrow, culturally determined subset of the capabilities the discourse has reified as "intelligence."

Gould observed that the history of intelligence measurement is a history of circular reasoning: the test is designed to measure intelligence, intelligence is defined as whatever the test measures, and the circularity is concealed by the authority of the numerical result. The AI benchmark ecosystem reproduces this circularity with mechanical precision. The benchmark is designed to measure capability. Capability is defined as whatever the benchmark measures. The model that scores highest is declared most capable, and the declaration is used to justify the investment of resources in producing the next model that will score higher on the same benchmark, and the cycle continues, each iteration reinforcing the assumption that the benchmark measures something real rather than something constructed.

Segal's distinction between intelligence as capability and consciousness as experience represents an effort to break this circularity, and it is an effort Gould would have endorsed, though with qualifications. The distinction is real and important: a system can be extraordinarily capable — can produce code, prose, analysis, and synthesis that would take a human expert hours or days — without possessing anything that resembles conscious experience. But the distinction can itself become a new form of reification if it is not handled carefully. "Capability" is no less an abstraction than "intelligence." Treating it as a single, measurable substance that models possess in varying quantities reproduces the fallacy at a different level of description.

What Gould would insist on — and what the AI discourse most urgently lacks — is a pluralistic account of what AI systems do, one that resists the compression of multidimensional performance into a single ranking. A language model is not "more intelligent" or "less intelligent" than another. It is differently capable, differently limited, differently suited to different tasks, in ways that cannot be captured by any single number and should not be. The model that excels at code generation may fail at nuanced ethical reasoning. The model that produces the most fluent prose may hallucinate most confidently. The model that scores highest on MMLU may be least capable of the kind of integrative, cross-domain thinking that Segal identifies as the most valuable human contribution in the age of AI.

The single number conceals all of this. It conceals it the way Morton's cranial averages concealed the variation within his samples, the inconsistencies in his methods, and the assumptions embedded in his question. The number is precise. The precision is mistaken for accuracy. And the accuracy is mistaken for truth.

Gould's prescription was not to abandon measurement but to understand what measurement does and does not reveal. Morton's skull measurements were real measurements of real skulls. They were not measurements of intelligence. IQ scores are real scores on real tests. They are not measurements of a substance called intelligence. Benchmark scores are real scores on real benchmarks. They are not measurements of a substance called capability or intelligence or whatever word the discourse reaches for when it wants to rank AI systems on a single axis.

The resistance to reification is not anti-scientific. It is the most rigorously scientific position available, because it insists on distinguishing between data and interpretation, between what the instrument measures and what the measurer claims the instrument measures. The history Gould documented is a history of the gap between those two things — a gap that the authority of numbers makes invisible, and that becomes visible only when someone takes the trouble to examine the assumptions embedded in the instrument itself.

The AI moment is producing measurements at an unprecedented rate. Models are scored, ranked, compared, and judged by numbers that are treated as transparent windows onto the thing being measured. Gould's life work is a sustained demonstration that no measurement is a transparent window. Every measurement is a lens, and every lens has a shape, and the shape of the lens determines what the viewer sees. The viewer who does not examine the lens mistakes the shape of the lens for the shape of the world.

The skulls in Morton's collection are still in Philadelphia, housed at the Penn Museum. They are real objects with real volumes. The hierarchy they were supposed to demonstrate does not exist. The distance between the measurement and the conclusion it was supposed to support is the distance Gould spent his career mapping — and it is the same distance that separates a benchmark score from the claim that one AI system is "smarter" than another.

---

Chapter 6: Replaying the Tape of Technology

Gould's most famous thought experiment required no equipment, no laboratory, no funding. It required only the willingness to take contingency seriously. "Replay the tape of life," he proposed in Wonderful Life. Wind it back to the Burgess Shale, 530 million years ago, when the Cambrian explosion was populating the oceans with the most extravagant diversity of body plans the planet had ever seen. Let the tape run forward again, with the same initial conditions but a different sequence of contingent events — a different asteroid here, a different climate fluctuation there, a different predator happening to survive in one corner of one ocean. Would the result be the same?

Gould's answer was unequivocal, and it remains one of the most provocative claims in the history of evolutionary biology: replay the tape, and you get a different world. Not a slightly different world. A fundamentally different world. Humans would almost certainly not evolve. Mammals might not evolve. Vertebrates might not evolve. The specific lineages that produced the specific body plans that dominate the modern Earth are the products of specific contingent events — the survival of one Cambrian lineage rather than another, the K-T asteroid impact that eliminated the non-avian dinosaurs and cleared ecological space for mammalian radiation, the specific climatic conditions of the Pliocene that favored bipedal locomotion in one lineage of African apes. Remove any one of these contingencies, and the downstream consequences cascade through the entire subsequent history of life.

The thought experiment is deceptively simple. Its implications are radical. If the specific outcome of evolutionary history is contingent — if the tape replayed would produce a genuinely different world — then the specific organisms that exist now, including the one reading this sentence, are not the necessary products of evolutionary law. They are the fortunate survivors of a particular sequence of accidents. Their existence is not guaranteed by the process that produced them. It is permitted by it, which is a very different thing.

The thought experiment translates to technology with a directness that should unsettle anyone who speaks confidently about the inevitability of AI's current trajectory.

Begin with the transformer. The architecture that underlies every major large language model in use today — GPT-4, Claude, Gemini, Llama, Mistral — traces its lineage to a single paper: "Attention Is All You Need," published by Vaswani and colleagues at Google Brain in June 2017. The paper introduced the transformer architecture, which replaced the recurrent neural networks that had dominated natural language processing with a mechanism called self-attention that allowed the model to process all parts of an input sequence simultaneously, rather than sequentially. The result was dramatically more efficient training on large datasets and, ultimately, the capability explosion that Segal describes as the orange pill moment.

The paper is now treated as a historical inevitability — the breakthrough that had to happen, the discovery that was "in the air." But the specific path to the paper was paved with contingencies. The key researchers were at Google Brain, a specific institution with specific resources and specific research priorities. The self-attention mechanism drew on earlier work by Bahdanau and colleagues (2014), itself a specific response to specific limitations of encoder-decoder architectures that were the dominant paradigm at a specific moment in the field's history. The decision to dispense with recurrence entirely — the boldest move in the paper, the one that gave it its title — was not a foregone conclusion even within the team. Alternative architectures that combined attention with recurrence were plausible and were being explored by other groups.

Replay the tape. Imagine the Vaswani team disbands in 2016 — a not-implausible contingency in the volatile world of corporate research. Imagine the attention mechanism is developed but combined with recurrence rather than replacing it, producing a hybrid architecture with different capabilities and different scaling properties. Imagine that the specific dataset conditions of 2017 — the availability of internet-scale text corpora, the specific cost curves of GPU computation, the specific funding environment that made large-scale experiments feasible — were different by even a modest margin. In any of these scenarios, the specific architecture that now dominates AI would not exist. Something else would. Something with different capabilities, different limitations, different failure modes, different social consequences.

The transformer is the Pikaia of the AI Cambrian explosion — the specific organism that happened to survive the specific selection pressures of a specific moment, and whose survival determined the entire downstream trajectory. Gould's analysis of Pikaia, the modest Burgess Shale chordate that may represent the ancestral lineage of all vertebrates, emphasized that its survival was not guaranteed by its design. It was one of dozens of viable body plans in the Cambrian seas, and its survival through the subsequent extinction events that pruned the Cambrian fauna was contingent on factors that had nothing to do with the intrinsic superiority of the chordate body plan. Had Pikaia gone extinct — a perfectly plausible outcome, given the body count of the Cambrian — the vertebrate lineage would not exist, and neither would anything that descended from it: no fish, no amphibians, no reptiles, no mammals, no primates, no humans. The entire history of complex terrestrial life pivots on the contingent survival of one unimpressive organism in one ancient ocean.

The transformer is similarly unimpressive as an inevitability. It is impressive as an achievement — the insight that attention alone, without recurrence, could process sequential data effectively was genuine and consequential. But the achievement was contingent on its institutional context, its intellectual predecessors, and the specific combination of computational resources and research priorities that characterized Google Brain in 2017. The AI ecosystem of 2025 — the one in which Segal's engineers in Trivandrum experienced a twenty-fold productivity increase, the one in which the imagination-to-artifact ratio collapsed to the width of a conversation — is downstream of a specific contingent event that might not have happened, or might have happened differently, or might have produced a fundamentally different architecture with fundamentally different properties.

Consider the deeper counterfactual. If the neural network winter had not occurred — if the funding for connectionist research had continued through the 1970s and 1980s at even a modest level — where would AI be today? The mathematics of neural networks was substantially developed by the late 1960s. Backpropagation was independently discovered multiple times, by Seppo Linnainmaa in 1970, by Paul Werbos in 1974, by David Rumelhart and colleagues in 1986. The algorithm was ready. The hardware was not. But the hardware trajectory was itself contingent — the development of GPUs by Nvidia, originally for the video game market, provided the computational substrate that made large-scale neural network training feasible. Had the video game industry developed differently — a contingency that depends on the specific cultural and economic conditions of late-twentieth-century consumer entertainment — the hardware that enabled the deep learning revolution might not have existed in the form that made the revolution possible.

The point is not that AI would not have emerged. Some form of machine intelligence would likely have emerged from any replay of the tape that included the mathematical foundations of computing and sufficient economic incentive to develop computational hardware. The point is that the specific form — large language models based on transformer architectures, trained on internet-scale text data, deployed as conversational agents accessible through natural language interfaces — is contingent on a sequence of specific, unrepeatable, not-predetermined events. The specific capabilities of these systems (fluent text generation, code synthesis, cross-domain knowledge application) and the specific limitations (hallucinations, training data biases, inability to reason causally) are features of this lineage, not inherent features of artificial intelligence as such.

A different lineage — descended from symbolic reasoning, or from hybrid architectures, or from neural approaches that diverged from the transformer at some earlier branch point — would possess different capabilities and different limitations. The problems it could solve and the problems it could not would be different. The social consequences of its deployment would be different. The specific form of the "orange pill" moment — if such a moment occurred at all — would feel different, would affect different people differently, would produce different anxieties and different exhilarations.

This is not speculation. It is the application of a principle that Gould demonstrated, with exhaustive evidence, across the entire history of life: the specific outcome of any branching, contingent process depends on the specific sequence of contingent events that shaped it, and alternative sequences would have produced alternative outcomes. The principle does not deny that patterns exist. Rivers do flow downhill. Evolution does produce complexity (as a statistical artifact, not a directed tendency, as Gould would insist). Technology does expand capability over time. But the specific channels, the specific body plans, the specific architectures that emerge from these general tendencies are not determined by the tendencies themselves. They are determined by the contingencies.

Segal's river of intelligence provides the momentum. Gould's contingency provides the landscape that shapes the channel. The river is real — something has been flowing for 13.8 billion years, accumulating complexity, branching into new forms of information processing. But the specific channel the river carves at this particular bend — the specific form AI takes in this specific decade, deployed by these specific institutions, governed by these specific regulations, experienced by these specific humans — is not written in the current. It is being carved, right now, by the rocks and the weather and the beavers.

The tape is still recording. The organisms alive at this moment — the transformer-based language models, the humans who use them, the institutions that deploy them, the cultures that absorb them — are not the inevitable products of technological law. They are the contingent inhabitants of a specific moment, shaped by a specific history, facing a future that is genuinely open. The specific future that emerges will depend on the specific choices made by the specific beings alive at this specific bend in the river. Gould's thought experiment does not predict what those choices will produce. It demonstrates, with the full authority of the fossil record behind it, that the choices matter — that the tape replayed from this moment would produce a different future, and that the future we actually get is the one we actually build.

---

Chapter 7: Exaptation and the Unintended Future

The feather is among evolution's most elegant and most misunderstood products. The popular imagination pictures feathers as instruments of flight — structures shaped by natural selection to provide the aerodynamic surface that allows birds to become airborne. The story is clean, progressive, and wrong in its chronology. Feathers did not evolve for flight. They evolved for thermal regulation. The earliest feathers, preserved in the fossils of non-avian theropod dinosaurs — creatures that were as far from flight as a crocodile is from a hang glider — were simple filamentous structures, more like fur than like the complex vaned feathers of modern birds. They kept their owners warm. They served this function for millions of years before any lineage of feathered theropods took to the air.

Flight was a secondary co-optation. A structure that evolved for warmth was repurposed — in one lineage, under specific ecological conditions, through a sequence of intermediate forms whose specific selective pressures remain debated — for an entirely different function. The feather did not change to become a flight surface. It was already there, already complex, already possessing structural properties (lightness, flexibility, the capacity to form interlocking surfaces) that happened, by structural accident, to be useful for a purpose its original evolutionary context could not have anticipated.

Gould and Elisabeth Vrba gave this phenomenon a name in 1982: exaptation. The term was deliberately chosen to complement "adaptation." An adaptation is a feature shaped by natural selection for its current function. An exaptation is a feature that arose for one function — or for no function at all, as a spandrel — and was subsequently co-opted for a different one. The distinction is not a quibble. It is a fundamental reorientation of how the history of functional structures should be understood.

The history of life is saturated with exaptations, and many of them are among evolution's most consequential innovations. The swim bladder in fish — a gas-filled sac that allows bony fish to control their buoyancy — was exapted, in the lineage leading to terrestrial vertebrates, into the lung. Bones in the reptilian jaw, which originally served to articulate the jaw joint, were exapted in the mammalian lineage into the tiny bones of the middle ear — the hammer, anvil, and stirrup that allow mammals to hear with a sensitivity and frequency range that reptiles cannot match. In each case, the most transformative function of the structure was not the function for which it originally evolved. The swim bladder did not evolve so that fish could someday breathe air. The jaw bones did not evolve so that mammals could someday hear a whisper. The co-optation was contingent, unpredicted, and — viewed from the perspective of the original function — wildly improbable.

Gould argued that exaptation might be more important than adaptation for understanding the most significant innovations in evolutionary history. The logic is straightforward: adaptation refines existing functions, making organisms better at what they already do. Exaptation creates new functions, opening ecological possibilities that did not previously exist. The feather-as-insulation is an adaptation; natural selection refined it for thermal performance over millions of years. The feather-as-flight-surface is an exaptation; its co-optation opened the entire ecological domain of powered aerial locomotion, a domain that had been inaccessible to vertebrates for the preceding three hundred million years. Adaptation makes organisms better. Exaptation makes them different. And it is the differences, not the improvements, that produce the most consequential changes in the history of life.

The application of this framework to AI tools — and specifically to the AI moment Segal describes — is so natural that one suspects Gould would have made it himself, had he lived to see the technology mature.

Large language models were developed to predict text. That is their designed function, their adaptation. Everything about their training process — the objective function, the data selection, the architectural decisions — was aimed at producing systems that could predict the next token in a sequence with maximal accuracy. The prediction is the function the system was shaped for.

But the uses to which these systems are being put — the uses that Segal describes, the uses that constitute the orange pill moment — are overwhelmingly exaptations. Creative collaboration, the experience of working with Claude to find connections between ideas that neither party saw independently, is an exaptation. No one designed Claude to be a creative partner. The capacity for what users experience as creative collaboration emerged as a structural consequence of the system's designed capability for fluent, context-sensitive text generation — a spandrel that users discovered could be co-opted for a purpose entirely different from the one the system was built to serve.

Architectural brainstorming is an exaptation. Debugging complex systems through conversational diagnosis is an exaptation. The capacity to serve as a sounding board for half-formed ideas — to hold an intention, reflect it back clarified, and suggest connections the human had not considered — is an exaptation. Each of these uses represents a feather-to-flight transition: a capability that arose as a byproduct of text prediction being repurposed for a function that text prediction was never designed to perform.

The significance of this observation extends far beyond taxonomy. If the most consequential uses of AI are exaptations rather than adaptations, then the most important consequences of AI cannot be predicted by examining the technology's designed purposes. The engineers who built the transformer architecture were not building a creative collaboration tool. They were building a text-prediction mechanism. The creative collaboration was the flight that emerged from the feather. It was not foreseen, because it was not the function the structure was designed to serve.

This has a specific and uncomfortable implication for the technology industry's approach to AI development. The industry's planning process — the roadmaps, the benchmarks, the funding priorities — is organized around intended capabilities. Models are evaluated on their performance at tasks the developers anticipated. Resources are allocated to improve performance on those tasks. The entire institutional apparatus of AI development is adapted to produce adaptations — better performance on the functions the system was designed to perform.

Exaptations fall outside this apparatus. They emerge in the wild, when users encounter the system's capabilities in contexts the developers did not anticipate. They emerge from the intersection of the tool's structural properties and the specific, unforeseeable needs of specific human beings operating in specific circumstances. The Trivandrum engineer who built a complete user-facing feature in two days — despite never having written a line of frontend code — was exapting a text-prediction system for a purpose its designers had not specifically intended. The backend engineer who discovered that Claude could hold architectural conversations at a level that accelerated her design thinking was exapting a token predictor into a cognitive prosthesis. The collaboration through which Segal produced this book, the partnership between a human intelligence and a statistical inference engine that produced insights neither could have produced alone, was an exaptation of the most literal kind: a capability evolved for text prediction, co-opted for intellectual partnership.

Gould would have noted the irony. The technology industry is optimizing for adaptations — better benchmark scores, more accurate predictions, fewer hallucinations — while the most transformative consequences of the technology are exaptations that the optimization process cannot target because it cannot anticipate them. The most important thing AI does may be the thing it was not designed to do. The flight, not the warmth.

This pattern is visible in the history of technology more broadly, and the examples are instructive. The internet descends from ARPANET, a military research network whose packet-switched design, shaped in part by Cold War concerns about communications surviving attack, routed messages around damaged nodes. Its most consequential uses — the World Wide Web, electronic commerce, social media, the entire infrastructure of digital culture — are exaptations. The GPS satellite system was designed for military navigation. Its exaptation into consumer mapping, ride-sharing logistics, and location-based services has produced economic and social consequences that dwarf its original military application.

In every case, the designed function is the adaptation, and the unintended use is the exaptation, and the exaptation is more consequential than the adaptation. The designed function is the swim bladder. The unintended use is the lung. Gould's framework predicts this pattern: the most significant innovations in any complex, branching system arise not from the optimization of existing functions but from the co-optation of existing structures for new ones.

The implications for forecasting the future of AI are humbling. If exaptation theory is correct — and the evidence from biology, from the history of technology, and from the first two years of widespread LLM deployment supports it strongly — then the most consequential uses of AI in 2035 are probably not uses that anyone in 2025 has imagined. They will emerge from the intersection of the technology's structural properties and human needs that have not yet been articulated, in contexts that have not yet been created, through co-optations that cannot be anticipated because they depend on contingent encounters between specific capabilities and specific problems.

The roadmap is a map of adaptations. The territory is shaped by exaptations. The map is useful but radically incomplete, and the territory will look nothing like the map — not because the map is wrong about what it includes, but because it cannot include what it cannot foresee.

Segal writes of Claude making a connection he had not seen — linking adoption curves to punctuated equilibrium, a connection that changed the direction of his argument. "Neither of us owns that insight," he writes. "The collaboration does." This is exaptation in real time: a text-prediction system, co-opted for intellectual collaboration, producing an output that neither the system's designers nor its user anticipated. The output is a feather that has just discovered it can fly.

The question is what other feathers are out there, attached to organisms that do not yet know they can become airborne. Gould's framework does not answer this question. It does something more valuable: it establishes that the question is the right one to ask, and that the answer will not be found on any roadmap.

---

Chapter 8: The Expanding Bush of Capability

In 1941, Ted Williams batted .406 for the Boston Red Sox. No player in Major League Baseball has batted .400 since. The conventional explanation is that Williams was a uniquely gifted hitter, a once-in-a-generation talent whose achievement reflects the outer boundary of human batting capability. Gould, characteristically, saw something different in the number. He saw not a story about Ted Williams but a story about the distribution of batting averages across the entire history of professional baseball — and the story the distribution told was precisely the opposite of the story the conventional explanation assumed.

The analysis became one of Gould's most celebrated arguments, the centerpiece of Full House: The Spread of Excellence from Plato to Darwin. Gould examined the distribution of batting averages over a century of professional baseball and found a striking pattern: the mean batting average had remained essentially constant, fluctuating around .260 for the entire period. But the variance — the spread between the best and worst averages — had steadily declined. The highest averages had fallen. The lowest had risen. The entire distribution had compressed toward the mean.

The disappearance of .400 hitting was not evidence that modern players are less skilled than their predecessors. It was evidence that they are more skilled — all of them, including the worst. As the population of professional ballplayers improved in training, nutrition, conditioning, and tactical sophistication, the range of performance narrowed. The worst players got better faster than the best players got better, because there was more room for improvement at the bottom of the distribution than at the top. The right tail of the distribution — the .400 hitters, the outliers — was trimmed not by a decline in their absolute performance but by the rising tide of competence that lifted everyone.

The .400 average disappeared because it lived in the right tail, and the right tail shrank as the distribution compressed. Williams was not better than today's best hitters in absolute terms. He was further from his era's average than today's best hitters are from theirs, because the average was lower and the variance was wider. His achievement was a feature of the distribution, not merely a feature of the individual.
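
The distributional claim can be checked with a toy simulation. The player counts and spreads below are invented, but the logic is Gould's: hold the mean fixed, shrink the variance, and the .400 season disappears from the right tail:

```python
# A toy simulation of the batting-average argument: same league mean,
# narrower spread, vanishing .400 seasons. All numbers are invented.
import random

random.seed(1)

def best_average(stdev, n_players=600, mean=0.260):
    """Top batting average in a league with the given player-to-player spread."""
    return max(random.gauss(mean, stdev) for _ in range(n_players))

for era, spread in [("wide-variance era", 0.045),
                    ("compressed modern era", 0.030)]:
    print(f"{era}: best ≈ {best_average(spread):.3f}")

# With the mean pinned at .260, the wide-variance league routinely
# produces a leader near .400; the compressed league tops out around
# .350 -- not because its best hitters are worse, but because everyone
# sits closer to everyone else.
```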

Gould used the baseball analysis to make a much larger argument: the myth of progress is, in most cases, a misreading of distributional change. When observers focus on the right tail of a distribution — the most complex organisms, the highest batting averages, the most capable AI systems — and ignore the rest of the distribution, they perceive directional progress where the actual phenomenon is the expansion (or compression) of variance. The mean may not have moved at all. The right tail extended, or the distribution shifted, but the full picture — the full house, in Gould's terminology — tells a different story from the one that selective attention to the extreme values constructs.

The biological application was the argument's deepest target. The history of life, Gould argued, does not show progress toward complexity. It shows the expansion of variance from a simple starting point. Life began simple because there was a left wall — a minimum level of organizational complexity below which life cannot exist. From that wall, variation expanded in all directions. The leftward expansion was blocked by the wall, so the distribution of complexity appeared to move rightward. But the rightward movement was a statistical artifact of the wall's constraint, not evidence of a directional tendency. The mean of the distribution — if such a thing could be calculated for the diversity of life — barely moved. Bacteria, the simplest and most ancient forms of life, remain the most abundant, the most ecologically dominant, and the most metabolically diverse organisms on the planet. The right tail of the distribution extended — producing, eventually, multicellular organisms, nervous systems, consciousness — but the tail is not the trend. The trend, if there is one, is expansion. Not direction.

The Full House framework applied to the AI moment produces an analysis that neither the triumphalists nor the elegists have yet articulated, and it begins with the same distributional logic that explained Ted Williams.

Consider the democratization of building capability that Segal describes. Before AI coding assistants, the distribution of who could build software was sharply constrained. On the left side, a hard wall: you needed years of specialized training, fluency in programming languages, access to development environments and deployment infrastructure. The wall excluded the vast majority of people with ideas from the population of people who could realize them. On the right side, the tail extended toward expert developers whose years of accumulated knowledge and practice allowed them to build systems of extraordinary sophistication.

AI tools lowered the left wall. The developer in Lagos, the backend engineer who starts building user interfaces, the designer who begins writing complete features, the non-technical founder who prototypes a product over a weekend — all of these represent the expansion of the distribution leftward, into territory that was previously walled off. The floor dropped. People who could not build before can build now.

What happened to the right tail? This is where Gould's distributional logic becomes precise and counterintuitive. The right tail — the expert developers, the architects of complex systems, the engineers whose decades of accumulated intuition allowed them to build things no one else could — did not extend proportionally. The same tools that lowered the left wall compressed the variance, exactly as improved training and nutrition compressed the variance in baseball batting averages. The gap between what a novice and an expert could produce narrowed — not because experts got worse, but because novices got dramatically better.

The expert developer using Claude Code is faster than the expert developer without it. But the novice developer using Claude Code is much faster relative to the novice without it, because the novice had further to travel. The tool's leverage is greatest where the gap between intention and capability was widest. The floor rises faster than the ceiling, and the distribution compresses.
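
Invented numbers make the floor-and-ceiling logic explicit. The productivity figures below are hypothetical; what matters is the structure, the same multiplier arithmetic the paragraph describes:

```python
# Hypothetical output figures (features shipped per month) showing why
# the same tool compresses the distribution. Every number is invented.
developers = {
    "novice": {"before": 0.5, "with_tool": 4.0},   # 8x relative gain
    "expert": {"before": 8.0, "with_tool": 16.0},  # 2x relative gain
}

for who, d in developers.items():
    gain = d["with_tool"] / d["before"]
    print(f"{who}: {d['before']} -> {d['with_tool']} ({gain:.0f}x gain)")

gap_before = developers["expert"]["before"] / developers["novice"]["before"]
gap_after = developers["expert"]["with_tool"] / developers["novice"]["with_tool"]
print(f"expert/novice gap: {gap_before:.0f}x before, {gap_after:.0f}x with the tool")

# The expert remains ahead in absolute terms, but the gap shrinks from
# 16x to 4x: the floor rises faster than the ceiling, and the variance
# of the distribution compresses.
```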

In baseball, the compression of variance produced the disappearance of .400 hitting. In software development, the compression of variance may produce the disappearance of the .400 developer — the individual whose output so far exceeded the mean that they constituted a separate category. Not because such individuals cease to exist or cease to be extraordinary, but because the distribution around them has compressed to the point where their distance from the mean, the very thing that made them visibly exceptional, has shrunk.

This is not a loss narrative. Gould was emphatic on this point. The disappearance of .400 hitting is not evidence of decline. It is evidence of a system in which everyone has gotten better and the best are no longer as far from the average as they once were. The absolute level of play has risen. The relative dispersion has fallen. Both are true simultaneously, and confusing them — treating the compression of variance as evidence of declining quality — is exactly the error that the myth of progress encourages.

Applied to the AI moment, the Full House framework says: the expansion of who can build is real and significant. The floor has dropped. The distribution has widened leftward. People and populations that were previously excluded from the building process — by lack of training, by lack of capital, by geographical distance from the centers of institutional support — now have access to tools that lower the barrier between imagination and artifact. The developer in Dhaka, the student in Nairobi, the designer in Trivandrum who starts writing production features — each represents a new branch on the expanding bush of capability.

But the expansion is not progress in the directional sense. The mean quality of output may not increase. It may even decrease, as the flood of new production dilutes the average. What increases is the variance — the range of who produces, what they produce, and where they produce it from. The most interesting developments in the post-AI landscape will not come from the right tail, from the already-expert practitioners who use AI to extend their existing capabilities. They will come from the newly occupied regions of the distribution — from the branches of the bush that did not exist before the wall was lowered, from positions that no ladder model could have predicted because they are not climbing toward a predetermined summit but branching into unoccupied space.

Gould would have insisted on one further implication, because it follows from the distributional logic with the force of mathematical necessity. When the variance of a distribution expands — when a wall is lowered and new territory is opened — the most common outcome is not excellence. It is mediocrity. The median of the distribution does not shift toward the right tail when the left wall drops. It may shift leftward, as the influx of new, less experienced producers pulls the center of the distribution toward the bottom. The flood of AI-generated content — code, prose, images, music — that has followed the democratization of production is exactly what the Full House framework predicts: an expansion of variance in which the bulk of the new production occupies the middle and left of the distribution, with rare, unpredictable instances of genuine originality emerging from positions that no one anticipated.

The appropriate response to the flood is not to celebrate it uncritically (the triumphalist error) or to lament it as the degradation of quality (the elegist error). It is to understand it as a distributional phenomenon — the predictable consequence of lowering a wall — and to build the curatorial, critical, and institutional structures that allow the rare instances of genuine originality to be identified, supported, and amplified. The printing press produced a flood of cheap pamphlets alongside the works that changed civilization. The response was not to uninvent the press. It was to build libraries, develop critical traditions, and create the institutional infrastructure that separated the valuable from the abundant.

The bush expands. Most of the new branches will be unremarkable. Some will be extraordinary, and they will emerge from positions in the distribution that the ladder narrative — fixated on the right tail, blind to the rest of the distribution — cannot see and would not value if it could. Gould's framework insists on seeing the full house, not just the aces. The aces matter. But the full house is the phenomenon, and the phenomenon is richer, stranger, and more consequential than any individual card it contains.

Chapter 9: What Darwin Did Not See

The most consequential bird in the history of science nearly ended up in a dustbin. Darwin collected specimens throughout the Galapagos in September and October of 1835, but his attention during the voyage was directed primarily at geology — the raised beaches, the volcanic formations, the evidence of land slowly rising from the sea that would eventually contribute to his understanding of deep time. The birds were an afterthought. He shot them, skinned them, and stored them, but he did not label them by island. He mixed the specimens together. Several of the finches he identified incorrectly — labeling a finch as a wren, a finch as a "gross-beak," grouping species from different islands as though they were the same creature collected in the same place. The most famous birds in evolutionary biology were, at the moment of their collection, a mess.

The mess became a revelation only because of a specific, contingent encounter. Darwin gave his bird specimens to John Gould (no relation to Stephen Jay), an ornithologist at the Zoological Society of London, who examined them in January 1837 — more than a year after the Beagle returned to England. It was John Gould who recognized that the specimens from different islands were not varieties of a single species but distinct species, thirteen in all, each unknown to science, each exhibiting modifications of beak structure that correlated with different food sources on different islands. It was John Gould who showed Darwin what Darwin had been holding.

Gould — Stephen Jay Gould — used this story repeatedly, with visible relish, to demolish the myth of the solitary genius experiencing a flash of insight. The popular version of the Darwin story has him standing in the Galapagos, observing the finches with preternatural acuity, and intuiting natural selection on the spot. The reality is that Darwin did not know what he had collected. He did not see what the specimens contained. The question that would eventually transform biology — why are these birds similar but not identical? — did not form in the Galapagos. It formed in London, gradually, through conversation with a taxonomist who possessed expertise Darwin lacked, applied to specimens that Darwin had not bothered to organize properly.

The formation of the question was as contingent as the formation of the species the question concerned. Darwin might have discarded the specimens. He might have given them to a less perceptive ornithologist. John Gould might have been ill that January, or occupied with other collections, or less attentive to the subtle morphological variations that distinguished the island populations. The question might never have formed. The fact that it did is not evidence of inevitability. It is evidence of a fortunate intersection: a prepared mind, a collaborating expertise, and a body of evidence whose significance was invisible to the person who collected it.

Gould — Stephen Jay — generalized this observation into one of his most powerful arguments against retrospective logic, the cognitive habit of looking backward from a known outcome and concluding that the path to that outcome was the only possible path. Retrospective logic is the ladder's handmaiden. It takes a contingent sequence of events — Darwin collects birds, gives them to Gould, Gould identifies them, Darwin formulates a question, the question leads to a theory — and compresses it into a narrative of inevitability in which each step leads naturally to the next, as though the theory of natural selection were implicit in the first finch Darwin shot and merely needed to be unpacked.

The retrospective illusion is not innocent. It distorts understanding of how discovery actually works, and the distortion has specific consequences for how the AI moment is being interpreted.

The builders experiencing the orange pill — Segal's term for the recognition that something genuinely new has arrived — are in Darwin's position before the meeting with John Gould. They are holding specimens whose full significance they do not yet understand. They know something has changed. They can feel the weight of the specimens in their hands. But the question — the specific question that will organize their understanding of what the specimens mean — has not yet fully formed.

Segal captures this with a candor that Gould would have appreciated. Describing his attempt to articulate the idea that intelligence is not a possession but a participation, Segal writes: "I did not have an answer. I had the shape of one." The shape precedes the articulation. The intuition precedes the theory. The specimens are in hand, but the taxonomist has not yet examined them.

The retrospective narrative that will eventually be written about the AI moment — the narrative that business schools will teach and documentaries will tell — will compress this messy, contingent, uncertain process into a clean story of inevitable progress. The transformer architecture will be presented as the natural culmination of sixty years of AI research. The orange pill moment will be presented as the obvious consequence of reaching sufficient computational scale. The twenty-fold productivity multiplier will be presented as the predictable result of removing implementation friction.

Each of these presentations will be retrospectively coherent. Each will make the present look like the destination toward which the past was traveling. And each will be wrong in the specific way that the popular Darwin myth is wrong — not in its facts but in its causality. The facts are real. The transformer was developed. The productivity gains were achieved. The orange pill was experienced. But the path from cause to effect was not a highway. It was a game trail through dense undergrowth, legible only to those who walked it and invisible to those who arrive at the clearing and assume the highway was always there.

What the retrospective narrative conceals is the uncertainty that characterized the experience from the inside. Segal describes it repeatedly: the oscillation between excitement and terror, the inability to tell whether he was watching something being born or something being buried, the specific vertigo of standing on ground that has not yet decided to hold. This uncertainty is not a biographical detail. It is the epistemological signature of a genuine transition — the specific feeling that accompanies the moment when the old categories have broken and the new ones have not yet formed.

Darwin experienced this uncertainty for years. The period between John Gould's identification of the finches (January 1837) and the publication of On the Origin of Species (November 1859) — twenty-two years — was not a period of steady progress toward a known destination. It was a period of agonizing doubt, false starts, abandoned frameworks, and the slow, painful construction of an argument that Darwin himself was not sure would hold. He delayed publication for decades, partly out of social anxiety and partly because the argument kept developing, kept shifting, kept revealing new complications that required new solutions. The theory of natural selection, which retrospective logic presents as Darwin's inevitable contribution, was from the inside a fragile, uncertain, constantly revised construction that its author was never fully confident would survive contact with the scientific community.

The AI builders are in an analogous period. The specimens are identified — the capabilities are real, measurable, and accelerating. But the theory — the framework that organizes the capabilities into a coherent understanding of what they mean — is still forming. Segal's river metaphor is one attempt. The ascending friction thesis is another. The democratization argument is a third. Each captures part of the phenomenon. None captures all of it. The uncertainty is not a deficiency of the analysis. It is a feature of the moment — the specific cognitive condition of standing inside a transition whose full significance is not yet legible.

Gould would have found this uncertainty not merely tolerable but essential. His life's work was an argument against premature certainty — against the retrospective logic that makes the present look inevitable, against the ladder that makes the future look predetermined, against the reification that makes complex phenomena look simple. The uncertainty of the AI moment is, in Gouldian terms, the signature of a genuine branching event — a moment when multiple futures are genuinely possible and the specific future that emerges has not yet been determined.

The question of what AI means — not what it can do, which is measurable, but what it means for the creatures who use it — is Darwin's finch question, asked at a civilizational scale. The specimens are in hand. The taxonomist has not yet delivered the verdict. The question is forming, slowly, through the accumulation of evidence and the collision of perspectives and the specific, contingent encounters between specific minds and specific tools that cannot be replicated or predicted.

Gould would have insisted on preserving this uncertainty against the pressure to resolve it prematurely. The retrospective narrative is already being written. The ladder is already being drawn. The progression from calculator to computer to AI to AGI is already being presented as a natural ascent in which each step leads inevitably to the next. The narrative is satisfying. It is coherent. It is, in Gould's most precise sense, a just-so story — a retrospective construction that makes the contingent look necessary and the uncertain look resolved.

The honest position is Darwin's, circa 1838: holding the specimens, sensing their significance, and not yet knowing what theory they will eventually support. The honest position is uncomfortable. It lacks the narrative clarity of the ladder. But it has the compensating virtue of being true to the experience of living inside a transition whose outcome is genuinely undetermined — and of preserving the space in which the choices that will determine the outcome can still be made.

The finches nearly ended up in a dustbin. The theory nearly did not form. The question that reorganized biology depended on a chance meeting between a prepared mind and a capable taxonomist, and had either been absent, the history of science would have unfolded differently. Gould spent his career insisting on the reality of such contingencies — not to diminish Darwin's achievement, but to rescue it from the mythology that turns achievement into inevitability. The achievement was real. The inevitability was an illusion produced by looking backward from the known result.

The AI moment deserves the same rescue. The achievement is real. The inevitability is an illusion. And the distance between what is being held and what is being understood — the gap between the specimen and the theory — is where the most important work remains to be done.

---

Chapter 10: What the River Does Not Determine

The deepest tension in this book — the tension that has been present since the first chapter and that this final chapter must address without pretending to resolve — is the tension between momentum and contingency. Between the river and the landscape. Between the force that flows and the accidents that shape its channel.

Segal's Orange Pill proposes that intelligence is a force of nature, a river flowing for 13.8 billion years "from atoms to algorithms, from hydrogen to humanity to whatever comes next." The metaphor is powerful. It captures something real about the directionality of information processing in the universe — the tendency, visible across cosmic time, for matter to organize itself into structures of increasing informational complexity, from self-organizing chemistry through biological evolution through cultural accumulation to artificial computation. The river flows. The flow is real. And the flow has produced, at this particular bend, a set of technologies that have altered the relationship between human intention and its realization more rapidly and more fundamentally than any previous technological transition.

Gould's life work stands in productive tension with this metaphor — not as a contradiction but as a qualification so fundamental that it transforms the metaphor's meaning.

A river flows downhill. This is physics. The gravitational gradient is real, and the water has no choice but to follow it. To this extent, the river metaphor captures something genuine about the tendency toward increasing complexity in information-processing systems: given sufficient energy flow and sufficient time, complexity tends to increase, not because the universe prefers complexity but because the statistical dynamics of systems far from equilibrium tend to produce it as a byproduct of energy dissipation. Ilya Prigogine won a Nobel Prize for demonstrating this. Stuart Kauffman built a career exploring its implications. The river flows.

But the river does not determine where it flows. The specific channel — the particular path the water carves through the landscape — is determined not by the gravitational gradient but by the geology: the hardness of the rock, the composition of the soil, the presence of obstacles, the accumulated effects of prior erosion, the contingent history of the landscape through which the water passes. Two rivers flowing down the same gradient, through landscapes with different geological histories, will carve completely different channels. The water follows the same physics. The channels follow different contingencies.

Gould spent four decades establishing that this distinction — between the general tendency and the specific realization — is the most important distinction in the history of life, and the one most consistently overlooked by those who confuse the tendency with the destination.

Life tends toward greater variance. This is the Full House argument. The bush expands from a simple starting point, and because expansion is constrained on one side by a left wall of minimal complexity, the distribution appears to move rightward. The rightward movement is real as a statistical phenomenon. But it is not a direction. It is not a destination. The specific organisms that occupy the right tail of the distribution at any given moment are there not because the process aimed at them but because the process happened, through a long sequence of contingent events, to produce them. Different contingencies would have produced different right-tail occupants.
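
The left wall is a statistical claim, and it can be checked in miniature. The sketch below is a toy model, not anything from Gould or Segal; every parameter in it (a thousand lineages, unbiased unit steps, a reflecting wall at a minimal complexity of one) is an assumption chosen purely for illustration. No step in the walk is biased toward greater complexity, yet the mean and the right tail drift rightward anyway, because variance can expand in only one direction.

```python
import random

# A minimal sketch of the "left wall" argument (illustrative assumptions:
# 1,000 lineages, unbiased +1/-1 steps, a reflecting wall at complexity 1).
WALL = 1            # minimal complexity; nothing persists below it
LINEAGES = 1_000
GENERATIONS = 500

rng = random.Random(42)
complexity = [WALL] * LINEAGES   # every lineage starts at the wall

for _ in range(GENERATIONS):
    for i in range(LINEAGES):
        step = rng.choice((-1, 1))                       # no rightward bias anywhere
        complexity[i] = max(WALL, complexity[i] + step)  # the wall blocks leftward moves

mean = sum(complexity) / LINEAGES
mode = max(set(complexity), key=complexity.count)
print(f"modal complexity: {mode}")             # stays pinned near the wall
print(f"mean complexity:  {mean:.1f}")         # drifts right all the same
print(f"right-tail max:   {max(complexity)}")  # stretches furthest of all
```

The modal lineage stays at the wall, which is Gould's point about bacteria: the most common form of life never leaves the minimum, even as the right tail stretches.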

Evolution tends toward functional solutions. The eye evolved independently at least forty times across different lineages, which suggests that the problem of detecting light is so pervasive and the possible solutions so constrained that some form of light-detecting organ will emerge in almost any replay of the tape. This is convergence, and it is real. But the specific forms of the eye — the vertebrate camera eye, the arthropod compound eye, the mollusk mirror eye — are different in every lineage, shaped by the specific developmental constraints and the specific contingent histories of each lineage. The tendency toward light detection is general. The specific realization is contingent.

The same logic applies to intelligence. Some form of complex information processing would likely emerge in any replay of the tape that included sufficient energy flow and sufficient time. The tendency toward complexity is real. But the specific form that complex information processing takes — biological neurons organized in cortical columns, or silicon transistors organized in transformer architectures, or something else entirely that no replay has yet produced — is contingent on the specific sequence of events that shaped the specific lineage.
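
The same point can be dramatized by replaying a tape. The toy experiment below (again, every parameter is an illustrative assumption: fifty lineages, multiplicative variation with a slight upward drift, a small chance of extinction each generation) runs an identical process under three different random seeds. The tendency recurs in every replay; the identity of the lineage that ends up on top does not.

```python
import random

# A toy "replay the tape" experiment. All parameters are illustrative
# assumptions: 50 lineages, multiplicative variation with a slight mean
# growth, and a small per-generation chance of contingent extinction.
def replay(seed, lineages=50, generations=200):
    rng = random.Random(seed)
    complexity = {name: 1.0 for name in range(lineages)}
    for _ in range(generations):
        for name in list(complexity):
            complexity[name] *= rng.uniform(0.90, 1.12)  # variation, slight upward drift
            if rng.random() < 0.005:                     # contingent extinction
                del complexity[name]
    winner = max(complexity, key=complexity.get)
    return winner, complexity[winner]

for seed in (1, 2, 3):
    winner, peak = replay(seed)
    # The tendency (rising peak complexity) recurs in every replay;
    # the specific winner differs from replay to replay.
    print(f"replay {seed}: lineage #{winner} dominates, peak complexity {peak:.0f}")
```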

The AI moment sits at the intersection of the tendency and the contingency. The tendency — toward more capable information processing, toward the collapse of the barrier between intention and artifact, toward what Segal calls the narrowing of the imagination-to-artifact ratio — is real, and it may be, in some deep statistical sense, inevitable. Given sufficient computational resources, sufficient data, and sufficient economic incentive, some form of machine intelligence was likely to emerge from the general trajectory of computing. The river flows.

But the specific form that machine intelligence has taken — large language models based on transformer architectures, trained on internet-scale text data, deployed as conversational agents, producing the specific mix of capabilities and limitations that characterizes the current moment — is contingent. It is downstream of the specific decisions of specific researchers (Vaswani and his colleagues at Google Brain, whose 2017 paper introduced the transformer), specific institutional contexts (the specific research priorities and resource allocation of major technology companies), specific economic conditions (the specific cost curves of GPU computation, shaped in turn by the specific trajectory of the video game industry), and specific accidents of timing (the specific moment at which scaled computation, large datasets, and architectural innovation converged).

Different decisions, different institutions, different economic conditions, different timing — any of these alterations would have produced a different AI, with different capabilities, different limitations, different social consequences, and a different "orange pill" moment, if such a moment occurred at all.

This is what the river does not determine. The river determines that water will flow downhill. It does not determine whether the channel will be a gentle meander through fertile floodplain or a narrow gorge cutting through granite. The gravitational gradient provides the energy. The geology provides the shape. And the shape — the specific channel, the specific landscape that the water encounters — determines everything that matters about the river's practical consequences: whether it irrigates or floods, whether it supports life or erodes it, whether the creatures that live along its banks flourish or are swept away.

Segal's beaver metaphor captures the agency that Gould's contingency thesis demands. The beaver does not stop the river. The beaver does not pretend the river does not flow. The beaver studies the landscape, identifies the points of leverage, and builds structures that redirect the flow toward life. The dam is a contingent intervention in a system that is partly determined (the water flows downhill) and partly open (the specific channel the water carves depends on the specific features of the landscape, which can be altered by the specific actions of the specific organisms that inhabit it).

The beaver matters because the channel is not predetermined. If the river's course were fixed — if the gravitational gradient determined not just the general direction but the specific channel — then the beaver's dam would be a futile gesture, a temporary obstruction that the water would route around. The dam works because the landscape is contingent, because the specific path the water follows can be altered by relatively small interventions at the right points, and because the altered path produces a different ecology downstream. A pool forms behind the dam. Trout spawn in the still water. Moose wade in the shallows. Songbirds breed in the wetland margins. An ecosystem emerges that is richer, more diverse, and more resilient than the bare channel the water would carve without intervention.

The beaver's dam is a contingent structure in a contingent landscape, and its consequences are real. This is Gould's deepest lesson for the AI moment, and it is the lesson that the myth of inevitability most effectively suppresses: in a world where the outcome is contingent, intervention matters. Every choice matters. Every decision about how to build, how to deploy, how to regulate, how to educate, how to parent shapes the channel through which the river of intelligence flows — and different choices will produce different channels, with different ecologies, supporting different forms of life.

The myth of the ladder says the future is climbing toward a predetermined summit. Gould's bush says the future is branching in every direction, most branches ending in extinction, the specific branches that survive determined by contingent events that no one can predict and that everyone can influence.

The myth of inevitable progress says the choices do not matter because the destination is fixed. Gould's contingency thesis says the choices are the only things that matter, because the destination is not fixed — it is being constructed, right now, by the accumulated weight of every decision made by every person alive at this specific moment in the river's history.

The tape of life, replayed from the Cambrian, would produce a different world. The tape of technology, replayed from 1950, would produce a different AI. The tape of the present moment, played forward from here, will produce a future that is genuinely open — not because the river does not flow, but because the river does not know where it is going. The channel is being carved by the landscape, and the landscape is being shaped by the creatures that inhabit it, and the creatures that inhabit it are making choices, right now, that will determine whether the river irrigates or floods, whether the ecosystem downstream flourishes or erodes, whether the specific future that emerges from this specific moment is one that the specific humans alive to see it will recognize as worthy of the extraordinary tools they have been given.

Gould died in 2002, before the AI revolution he would have found so rich and so instructive. But his intellectual legacy — the insistence on contingency, the refusal of progressivist mythology, the demonstration that the specific outcome of any branching process depends on the specific sequence of events that shaped it — has never been more relevant than it is at this moment, when the most powerful technology in human history is being shaped by choices that the myth of inevitability tells us do not matter, and that Gould's life work demonstrates matter more than anything else.

The river flows. The bush branches. The tape records. And the future — genuinely, irreducibly, consequentially — depends on what happens next.

---

Epilogue

The branch I kept not seeing was my own.

I wrote The Orange Pill with a conviction that the river of intelligence flows in a direction — that 13.8 billion years of accumulating complexity pointed somewhere, that the trajectory from hydrogen to consciousness to Claude constituted a story with an arc. Gould spent his life demonstrating that this is the most natural mistake an organism can make: reading direction into a process that has none, seeing a ladder where there is only a bush.

The uncomfortable part is not that Gould might be right about the bush. It is what the bush means for the choices I describe in my book.

If the river has a destination, then the dams I urge people to build are steering mechanisms — useful but ultimately decorative, because the water arrives where it was always going. If the river does not have a destination — if the specific channel it carves depends on the specific contingent landscape it encounters — then the dams are not decorative. They are constitutive. They are the difference between a floodplain and a wasteland. Every choice about how to build, how to deploy, how to educate, how to parent becomes not a refinement of an inevitable outcome but a determinant of which of many possible outcomes actually materializes.

Gould's contingency thesis does not diminish the urgency I feel. It amplifies it. If the future is genuinely open, then the people alive at this moment carry a weight that a predetermined trajectory would have spared them: the weight of knowing that what they build, and what they choose not to build, and what they allow to be built without their input, will shape a future whose specific form is not yet written.

The finches nearly ended up in a dustbin. The question that reorganized biology depended on a chance encounter between a man who did not know what he had collected and a taxonomist who happened to look closely at the right moment. I think about that — about the razor-thin contingency between a revolution in understanding and a box of unlabeled bird skins gathering dust in a London attic — every time I sit down with Claude and feel the weight of specimens I cannot yet fully classify.

We are all holding finches we have not yet labeled properly. The taxonomy is forming. The question is still taking shape. And the only honest thing to say about where the river goes from here is that it depends — entirely, terrifyingly, magnificently — on what we decide to build at this particular bend.

— Edo Segal

---

Back Cover

THE BUSH IS THE TRUTH.
AND THE FUTURE OF AI
WAS NEVER INEVITABLE.

The dominant story of artificial intelligence is a story of ascent: vacuum tubes to transistors to neural networks to large language models, each step climbing inevitably toward the next. It is the most comforting story in technology, and Stephen Jay Gould spent his life proving that stories exactly like it are wrong. In this companion volume to The Orange Pill, Gould's frameworks (punctuated equilibrium, contingency, the spandrels of unintended capability, the mismeasure of intelligence) are applied to the AI revolution with startling precision. What emerges is not a rejection of the transformation but a radical reframing: the specific AI we have is not the only AI we could have had, and the future it produces depends entirely on choices that the myth of inevitability tells us don't matter.

If you believe the trajectory is fixed, you are a passenger. If you understand the trajectory is contingent, you are a builder, and what you build next is the only thing that determines where the river goes.


---

Wiki Companion

A reading-companion catalog of the 27 Orange Pill Wiki entries linked from this book: the people, ideas, works, and events that Stephen Jay Gould — On AI uses as stepping stones for thinking through the AI revolution.