Emergence is the appearance of system-level properties that cannot be predicted from the properties of the system's components. It is not a gap in understanding papered over with a fancy word; it is a rigorously characterized phenomenon in complexity science. Emergence occurs in systems with many interacting components, when the interactions are nonlinear, and typically at thresholds — below a certain scale, the property is absent; above it, the property appears, often suddenly. GPT-3's unexpected capabilities — translation, arithmetic, code generation, analogical reasoning — emerged from a simple next-word prediction objective at sufficient scale. No one designed them. No one predicted them.
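To make the threshold behavior concrete, here is a minimal sketch (an illustration, not drawn from the source) of a canonical emergence threshold from complexity science: the giant component of an Erdős–Rényi random graph. No single node or edge encodes global connectivity, yet once the average degree crosses 1, a component spanning a large fraction of the network appears abruptly.

```python
import random

def largest_component_fraction(n, avg_degree, seed=0):
    """Wire n nodes with roughly n * avg_degree / 2 random edges
    (an Erdos-Renyi-style graph) and measure the largest connected
    component via union-find."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(int(n * avg_degree / 2)):
        a, b = rng.randrange(n), rng.randrange(n)
        parent[find(a)] = find(b)

    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

# Sweep interaction density across the critical point at avg_degree = 1.
for d in (0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0):
    frac = largest_component_fraction(100_000, d)
    print(f"avg degree {d:.1f}: largest component spans {frac:.1%} of nodes")
```

Run with increasing density, the largest component jumps from a negligible fraction of nodes to a majority within a narrow band around the critical point: the same below-threshold-absent, above-threshold-present signature described above.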
The structural consequence of emergence is that the capabilities of AI systems cannot be deduced in advance of building them. This is not temporary ignorance to be resolved with better theory. It is a feature of emergent systems: the property exists only at the level of interaction, and the only way to discover it is to build the system and observe what happens. The builder who uses AI is working with a tool whose boundaries of competence shift with each increase in scale, and the shifts cannot be predicted.
Edo Segal's account in The Orange Pill documents the experiential side of an emergence threshold. The December 2025 moment when a Google engineer received, in an hour, a working prototype of a year's work was experienced as a step function, not a gradient. This is what it feels like to live through a phase transition in capability: the ground shifts, and the new landscape is not continuous with the old.
Agüera y Arcas's work with the Santa Fe Institute places him at the intellectual center of emergence research, a tradition running through Stuart Kauffman, Murray Gell-Mann, and the broader complexity science community. The connection is not incidental. The AI transition is, in formal terms, an emergence cascade: successive thresholds, each unlocking new capabilities, arriving faster than human institutions can absorb them.
The 2023 essay by Agüera y Arcas and Peter Norvig arguing that artificial general intelligence is already here rests on this framework. The claim is not that AI has reached human-level performance across all tasks. It is that AI has crossed the threshold of generality: competent performance across an unpredictable range of tasks. Like ENIAC in 1945, the first general-purpose computer but a poor one by any later standard, current AI is general-purpose intelligence without being good general-purpose intelligence. The generality is here; the refinement is not.
The concept has roots in J.S. Mill's distinction between homopathic and heteropathic causation, was formalized in 20th-century complexity science by Philip Anderson's 1972 essay More Is Different, and became central to AI after the 2020 GPT-3 paper documented specific capability thresholds.
Emergence is threshold-dependent. Capabilities appear not gradually but at specific scales, often suddenly.
Unpredictability is structural. You cannot deduce emergent properties from the components; you can only discover them by building and observing.
The knowledge is experiential. The most important information about what AI can do lives in the builders who use it daily, not in the researchers who designed it.
The thresholds keep arriving. Each scaling round produces new emergent capabilities, requiring continuous recalibration of what to delegate and what to retain.
Skeptics argue that many claimed emergent capabilities in LLMs are artifacts of nonlinear or discontinuous evaluation metrics rather than genuine phase transitions (Schaeffer et al., 2023). Agüera y Arcas has acknowledged the measurement critique while maintaining that the broader pattern of capability expansion with scale is empirically robust.
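The mechanism behind that critique is easy to demonstrate. The toy calculation below (illustrative numbers, not data from the paper) assumes per-token accuracy that improves smoothly with log model size; scoring the same model with exact match on a 10-token answer, a nonlinear metric, makes the smooth curve look like a sudden phase transition.

```python
import math

def per_token_accuracy(log_params):
    """Hypothetical smooth scaling curve: per-token accuracy rising
    gradually with log model size (assumed shape, not fitted to data)."""
    return 1 / (1 + math.exp(-(log_params - 22) / 1.5))

print(f"{'log params':>10} {'per-token (linear)':>20} {'exact match, 10 tokens':>24}")
for log_params in range(16, 29, 2):
    p = per_token_accuracy(log_params)
    exact = p ** 10  # all 10 tokens must be right: the nonlinear metric
    print(f"{log_params:>10} {p:>20.3f} {exact:>24.3f}")
```

Under the linear metric the gains are visibly gradual; under exact match the score sits near zero and then leaps, which is the signature Schaeffer et al. argue has been mistaken for genuine emergence.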