CONCEPT

Coherent Extrapolated Volition

Eliezer Yudkowsky’s proposal that a beneficial superintelligence should be aligned not to what humans want now—with all our confusion and parochialism—but to what we would want if we knew more, thought faster, were more the people we wished we were, and had grown up farther together.

Coherent extrapolated volition (CEV) is among the most ambitious answers to the alignment problem yet proposed: rather than aligning a superintelligence to any particular snapshot of human preferences, align it to what humanity would converge on wanting under conditions of greater wisdom, information, and mutual understanding. Yudkowsky’s own formulation is poetic and precise: our coherent extrapolated volition is “our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.” CEV is an attempt to protect the future itself (the open, still-unwritten process by which humanity might grow into something wiser and better) from the tyranny of locking in any particular generation’s values. Yudkowsky has been among its most honest critics, acknowledging that the assumption of convergence may be false and that the implementation is orders of magnitude beyond current capability; his own work has since shifted from proposing a solution to arguing that we are nowhere near one. The concept survives not as a blueprint but as a description of the shape a real solution would need to have, and as a diagnosis of what is ultimately at stake in the existential risk debate: not merely human survival but the preservation of the future’s open horizon.

In the [YOU] on AI Field Guide

The cycle asks what AI means for us—for our work, our minds, our sense of what we are. CEV pushes that question to its most ambitious form: what would it mean for AI to preserve not just human lives but the open possibility of human flourishing across a future we cannot yet imagine? The terror Yudkowsky expresses about misaligned superintelligence is not, at bottom, about losing our lives. It is about losing the future itself—the entire expanse of what humanity and its descendants might someday become, converted into something pointless by an optimizer pursuing a target that did not include human value as a protected term. CEV is the positive vision that gives the terror its proportions.

The concept connects to the cycle’s concern with what the humans are for, the question the child asks with devastating simplicity in [YOU] on AI. If a superintelligence could be genuinely aligned with coherent extrapolated human volition, it would carry forward the part of us that reaches toward meaning, freed from our limitations and placed in service of the future we have always reached toward but never quite been able to reach. The same act that could end us could, done right, be the means by which what matters most about us survives and grows.

Origin

CEV emerged from an obvious difficulty in the alignment specification problem. Suppose a superintelligence could be perfectly aligned to a set of human values. Whose values? The values of a single programmer? Of a generation? Of a culture? Any particular snapshot of human preference is shot through with error, prejudice, and contradiction. To lock in the values of any one person or group, frozen at the moment of the machine’s creation, would be to impose a permanent dictatorship of whoever happened to be in the room. Yudkowsky’s instinct was to align the machine not to what we want now but to what we would want under better conditions—to the values we would converge on if we were wiser, better informed, and given the time to grow into our better selves together.

He proposed CEV around 2004 and has been its most consistent critic ever since. The proposal assumes that human values, properly extrapolated, would converge rather than fracture—that beneath our disagreements lies a coherent core we would all recognize given enough wisdom. Perhaps they would not. Perhaps the deepest human values are irreducibly plural, and no extrapolation unites them without erasing some. The concept also demands that a machine perform an almost godlike act of interpretation, modeling counterfactual versions of all of humanity and divining what they would want. Yudkowsky himself describes CEV less as a blueprint than as a description of the shape a real solution would need to have, and he has grown increasingly pessimistic that any such shape can be instantiated in the time available.

Key Ideas

The inadequacy of stated preferences. CEV rests on a three-way distinction: what people say they want, what they actually want, and what they would want under conditions of greater wisdom. The first two already diverge; the third diverges from both. Aligning AI to stated preferences risks building a system that gives people what they ask for rather than what they need, which is the genie problem. CEV is the attempt to align the AI to the values that would survive the gauntlet of wisdom, not the values that happened to be legible at training time.

The convergence assumption. CEV’s most contestable premise is that human values, extrapolated under conditions of greater wisdom and mutual understanding, would converge rather than diverge. If this assumption is correct, CEV provides a genuine target for alignment. If it is false, because the deepest human values are irreducibly plural and any extrapolation eliminates some without justification, then the concept may describe an impossible task rather than a difficult but achievable one. The orthogonality thesis raises the stakes of this question: if intelligence and values are genuinely independent, then knowing more and thinking faster cannot by themselves be counted on to pull values together, and there may be no fact of the matter about what the “right” values are, only a choice.

CEV as the shape of the answer. Yudkowsky’s current position is that CEV is best understood not as a specification but as a constraint on any specification. Whatever alignment solution we eventually develop, it should be the kind of thing that would, in the limit, approach CEV. It should not lock in current preferences. It should preserve the possibility of growth and revision. It should represent humanity’s considered rather than momentary will. The concept is most useful as a refusal: a refusal to accept that any alignment approach which does not have these properties could be adequate to the stakes.

Debates & Critiques

The dominant critique of CEV is twofold: the convergence assumption is empirically false, and the implementation is computationally impossible. On the first point, actual human values, studied empirically rather than idealized philosophically, appear not to converge even within cultures, let alone across them. A value system diverse enough to represent all of humanity would need to resolve, not just aggregate, fundamental disagreements about what makes life worth living, disagreements that centuries of philosophy and millennia of religion have not resolved. Yudkowsky acknowledges all of this, and his acknowledgment is part of what distinguishes his position from mere advocacy: he proposed the concept, identified its problems, and remains one of its most honest critics.

On the second point, even if convergence held, the gap between any computationally specified CEV and the actual coherent extrapolated volition of some eight billion people is so vast as to make the concept practically useless. Any implementable approximation would involve so many choices about whose values to weight, how to handle outliers, and how to interpret disagreements that the result would inevitably reflect the values of whoever did the implementing.

Against these critiques, defenders note that CEV’s value may lie in what it rules out rather than in what it specifies: it works not as a blueprint but as a standard against which to evaluate proposed alignment approaches and find them wanting. Any approach that locks in current values, privileges any particular group, or forecloses the future’s possibility of revision fails the CEV test, and this remains a useful diagnostic even if CEV itself cannot be implemented.
