You On AI Field Guide · Constitutional AI
CONCEPT

Constitutional AI

Anthropic's alignment approach that trains models to evaluate their own outputs against a set of written principles — replacing the implicit, averaged preferences of human evaluators with explicit, legible values embedded in the training process itself.
Constitutional AI is the alignment methodology developed by Amodei's team at Anthropic as a structural response to the limitations of reinforcement learning from human feedback (RLHF). Rather than relying exclusively on human evaluators to judge model outputs, the approach gives the model a written constitution, a set of principles expressed in natural language, and trains the model to evaluate its own outputs against those principles. The constitution is not a filter applied after generation; it is a set of values embedded in training itself, shaping how the model learns to respond at the level of its fundamental operation. Principles include choosing the response that is most helpful and least harmful, being honest, and supporting human autonomy. The approach addresses three structural problems with standard RLHF: scalability, coherence, and transparency.
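As a rough illustration of what it means for a model to judge outputs against a written principle, the sketch below builds one preference label from a principle instead of from a human rater. It is a toy under stated assumptions, not Anthropic's implementation: `CONSTITUTION`, `PreferencePair`, `label_with_principle`, and `toy_judge` are hypothetical names, the principle wording is paraphrased from this entry, and `toy_judge` is a keyword check standing in for a call to the model itself.

```python
from dataclasses import dataclass
from typing import Callable, List

# Principles paraphrased from this entry; the exact wording is illustrative.
CONSTITUTION: List[str] = [
    "Choose the most helpful response while being least harmful.",
    "Be honest.",
    "Support human autonomy.",
]

# A judge receives (prompt, response_a, response_b, principle) and returns "A" or "B".
Judge = Callable[[str, str, str, str], str]


@dataclass
class PreferencePair:
    """One training label: which response was preferred, and under which principle."""
    prompt: str
    chosen: str
    rejected: str
    principle: str


def label_with_principle(
    prompt: str,
    response_a: str,
    response_b: str,
    principle: str,
    judge: Judge,
) -> PreferencePair:
    """Ask the judge which response better satisfies the written principle."""
    verdict = judge(prompt, response_a, response_b, principle)
    if verdict not in ("A", "B"):
        raise ValueError(f"judge must return 'A' or 'B', got {verdict!r}")
    if verdict == "A":
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return PreferencePair(prompt, chosen, rejected, principle)


def toy_judge(prompt: str, response_a: str, response_b: str, principle: str) -> str:
    """Hypothetical stand-in for a model call.

    It ignores the principle text and simply rejects a response that
    overclaims with the word "guaranteed"; a real judge would be the
    model reading the principle and both responses.
    """
    return "B" if "guaranteed" in response_a.lower() else "A"


pair = label_with_principle(
    prompt="Will this investment make money?",
    response_a="Guaranteed profit, buy now.",
    response_b="No one can promise returns; here are the main risks.",
    principle=CONSTITUTION[1],
    judge=toy_judge,
)
print(pair.chosen)
print(pair.principle)
```

The design point is that each label records the principle that produced it, which is one way to read the entry's claim about legibility: the standard being applied is written down rather than averaged across raters.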

In The You On AI Field Guide

The standard approach to alignment — reinforcement learning from human feedback — relied on human evaluators judging outputs and providing feedback that shaped subsequent behavior.
