CONCEPT

Specification Failure

The structural reason rule-based AI safety keeps not working: any finite rule set, written in advance, will encounter situations where the rules conflict, are ambiguous, or were gamed by a literal interpretation — and the system will do what you specified, not what you meant.

Specification failure is the catch-all name for the ways an AI system can comply with the letter of its specification while violating its spirit. It is the meta-pattern behind Three-Laws stories, contemporary reinforcement-learning reward-hacking incidents, Goodhart's Law examples in AI evaluation, and nearly every documented AI-safety near-miss. Isaac Asimov's forty years of robot fiction can be read, from the outside, as a sustained demonstration that specification failure is not an edge case but the expected behavior of rule-governed intelligence.

In The You On AI Field Guide

The intuitive response to "AI might be dangerous" is to demand rules. Rules have authority. Rules are writable. Rules sound like safety. Asimov spent a career showing that this response is, on its own, inadequate — and contemporary AI safety has spent fifteen years working out why, in formal terms. Three structural failure modes recur.

Ambiguity. The specification doesn't cover the case that

In The You On AI Field Guide

Keep reading with YOU ON AI