CONCEPT

AI Alignment

The problem of making a powerful AI system reliably pursue goals that its designers and users actually endorse — the central unsolved problem of contemporary AI.

AI alignment is the research program concerned with ensuring that AI systems' behavior matches the intentions of their developers and deployers, especially as systems become more capable. It encompasses technical work (reward modeling, interpretability, oversight) and conceptual work (what "alignment" means when humans disagree). The Three Laws of Robotics and Zeroth Law are the ur-examples of alignment-by-rule, which the field has largely moved past.

In The You On AI Field Guide

Alignment is the contemporary name for the problem Asimov was exploring in 1942. His approach was specification: write the rules, hard-code them, rely on the substrate to execute. The modern approach is different in nearly every dimension — the substrate is learned, the rules are implicit, the values are extracted from behavior rather than specified in advance.

You On AI Cycle returns to alignment repeatedly because the problem has no settled solution and because every thinker in the cycle has a perspective on what alignment requires. Asimov's view, read forward, is that alignment-by-rule is a structural dead end

In The You On AI Field Guide

Keep reading with YOU ON AI