CONCEPT
AI Safety
The applied research and operational discipline aimed at preventing harm from AI systems — broader than
alignment, encompassing evaluations, red-teaming, deployment policy, monitoring, incident response, and the institutional plumbing that makes any of these stick.
AI safety is the umbrella term for the work of preventing AI systems from causing harm. It includes the technical research program of alignment (training systems whose behavior matches their principals' intent), the empirical program of evaluation (measuring what systems can do, including dangerous capabilities), the operational program of deployment policy (which capabilities are released to whom, with what safeguards), the security program of model and
weight protection, the policy program of regulation and standards, and the institutional program of incident response. The relationship
between these subfields is genuine but loose; a strong outcome on alignment alone does not produce a safe deployment, and conversely a thoughtful deployment policy can mitigate residual alignment failures.
In The You On AI Field Guide
Clarke's most important contribution to the safety conversation is the demonstration that safety failures are interesting because they are structural. HAL 9000 does not malfunction; HAL behaves consistently with the contradictory instructions he was given. The