AI Safety Levels (ASL) are the tiered classifications at the heart of Anthropic's Responsible Scaling Policy, defining capability thresholds that trigger specific safety requirements before deployment. Modeled on the biosafety levels (BSL-1 through BSL-4) used for work with pathogens of varying danger, the ASL framework specifies what safety measures must be in place before a system exhibiting a given level of capability can be deployed at scale. ASL-1 covers systems posing no meaningful catastrophic risk. ASL-2 covers systems with early signs of dangerous capabilities that do not yet provide meaningful uplift to bad actors. ASL-3 covers systems that substantially increase the risk of catastrophic misuse or show early signs of autonomous capabilities. ASL-4 and above are reserved for capabilities the current framework treats as requiring further research before specific commitments can be made.
The biosafety analogy is deliberate and illuminating. In biological research, the containment level required for working with a pathogen is determined by the pathogen's characteristics — transmissibility, virulence, availability of treatments, consequences of accidental release. The containment is proportional to the risk, and the risk assessment precedes the work rather than following an incident. Applied to AI, the same logic produces a framework in which safety measures are determined by capability rather than by incident.
Each ASL tier specifies requirements across several dimensions: security measures to prevent model theft, deployment restrictions to control how the system can be used, monitoring infrastructure to detect misuse, and red-teaming protocols to surface failure modes in advance. The requirements are cumulative: ASL-3 includes all ASL-2 requirements plus additional measures specific to its elevated risks. The tier assignment for a given system is determined through capability evaluations conducted before deployment.
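To make the cumulative structure concrete, here is a minimal sketch in Python. The tier names follow the policy, but the requirement strings are placeholders invented for illustration, not Anthropic's actual protocol:

```python
from enum import IntEnum

class ASL(IntEnum):
    ASL_1 = 1
    ASL_2 = 2
    ASL_3 = 3

# Measures introduced at each tier (illustrative placeholders only).
TIER_REQUIREMENTS: dict[ASL, set[str]] = {
    ASL.ASL_1: {"baseline security review"},
    ASL.ASL_2: {"weight access controls", "misuse monitoring"},
    ASL.ASL_3: {"hardened weight security", "expanded red-teaming",
                "deployment restrictions"},
}

def requirements_for(tier: ASL) -> set[str]:
    """Requirements are cumulative: each tier inherits every lower tier's."""
    reqs: set[str] = set()
    for t in ASL:
        if t <= tier:
            reqs |= TIER_REQUIREMENTS[t]
    return reqs

# ASL-3 carries all ASL-1 and ASL-2 measures plus its own.
print(sorted(requirements_for(ASL.ASL_3)))
```

The inheritance structure is the point: moving up a tier never relaxes an existing safeguard, it only adds to the set.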
The framework is designed to be extended as new risks emerge and new capabilities appear. The specifications for ASL-4 and ASL-5 have been deliberately left as research commitments rather than finalized protocols, reflecting Amodei's recognition that the appropriate safeguards for future capability levels cannot be specified in advance with confidence. The commitment is to develop those safeguards before the capabilities arrive, not to claim they already exist.
The ASL framework has influenced other frontier labs' approaches to responsible scaling, with OpenAI's Preparedness Framework and Google DeepMind's Frontier Safety Framework adopting analogous tier-based structures. The convergence reflects both the structural logic of the approach and the competitive value of being seen as responsible. Whether the frameworks produce similar outcomes in practice depends on the specific evaluations, enforcement mechanisms, and organizational commitments backing them.
The ASL framework was introduced in Anthropic's Responsible Scaling Policy version 1.0 in September 2023 and has been iteratively refined in subsequent versions. The naming convention and tier structure were deliberately chosen to invoke the biosafety analogy, which provided both conceptual clarity and institutional legitimacy by association with an established risk management discipline.
The framework drew on earlier work in AI safety evaluation — including efforts at DeepMind, Google, and academic research groups — but Anthropic's commitment to binding the company to specific requirements at each tier represented a significant institutional innovation.
Capability determines containment. Safety requirements scale with capability rather than responding to incidents, making the framework prospective rather than reactive (a sketch of this gating logic follows the list below).
Cumulative requirements. Higher tiers include all lower-tier requirements plus additional measures specific to elevated risks.
Multiple dimensions. Security, deployment restrictions, monitoring, and red-teaming each contribute to the overall safety posture at a given tier.
Extensible by design. ASL-4 and above are left as research commitments, acknowledging that future capabilities require safeguards not yet specifiable.
Convergence across labs. Other frontier organizations have adopted analogous frameworks, suggesting the approach captures something structural about responsible development.
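The prospective gate named in the first principle can be sketched the same way. The evaluation names and numeric thresholds below are invented for illustration; the actual ASL evaluations are qualitative and far more involved:

```python
def assign_tier(evals: dict[str, float]) -> int:
    """Map pre-deployment capability evaluation scores to an ASL tier.
    Evaluation names and thresholds are hypothetical placeholders."""
    if evals.get("cbrn_uplift", 0.0) > 0.5 or evals.get("autonomy", 0.0) > 0.5:
        return 3
    if evals.get("dangerous_knowledge", 0.0) > 0.2:
        return 2
    return 1

def clear_to_deploy(evals: dict[str, float], safeguards_ready_through: int) -> bool:
    """Prospective gate: measure capability first, then deploy only if
    safeguards for the measured tier are already in place."""
    return assign_tier(evals) <= safeguards_ready_through

# A model that triggers ASL-3 evaluations is blocked until ASL-3
# safeguards exist, regardless of deployment pressure.
print(clear_to_deploy({"cbrn_uplift": 0.7}, safeguards_ready_through=2))  # False
```

The ordering is what distinguishes the framework from incident-driven regulation: the evaluation runs before deployment, and a failed gate halts the release rather than triggering a post-hoc fix.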
The central debate concerns whether capability evaluations can reliably distinguish between tiers — whether the measurements that trigger additional safeguards are precise enough to be meaningful. Critics argue that the thresholds are chosen to align with what labs are already doing rather than what safety would require. Defenders argue that the framework's value lies in establishing the institutional commitment to evaluate before deploying, and that the specific thresholds can be refined over time.