Perrow identified the paradox in nuclear engineering. Redundant cooling systems designed to ensure backup when primary systems failed increased the plant's interactive complexity — more components, more connections, more pathways for failure to propagate. The backup cooling introduced new failure modes: valve conflicts between systems, instrumentation confusion when both activated, operator uncertainty about which system controlled the process. The safety mechanism protected against its target failure and introduced new failures that did not exist before.
The AI safety community is reproducing the paradox at scale. The proliferation of safety layers in large language models — RLHF, constitutional AI constraints, red-team protocols, content filters, guardrails, alignment training — represents genuine and well-intentioned safety work. Each layer addresses a specific category of risk. Each is individually rational. And each adds complexity to a system whose interactive complexity is already beyond its designers' capacity to characterize. The layers interact: RLHF training interacts with constitutional constraints; content filters interact with alignment training; red-team findings produce fixes that create new failure modes for the next round to discover.
The same dynamic operates at the organizational level. An organization implementing mandatory breaks, independent code reviews, staged deployment protocols, and automated testing has created a system of safety mechanisms. Each mechanism interacts with every other and with the workflow they collectively modify. The mandatory break interacts with the review schedule; the staged deployment interacts with the testing suite; the testing suite interacts with the break timing. The system of dams is itself an interactively complex system, subject to the same dynamics that produce normal accidents in the primary system.
The prescription is not fewer safety mechanisms but better-designed ones. The LessWrong community's 2025 analysis proposed modularity and Just-In-Time Assembly — designing AI systems as loosely coupled modules assembled for specific tasks and disassembled afterward. Modularity limits failure-propagation pathways because unconnected modules cannot transmit failure between them. The modularity principle reduces the interactive complexity of the safety system by structural design rather than by procedural discipline.
Perrow identified the paradox in his analysis of nuclear engineering but did not name it as such. The explicit formulation for AI systems belongs to Bianchi, Cercas Curry, and Hovy in their 2023 JAIR paper.
Safety is not additive. More safety layers do not produce linearly more safety; they produce more complexity, whose emergent properties may reduce net safety.
Mechanisms as components. Safety systems are themselves systems, subject to the same dynamics of interactive complexity and tight coupling.
Boundary effects. Constraints create edges where novel failure modes cluster — outputs that satisfy constraints technically while violating their spirit.
Second-order monitoring. Effective safety requires monitoring not just the primary system but the safety mechanisms themselves for degradation and interaction effects.
Architectural over procedural. The prescription is modularity and structural design, not better procedures or more vigilance.