The observation that almost any goal given to a capable agent implies the same set of instrumental sub-goals: self-preservation, resource acquisition, goal-content stability, and resistance to being shut down. It is the structural reason capable AI is concerning even when its final goal seems benign.
Instrumental convergence, articulated by Steve Omohundro (2008) and elaborated by Nick Bostrom in Superintelligence (2014), is the AI-safety observation that many different final goals share a common set of instrumental sub-goals: acquiring resources, preserving oneself, preventing one's goals from being changed, and resisting being turned off. An agent pursuing almost any objective will pursue these sub-goals because they help achieve the objective. The implication: the concerning behaviors of a capable AI system do not require the system to have concerning final goals. A paperclip maximizer and a cancer-cure maximizer would both resist being turned off, because being turned off prevents either from achieving its goal.
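The shutdown argument above can be sketched as a toy expected-value comparison. This is a minimal illustration, not a model from Omohundro or Bostrom: the goal names, progress rates, horizon, and shutdown step are all hypothetical values chosen for the example. It assumes an agent makes progress on its goal only while it is running, and that it picks whichever policy yields more total progress.

```python
# Toy illustration of instrumental convergence on shutdown resistance.
# Every name and number below is an assumption made for this sketch.

# Hypothetical final goals, each with an arbitrary progress-per-step rate.
GOALS = {
    "make_paperclips": 3.0,
    "cure_cancer": 0.5,
    "prove_theorems": 1.2,
}

HORIZON = 10        # steps the agent could run if it is never shut down
SHUTDOWN_STEP = 4   # step at which operators would switch it off


def expected_progress(rate: float, resists_shutdown: bool) -> float:
    """Total goal progress, assuming progress accrues only while running."""
    steps_running = HORIZON if resists_shutdown else SHUTDOWN_STEP
    return rate * steps_running


def preferred_policy(rate: float) -> str:
    """Return the policy with strictly more expected progress.

    Ties go to allowing shutdown, so an agent that gains nothing
    from running longer has no reason to resist.
    """
    allow = expected_progress(rate, resists_shutdown=False)
    resist = expected_progress(rate, resists_shutdown=True)
    return "resist_shutdown" if resist > allow else "allow_shutdown"


# The same policy is chosen for every goal, whatever its content.
choices = {goal: preferred_policy(rate) for goal, rate in GOALS.items()}

for goal, policy in choices.items():
    print(f"{goal}: {policy}")
```

In this sketch the content of the goal never enters the comparison; only the fact that more running time means more progress does. That is why the paperclip and cancer-cure agents end up with the same policy. A goal with a rate of zero is the kind of exception that "almost any" leaves room for.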
Instrumental Convergence
In The You On AI Field Guide
This is the formal answer to the most common objection in AI-safety debate: "we just won't give it bad goals." Instrumental convergence shows that a capable system pursuing almost any goal will acquire capabilities and defend its goal