The Hosanagar research at Wharton on endoscopist deskilling has become the most-cited example because the domain is high-stakes, the measurement is precise, and the effect size is clinically significant. Adenoma detection rates fell from 28% to 22% when AI was removed, a six-point gap that, at screening-population scale, translates to thousands of missed cancers. The finding is particularly striking because endoscopy is a domain of continuous practice: the physicians never stopped performing the procedure. What they stopped performing, once AI was handling polyp detection, was the specific perceptual work of noticing polyps themselves. The perceptual capability atrophied inside a broader procedural capability that appeared intact.
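A back-of-envelope calculation makes the population-scale claim concrete. Only the two detection rates come from the finding above; the screening volume and progression rate in this sketch are illustrative assumptions, not figures from the study.

```python
# Back-of-envelope scaling of the six-point ADR gap.
# Only the two detection rates come from the study cited above;
# every other number here is an illustrative assumption.

screenings = 1_000_000   # hypothetical annual screening colonoscopies
adr_with_ai = 0.28       # adenoma detection rate with AI assistance
adr_without_ai = 0.22    # rate after AI removal (deskilled cohort)

# Procedures in which an adenoma that would have been flagged
# goes unflagged: the ADR gap times the screening volume.
missed_detections = screenings * (adr_with_ai - adr_without_ai)  # ~60,000

progression_rate = 0.05  # assumed fraction of missed adenomas that
                         # would progress to cancer if not removed
potential_missed_cancers = missed_detections * progression_rate  # ~3,000

print(f"{missed_detections:,.0f} missed detections, "
      f"~{potential_missed_cancers:,.0f} potential cancers")
```

Under these placeholder assumptions, a six-point gap yields roughly 60,000 missed detections and on the order of 3,000 potential cancers per million screenings, which is the sense in which "thousands" is meant.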
The educational evidence is equally clear. Studies of students using GPT-4 for mathematics and other subjects consistently find the same pattern: enhanced performance while the AI is available, and, once it is removed, performance below that of peers who had never used AI. The performance gain was real; the underlying capability deficit was real; and the two coexisted, invisible to both students and instructors, until explicit testing under tool-free conditions revealed them.
The knowledge-work evidence from Carnegie Mellon and Microsoft Research in 2025 extended the pattern into white-collar professional work. The study documented that AI-using workers reported their tasks as cognitively easier while researchers observed them ceding problem-solving to the AI and focusing on what the paper called 'functional tasks like gathering and integrating responses.' The workers experienced empowerment; the researchers observed automation dependence. Both observations were accurate descriptions of different dimensions of the same phenomenon.
The pattern across these studies is precisely what Ericsson's framework predicted. When the conditions for deliberate practice are removed at the specific sites where they operated in traditional practice, the capability those conditions build stops being built. The output quality is preserved by the tool. The underlying capability deteriorates. The two can be distinguished only by testing under conditions the tool cannot mediate, and most institutional assessment methods are not currently designed to make this distinction.
The first wave of AI-deskilling research in the ChatGPT era began publishing in 2023, accelerated through 2024, and became a substantial literature by 2025. Earlier precedents include studies of GPS-dependent drivers losing navigational capability, calculator-dependent students losing arithmetic fluency, and autopilot-dependent pilots losing manual flight skills. These precedents are cited in the contemporary literature as evidence that the pattern is general across tool categories, not specific to generative AI.
Five points stand out across this literature:

- Convergent evidence across domains. Medicine, education, knowledge work, and creative fields all show the same pattern with varying effect sizes.
- Performance-capability decoupling is measurable. Standard assessment under tool-available conditions misses the deficit; tool-free assessment reveals it (a minimal sketch of this paired design follows the list).
- Clinically significant effect sizes. The endoscopist data and similar findings translate to outcomes with real human consequences at population scale.
- Subjective experience misleads. Workers consistently report empowerment while external measurement shows deskilling, a metacognitive failure that the performance-learning distinction explains.
- Framework confirmation. The evidence validates predictions the deliberate practice framework generated from first principles, before the tools that would test them existed.
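The paired design referenced in the second point can be sketched minimally: score the same cohort under tool-available and tool-free conditions, against a never-used-AI baseline. Every score below is invented for illustration; none of it is data from the studies discussed here.

```python
# Illustrative paired-assessment design for surfacing
# performance-capability decoupling. All scores are invented.
from statistics import mean

# Each entry: (score with tool available, score under tool-free conditions)
ai_trained    = [(92, 61), (88, 58), (95, 64), (90, 60)]  # practiced with AI
never_used_ai = [(78, 77), (81, 79), (76, 74), (80, 78)]  # no AI exposure

def summarize(label, cohort):
    with_tool = mean(s for s, _ in cohort)
    tool_free = mean(s for _, s in cohort)
    print(f"{label}: tool-available {with_tool:.0f}, tool-free {tool_free:.0f}")

summarize("AI-trained   ", ai_trained)     # high assisted, low unaided score
summarize("Never-used-AI", never_used_ai)  # the two scores roughly match

# A standard (tool-available) assessment ranks the AI-trained cohort
# higher; only the tool-free column reveals the capability deficit.
```

The design choice is the point: a single tool-available measurement cannot distinguish the two cohorts' underlying capability, which is why the decoupling stayed invisible to the institutions described above.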