Smuggled expertise names the structural feature of any AI system trained on expert-generated data: the system's performance incorporates the judgment of the humans whose work constitutes the training corpus. Medical records written by physicians, legal briefs drafted by lawyers, code written by engineers — in each case, the data is not raw observation but the product of human cognition. Every data point reflects a clinical decision, an engineering judgment, a lawyer's assessment of relevance. When the AI is then evaluated against human experts, the comparison is structurally unfair: the system is being measured against the people whose judgment it has already absorbed. Klein identifies this problem as one of three methodological failures that make claims of AI superiority over experts systematically unreliable, the other two being learning confounds and big-data intimidation.
Klein's February 2024 essay dissecting an emergency department prediction study made the concept operational. The algorithm was trained on electronic health records — records that contained the observations, assessments, and clinical decisions of the ED physicians and nursing staff. The algorithm was not reading the patients. It was reading what the clinicians had written about the patients. The algorithm's predictions were built on a foundation of clinical expertise that the study's design rendered invisible.
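A minimal sketch makes this concrete. The field names below are invented for illustration, not taken from the study Klein examined, but they show what "training on electronic health records" means at the level of individual features: each input the model consumes already encodes a clinical decision.

```python
# Minimal sketch (hypothetical field names) of why EHR-derived features
# are clinician judgments rather than raw observations of the patient.

from dataclasses import dataclass

@dataclass
class EhrRecord:
    troponin_ordered: bool  # recorded only because a clinician ordered the test
    triage_acuity: int      # a triage nurse's judgment, 1 (critical) to 5 (non-urgent)
    chief_complaint: str    # a clinician's interpretive summary, e.g. "chest pain"
    admitted: bool          # the label itself: a physician's disposition decision

def to_feature_vector(rec: EhrRecord) -> list[float]:
    """Every feature below is expert judgment in numeric clothing."""
    return [
        float(rec.troponin_ordered),                 # reflects clinical suspicion
        float(rec.triage_acuity),                    # reflects nurse assessment
        float("chest pain" in rec.chief_complaint),  # reflects charting choices
    ]
```

Even the outcome label is typically a clinician's disposition decision rather than a brute fact about the patient, which is precisely the expertise the study design rendered invisible.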
The concept has sustainability implications that extend far beyond methodological critique. If AI performance depends, to a degree difficult to quantify but impossible to deny, on the expertise embedded in training data, then the relationship between AI and human expertise is parasitic in a precise technical sense. The AI feeds on the expertise it appears to replace. Its performance is a function of the quality and richness of the human judgment that produced the data from which it learned.
The parasitic relationship creates a problem that compounds over time. If the deployment of AI systems reduces the number of human experts practicing in a domain — if it displaces the physicians whose notes feed the algorithm, the engineers whose code trains the model — then the quality of training data degrades. The expertise the system consumed is not being regenerated, because the conditions under which it was generated are being eliminated by the system itself. The reservoir is deep but finite.
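A toy model illustrates the compounding. Every parameter below is invented for illustration rather than estimated from any real labor market; the point is only the shape of the dynamic: when displacement outpaces the training of new experts, the pool that generates the data decays toward a small steady state.

```python
# Toy model (invented parameters) of the compounding problem: AI
# deployment displaces a fraction of the experts whose work supplies
# training data, while new experts enter at a slower rate.

def expert_pool_over_time(
    initial_experts: float = 10_000,
    displacement_rate: float = 0.05,  # fraction of experts displaced per cycle
    replacement_rate: float = 0.01,   # new experts per cycle, as a fraction
                                      # of the original pool
    cycles: int = 20,
) -> list[float]:
    pool = initial_experts
    history = [pool]
    for _ in range(cycles):
        # Displacement is proportional to the current pool; replacement
        # is a trickle pegged to the original pool size.
        pool = pool * (1 - displacement_rate) + initial_experts * replacement_rate
        history.append(pool)
    return history

if __name__ == "__main__":
    for cycle, n in enumerate(expert_pool_over_time()):
        print(f"cycle {cycle:2d}: ~{n:,.0f} active experts")
```

With these illustrative numbers the pool drains toward a steady state of 2,000 experts, one fifth of the original reservoir: deep, as noted above, but finite.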
In this light, the connection to Segal's ascending friction, and to The Orange Pill's central claim that AI amplifies whatever signal it is given, grows more troubling. If the signal being amplified includes the accumulated expertise of practitioners whose work generated the training data, then the amplification is not creating new intelligence. It is redistributing existing intelligence from the experts who generated it to the systems that extracted it, without the experiential mechanisms that allow the expertise to regenerate.
Klein developed the concept in response to a wave of studies claiming AI superiority over human experts — studies whose methodological design systematically obscured the degree to which AI performance depended on human-generated training data. The pattern was sufficiently consistent across studies that Klein treated it as a diagnostic category rather than an individual methodological slip.
The framework connects to longer-standing critiques in philosophy of science about the theory-ladenness of observation and the expert-dependence of classification systems, but Klein's contribution was translating these critiques into operational diagnostic questions that practitioners could apply to specific AI performance claims.
The framework's core claims can be summarized as follows:

- Training data is not raw observation. It is the product of expert judgment (selection, interpretation, classification) that the algorithm inherits without acknowledgment.
- Comparison is structurally unfair. AI is measured against experts whose judgment it has already absorbed through the training data.
- Parasitic dependence. AI performance depends on ongoing human expertise for the quality of training data.
- Sustainability problem. If AI deployment reduces the number of human experts practicing in a domain, the training-data reservoir is consumed without replacement.
- Making the invisible visible. Organizations must explicitly ask where training-data expertise came from and whether its generation is being maintained; a sketch of such a provenance record follows this list.
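What might such a provenance record look like in practice? The sketch below is hypothetical; the field names are invented for illustration and are not a scheme Klein proposes, but they encode the diagnostic questions the framework implies an organization should be able to answer for any expert-generated corpus.

```python
# Hypothetical sketch: a provenance record encoding the questions the
# smuggled-expertise framework implies. Field names are invented, not
# drawn from Klein's essays.

from dataclasses import dataclass

@dataclass
class TrainingDataProvenance:
    corpus_name: str
    expert_source: str        # who generated the underlying judgments?
    judgments_embedded: str   # what decisions does each record encode?
    still_regenerating: bool  # is that expertise still being produced?

def audit(records: list[TrainingDataProvenance]) -> list[str]:
    """Flag corpora whose expert source is being consumed without replacement."""
    return [
        f"{r.corpus_name}: expertise source no longer regenerating"
        for r in records
        if not r.still_regenerating
    ]

# Example: an EHR corpus whose clinical authors are being displaced.
ehr = TrainingDataProvenance(
    corpus_name="ed_triage_notes_v2",
    expert_source="ED physicians and triage nurses",
    judgments_embedded="triage acuity, test ordering, disposition",
    still_regenerating=False,
)
print(audit([ehr]))
```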
AI researchers have responded to the smuggled expertise critique in several ways. Some argue that the critique conflates input data with learned capabilities — that the model extracts patterns that go beyond what any individual expert contributed. Klein's counter is that the extracted patterns are still shaped by the aggregate judgment of the experts whose work formed the corpus, and that this aggregate expertise is itself vulnerable to the sustainability problem. Others argue that reinforcement learning from human feedback makes the dependence explicit and manageable; Klein's concern is that RLHF annotators are themselves a form of expert input whose conditions of work and cognitive state shape the resulting system.