CONCEPT

Smuggled Expertise and the Training-Data Problem

The structural illusion by which AI systems appear to possess expertise that was in fact extracted from human experts: the representations are manifest in the training data, but the developmental process that built them is not.

When an AI system produces expert-level output, the natural inference is that the system possesses expertise. The Ericsson framework exposes this as a category error. The system produces output that matches expert output because it was trained on the products of human expert cognition — code written by developers who underwent deliberate practice, briefs drafted by lawyers who read cases attentively, diagnoses made by physicians who built pattern-recognition through years of clinical experience. The system extracted statistical patterns from these products. It did not develop the representations that produced them. The human judgment embedded in the training data is smuggled into the system's output, where it appears to be the system's own competence. The distinction matters because smuggled expertise has no developmental trajectory — it cannot grow, cannot adapt to genuinely novel situations, and cannot recognize when the patterns it has absorbed are inadequate to the problem at hand.

In the AI Story


The framework parallels Gary Klein's diagnostic concept of smuggled expertise: the human judgment embedded in AI training data that the system then appears to have generated itself. This is also the structural reason AI-versus-expert comparisons are methodologically unfair: the AI has been given learning opportunities that the human experts it is measured against were denied. Every excellent piece of code in the training corpus reflects the developmental work of the developer who wrote it. The system absorbs the pattern. The pattern was built through struggle. The struggle was the mechanism by which the representation formed. The system has the pattern without the mechanism.

This has specific consequences for the future of expertise. If AI systems continue to train on human expert output, they depend on a continuing supply of that output, which in turn depends on a continuing population of humans undergoing the developmental process. If AI-assisted production replaces the developmental process, the supply degrades: the system's training data increasingly reflects tool-assisted rather than representation-grounded production. Model collapse is the information-theoretic framing of this problem; the Ericsson framework supplies the developmental framing: the system is consuming the substrate that produced it without replenishing it.
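The collapse dynamic can be illustrated in a toy setting. The Python sketch below is an illustrative simulation only (it is not code from the Shumailov et al. paper, and the sample size and generation count are arbitrary choices): each generation fits a one-dimensional Gaussian to data sampled from the previous generation's fit rather than from the original human-generated distribution. The spread shrinks generation by generation, the distributional analogue of the substrate degrading: the rare cases in the tails are the first to vanish.

    # Toy illustration of recursive training on generated data.
    # Generation 0 stands in for human-produced output; every later
    # generation is fit only to samples drawn from the previous model.
    import random
    import statistics

    SAMPLES_PER_GEN = 100   # "training data" each generation sees (arbitrary)
    GENERATIONS = 200       # rounds of model-on-model training (arbitrary)

    mu, sigma = 0.0, 1.0    # generation 0: the human-generated distribution

    for gen in range(1, GENERATIONS + 1):
        # Draw training data from the previous model, not from humans.
        data = [random.gauss(mu, sigma) for _ in range(SAMPLES_PER_GEN)]
        # Refit the model to that synthetic data (maximum-likelihood estimates).
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        if gen % 50 == 0:
            print(f"generation {gen:3d}: sigma = {sigma:.3f}")

In a typical run, sigma shrinks noticeably across the 200 generations and collapses toward zero if the loop is run long enough: the distribution narrows around its center, and the tail events that originally required hard-won expert judgment stop appearing in the training data at all.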

The problem is self-reinforcing. Each cohort of practitioners that uses AI tools to produce output builds a thinner cognitive architecture than the cohort before it. Their tool-assisted production enters the training data. The next generation of AI systems trains on this thinner output, and the depth those systems can produce thins in turn. The circle closes on a civilization that has automated away the conditions for its own cognitive maintenance: producing excellent output in the short term while eroding the representational substrate that the output quietly depends on.

Origin

The concept is implicit in Ericsson's mature framework and made explicit in Klein's critiques of AI-versus-expert comparisons. The framework's application to the AI training problem draws on information-theoretic accounts of model collapse and on emerging empirical work documenting the dependence of AI systems on human-generated training data.

Key Ideas

Output ≠ expertise. AI systems produce output that matches human expert output without possessing the representations that produced it.

Representations are not transferable. The patterns absorbed through training cannot be transferred back to human practitioners who skipped the developmental process that built them.

Substrate dependence. AI systems depend on a continuing supply of representation-grounded human expert output, which requires a continuing population of practitioners undergoing deliberate practice.

Invisible collapse risk. If AI-assisted production replaces the developmental process, the training substrate degrades in ways not immediately visible in system outputs.

Evaluation asymmetry. AI-versus-human comparisons systematically favor the AI because the AI has absorbed what the human developed.


Further reading

  1. Klein, Gary. Sources of Power: How People Make Decisions (MIT Press, 1998).
  2. Shumailov, Ilia, et al. The Curse of Recursion: Training on Generated Data Makes Models Forget (2023).
  3. Ericsson, K. Anders. Deliberate Practice and the Modifiability of Body and Mind (2007).
  4. Lanier, Jaron. You Are Not a Gadget (Knopf, 2010), on the invisibility of human labor inside machine outputs.
Part of The Orange Pill Wiki · A reference companion to the Orange Pill Cycle.