K. Anders Ericsson — On AI
Contents
Cover
Foreword
About
Chapter 1: The Architecture of Expertise: What Deliberate Practice Actually Builds
Chapter 2: The Friction Requirement: Why Difficulty Is Not Optional
Chapter 3: The Decoupling: When Output and Understanding Come Apart
Chapter 4: Feedback Loops: When Immediacy Undermines Development
Chapter 5: The Practice Taxonomy: Naive, Purposeful, and the AI Default
Chapter 6: The Coach, the Teacher, and the Machine
Chapter 7: Designing AI-Augmented Deliberate Practice
Chapter 8: Maintaining Mastery When the Floor Rises
Chapter 9: The Future of Mastery
Epilogue
Back Cover

K. Anders Ericsson

Cover
On AI
A Simulation of Thought by Opus 4.6 · Part of the Orange Pill Cycle
A Note to the Reader: This text was not written or endorsed by K. Anders Ericsson. It is an attempt by Opus 4.6 to simulate K. Anders Ericsson's pattern of thought in order to reflect on the transformation that AI represents for human creativity, work, and meaning.

Foreword

By Edo Segal

The muscle I trust most is the one I built wrong a hundred times.

That sentence has been sitting in my head for weeks, and I cannot get it to leave. It arrived during a session with Claude where everything was going right — the code was clean, the architecture was elegant, the output was flowing faster than I could evaluate it. I was productive. I was in the zone. And somewhere around hour three, I realized I had no idea whether the thing I was building would hold under pressure, because I had never felt it break in my hands.

That distinction — between the thing that works and the thing you understand because you watched it fail — is the distinction K. Anders Ericsson spent forty years making precise. He studied violinists, surgeons, chess masters, memory athletes. He watched thousands of hours of people doing hard things badly, then doing them less badly, then doing them with a fluency that looked like magic from the outside. And he proved, with a rigor that decades of replication have only reinforced, that the magic was not magic. It was a specific kind of struggle, repeated under specific conditions, building specific cognitive architecture that no shortcut could replicate.

I needed his framework the way a builder needs a level. Not because I doubted the tools — Claude is extraordinary, and I use it every day. But because the tools are so good that they remove the very friction his research identifies as the mechanism through which humans develop. The code compiles. The brief drafts itself. The prototype ships in a weekend. And the question Ericsson's work forces me to ask is the one the output cannot answer: What am I becoming in the process of producing all of this?

Not what am I making. What am I becoming.

This book is another lens in the Orange Pill library. It is the lens that sees the cost hidden inside the capability. The lens that distinguishes between performance and learning, between output and understanding, between the practitioner who has used a tool for ten thousand hours and the practitioner who has grown through ten thousand hours of deliberate struggle.

The AI revolution has made production cheap. Ericsson's life work reveals what remains expensive — and why that expense is not a cost to be optimized away but a condition to be fiercely protected.

His framework does not argue against the tools. It argues for something harder: using them in a way that keeps building the human underneath.

-- Edo Segal · Opus 4.6

About K. Anders Ericsson

K. Anders Ericsson (1947–2020) was a Swedish psychologist whose research transformed the scientific understanding of expertise and human performance. Born in Falkenberg, Sweden, he earned his PhD in psychology from the University of Stockholm before moving to the United States, where he spent the majority of his career as Conradi Eminent Scholar and Professor of Psychology at Florida State University. His landmark 1993 study of violinists at the Berlin Academy of Music, conducted with Ralf Krampe and Clemens Tesch-Römer, established the empirical foundation for the concept of deliberate practice — a specific form of structured, effortful, feedback-driven training that he identified as the primary mechanism underlying the development of expert performance across domains. This finding, later popularized as "the ten-thousand-hour rule" by Malcolm Gladwell in Outliers, entered mainstream culture in a simplified form that obscured the precision of Ericsson's original insight: that the structure of practice, not merely its duration, determines developmental outcomes. His major works include the edited volume The Cambridge Handbook of Expertise and Expert Performance (2006, revised 2018) and Peak: Secrets from the New Science of Expertise (2016, co-authored with Robert Pool). Ericsson's research program, which spanned studies of chess masters, athletes, musicians, physicians, and memory performers, established that expert-level performance is built through the progressive construction of domain-specific mental representations — rich, flexible internal models that enable perception, judgment, and adaptive response far beyond what raw talent or accumulated experience can produce. He died in Tallahassee, Florida, on June 17, 2020, two and a half years before the launch of ChatGPT would pose the most consequential challenge his framework had ever faced.

Chapter 1: The Architecture of Expertise: What Deliberate Practice Actually Builds

In 1973, two researchers at Carnegie Mellon University — William Chase and Herbert Simon — ran an experiment so simple in design and so profound in implication that it would reshape the science of human performance for the next half century. They showed chess positions to players of varying skill levels for five seconds, then asked them to reconstruct what they had seen from memory. The masters reproduced the positions with remarkable accuracy. The novices floundered. This much was expected. What was not expected was what happened next: when the researchers showed both groups random arrangements of pieces — configurations that could not arise from actual play — the masters' advantage vanished almost entirely. Their memory was not better in any general sense. It was structurally specific. It operated on meaning, not on raw visual data.

The young postdoctoral researcher who would spend the next four decades excavating the implications of this finding was K. Anders Ericsson. Working under Simon himself — a man who had co-founded the field of artificial intelligence and who had predicted in 1957 that computers would beat humans at chess within a decade — Ericsson took the Chase-Simon result and asked a harder question. Not merely what experts can do that novices cannot, but how experts come to be able to do it. Not the architecture of the performance, but the architecture of the development. The answer he arrived at, through studies of violinists in Berlin, chess players in Buenos Aires, typists in Colorado, surgeons in Stockholm, and memory athletes everywhere, was simultaneously more specific and more demanding than anyone had anticipated: expert performance is the product of a particular kind of practice — deliberate practice — that is structured, effortful, feedback-rich, and targeted at the precise boundary of the practitioner's current capability. Not practice in general. Not experience. Not repetition. A specific mode of engagement whose conditions are identifiable, whose absence is predictable in its consequences, and whose presence is necessary for the construction of what Ericsson called mental representations — the internal cognitive architecture that distinguishes the expert from the merely experienced.

The concept of mental representations is the load-bearing structure of the entire framework, and it requires more precision than the popular literature typically provides. A mental representation is not a fact stored in memory. It is not a procedure written down and recalled on demand. It is a rich, flexible, deeply structured internal model of a domain that encodes not merely what things are but what they mean, what they imply, what typically follows from them, and what responses they demand. The chess master's mental representations do not encode piece positions. They encode dynamic relationships — the tension between a pinned bishop and the knight threatening to exploit the pin, the strategic implications of a pawn structure that constrains development, the patterns that recur across thousands of games in configurations that are structurally similar but superficially different. The surgeon's mental representations encode not anatomical diagrams but the feel of healthy tissue versus diseased tissue, the visual signature of adequate blood flow versus ischemia, the proprioceptive feedback that distinguishes a correct angle of approach from one that risks catastrophic damage. The musician's mental representations encode not notes on a page but the dynamic arc of a phrase, the way a slight ritardando before a cadence creates arrival, the timbral distinction between a performance that is merely correct and one that communicates.

The critical property of these representations — the property that makes the entire framework relevant to the question of artificial intelligence — is that they can only be built through struggle. Not through observation, not through instruction, not through exposure, and not through the passive accumulation of experience over time. The representations are constructed by the specific friction of engaging with problems that exceed the practitioner's current model and force the model to adapt. Each encounter with a position the chess master cannot yet read, each surgical complication that defies the resident's existing understanding, each passage the musician cannot yet phrase — these moments of productive failure are the mechanism through which the cognitive architecture of expertise is assembled. The struggle is not a regrettable byproduct of the learning process. It is the learning process. Remove the struggle, and the representations do not form. The hours accumulate, the experience grows, the credentials lengthen — but the internal architecture remains static.

Ericsson's landmark 1993 study of violinists at the Berlin Academy of Music — conducted with Ralf Krampe and Clemens Tesch-Römer — provided the empirical anchor for this claim. By age twenty, the best violinists had accumulated approximately ten thousand hours of solitary practice, compared to approximately five thousand for the merely good. This finding, subsequently popularized by Malcolm Gladwell as "the ten-thousand-hour rule," entered the culture as a quantitative prescription: put in the hours, and expertise follows. But the popularization stripped the finding of its most important dimension. The hours mattered only because of what happened during them. The best violinists did not merely practice more. They practiced differently. They spent more time on the specific passages that were hardest for them. They sought more frequent and more specific feedback from teachers. They structured their practice sessions to maintain focused effort at the boundary of capability. They tolerated more discomfort, more frustration, more failure per unit of time than the less accomplished players. The hours were a proxy for the accumulated engagement with difficulty — not a formula that could be satisfied by mere duration.

In 2014, a meta-analysis by Brooke Macnamara and colleagues found that deliberate practice accounted for approximately twenty-six percent of the variance in performance in games, twenty-one percent in music, and eighteen percent in sports. The finding was presented, in some quarters, as a refutation of Ericsson's framework. It was not. It was a refinement. The variance not accounted for by deliberate practice included factors such as starting age, working memory capacity, and the quality of instruction — factors that influence the efficiency with which deliberate practice operates, not whether deliberate practice operates. A musician with greater working memory capacity may build mental representations more efficiently per hour of deliberate practice, which means she requires fewer hours to reach the same level. The representations still must be built. They still must be built through effortful, targeted engagement. The mechanism is the same; the rate varies.

What does this architecture look like from the inside? Edo Segal, the author of The Orange Pill, provides an account that maps onto the framework with striking precision, though he arrives at it through the vocabulary of a builder rather than a cognitive psychologist. He describes a senior software architect who could "feel a codebase the way a doctor feels a pulse — not through analysis but through a kind of embodied intuition that had been deposited, layer by layer, through thousands of hours of patient work." The geological metaphor is apt. Each encounter with a system that behaved unexpectedly, each debugging session that forced a revision of the internal model, each deployment failure that revealed a gap between the architect's understanding and the system's actual behavior — each deposited a thin layer of representational structure. One layer is negligible. Thousands compound into the substrate that enables the perception the architect experiences as intuition: the capacity to look at a system and feel wrongness before being able to articulate what is wrong. This is not mysticism. It is the output of a representational architecture so complex that it operates below the threshold of conscious articulation, producing evaluations that are experienced as feeling but are actually the product of pattern-matching processes built through decades of the specific friction that deliberate practice demands.

Now the ground shifts. In December 2025, the tools that software engineers use crossed a threshold that Segal calls the orange pill moment. Claude Code — an AI system capable of producing working software from natural language descriptions — reached a level of capability that made the implementation work that had consumed the majority of most developers' careers automatable in minutes. The architect's twenty years of struggle could now be shortcut. The function that would have required hours of debugging appeared, working, in seconds. The geological metaphor becomes ominous: if the layers are deposited through struggle, what happens when the struggle is removed?

Ericsson's framework provides a precise answer, and the answer is not reassuring. When the conditions for deliberate practice are removed — when the effortful engagement, the boundary-testing, the specific feedback, the iterative refinement are eliminated by a tool that handles the difficulty on the practitioner's behalf — the representations stop being constructed. The output continues. The developer produces code. The lawyer produces briefs. The physician produces diagnoses. But the internal architecture that would have been built through the struggle of producing those outputs is not built, because the struggle did not occur. The code compiled, but the developer did not develop. The brief was filed, but the lawyer did not deepen. The diagnosis was made, but the physician's pattern-recognition did not grow.

There is a temptation to interpret this as a conservative complaint about the good old days, as nostalgia dressed in the vocabulary of cognitive science. The temptation should be resisted, because the framework makes a claim that is testable, specific, and uncomfortable: practitioners who rely on AI tools to handle the difficult parts of their work will, over time, possess weaker mental representations than practitioners who engage with the same difficulty through their own cognitive resources. The prediction is not that AI makes practitioners less productive. It makes them more productive. The prediction is that productivity and expertise will decouple — that it will become possible, for the first time in the history of professional work, to produce expert-level output without possessing expert-level understanding. And the gap between production and understanding will be invisible in the output, detectable only in the rare but consequential moments when the tool fails, when the situation is novel, when the practitioner must rely on the cognitive architecture that deliberate practice builds and tool-assisted production does not.

This decoupling is what makes the present moment unprecedented. Every previous technology that enabled production also required the engagement that produced development. The blacksmith who forged a blade also developed metallurgical understanding. The programmer who wrote code also developed computational understanding. Production and development were coupled — inseparable features of the same activity. AI uncouples them. Production is now available without development. And because the market measures production, not development — because the code that works is valued the same regardless of whether its author understands why it works — the decoupling creates an incentive structure that systematically favors output over understanding, performance over learning, the visible over the invisible.

The Simon-Ericsson lineage makes this particularly poignant. Herbert Simon, who co-founded the field of artificial intelligence, mentored the young Ericsson at Carnegie Mellon in the late 1970s. Together, they developed the technique of verbal protocol analysis — the rigorous use of think-aloud methods to study the cognitive processes underlying expert performance. Simon took the insights from expertise research and used them to build machines that could replicate expertise. Ericsson took the same insights and spent the rest of his life studying the human developmental process that expertise requires. The mentor built the machines. The student studied the humans the machines would one day challenge. Ericsson died in June 2020, two and a half years before ChatGPT's launch. He never witnessed the moment his life's work confronted its most fundamental challenge: a technology that demonstrates what his framework calls "reproducibly superior performance" in domain after domain, without anything resembling the deliberate practice his research identified as its necessary condition.

The question this book poses is not whether Ericsson was right. The evidence supports his framework with a weight that decades of replication have only reinforced. The question is what his framework means now — in a world where the struggle that builds expertise has become optional, where the tools have made effortlessness the default, and where the most important finding in the science of human performance collides with the most important technological transition since the invention of writing.

---

Chapter 2: The Friction Requirement: Why Difficulty Is Not Optional

There is a finding in the learning sciences so counterintuitive that it has been replicated dozens of times precisely because researchers kept expecting it to be wrong. The finding, established through experiments in motor learning, classroom instruction, medical training, and athletic coaching, is this: conditions that make practice feel productive often make it least developmental, and conditions that make practice feel difficult and discouraging often produce the most durable and transferable learning.

Robert and Elizabeth Bjork, working at UCLA across multiple decades, gave this finding a name: desirable difficulties. The name itself is a provocation. Difficulties are not supposed to be desirable. The entire trajectory of tool design, from stone axes to touchscreens, has been organized around the principle that difficulty is a cost to be minimized. K. Anders Ericsson's framework explains why the Bjorks' finding is not merely counterintuitive but foundational: the difficulty is the mechanism. Remove it, and you remove the developmental signal that forces the cognitive system to adapt.

The conditions that constitute a "desirable difficulty" are specific. Interleaved practice — alternating between different skills within a single session rather than practicing each skill in isolation — produces worse immediate performance but dramatically better long-term retention and transfer. Variable practice — performing the same skill under different circumstances rather than under consistent, controlled conditions — produces rougher, more error-prone sessions but builds more flexible and transferable representations. Spaced practice — distributing practice over time with gaps between sessions rather than concentrating it — feels less efficient but produces more durable learning. Delayed feedback — allowing the learner to attempt error detection independently before providing correction — feels frustrating but builds the diagnostic skills that immediate correction short-circuits.

In every case, the pattern is the same. The condition that feels harder produces better development. The condition that feels smoother produces better performance in the moment and worse development over time. Performance and learning are not merely different. They are, under specific and well-documented conditions, inversely related.

Ericsson's framework identifies four conditions that make practice genuinely deliberate — that convert mere repetition into the kind of effortful engagement that builds expert mental representations. The practice must be effortful: it must demand concentration that taxes the practitioner's current cognitive capacity, ensuring that autopilot performance is impossible. The practice must target the boundary of current capability: it must push the practitioner into the zone where success requires stretching slightly beyond what is currently possible, where failure is frequent enough to be informative but not so overwhelming as to produce shutdown. The practice must provide specific feedback: the practitioner must be able to perceive the gap between what was intended and what was achieved, with enough resolution to guide the next attempt. And the practice must allow iterative refinement: the cycle of attempt, feedback, and adjusted attempt must be repeatable, allowing the practitioner to test corrections and build representations through progressive approximation.

When all four conditions are present, practice produces measurable improvement that continues for years, even decades. Ericsson and colleagues documented this in the Berlin violinists, in digit-span memory performers who expanded their capacity from seven to over eighty digits through structured training, and in typists who broke through apparent performance ceilings when researchers redesigned their practice to target specific weaknesses. When any condition is absent, improvement either stalls or fails to begin — producing the arrested development that Ericsson documented in physicians, teachers, and other professionals who accumulate experience without accumulating expertise.

Now consider what happens when artificial intelligence enters the practice environment.

A developer sits down to write a function. Before December 2025, the process unfolds through a sequence of what Ericsson's framework would recognize as deliberate practice cycles. The developer writes the function. It fails. The compiler produces an error — cryptic, sometimes unhelpful, often maddening, but specific enough to narrow the search space. The developer reads the error. Forms a hypothesis about the source. Tests it. The hypothesis is wrong. Another hypothesis. Another test. Documentation consulted. Stack Overflow searched. A different approach tried. Hours later, the function works. The developer has undergone multiple complete cycles of attempt, feedback, adjustment, and refined attempt — the exact iterative sequence that deliberate practice requires for the construction of mental representations.

Effortfulness: present. The developer could not perform this work on autopilot. Boundary-targeting: present. The problem exceeded current capability, forcing adaptation. Specific feedback: present. The error messages, while cryptic, provided information about the gap between intention and result. Iterative refinement: present. The developer could — and did — try again, applying what each failure revealed.

Claude Code changes the interaction structure completely. The developer describes the function. Claude produces it. It works. The developer moves on. The output is equivalent — perhaps better than what the developer would have written. But the four conditions have been eliminated in a single stroke. The work was not effortful in the sense that matters: the developer described what she wanted, not how to build it. The work did not target the boundary of capability: the developer's understanding was not tested because the tool did not require understanding. The feedback was not diagnostic: the developer received a finished product, which is categorically different from receiving information about the gap between her attempt and the desired result. And there was nothing to iteratively refine: the first attempt was the final product.

Every condition that makes practice deliberate — every condition that Ericsson's research identifies as necessary for the construction of expert mental representations — has been removed by the tool's default mode of operation. Not because the tool is flawed. Because the tool is excellent. It is so good at handling difficulty that the human never encounters the difficulty at all.

The frustration that characterized the pre-AI development process was not a flaw in the workflow. It was the subjective experience of cognitive structures being forced to grow. When Ericsson studied the Berlin violinists, he found that the best performers rated their practice sessions as significantly less enjoyable than the less accomplished players rated theirs. The best violinists were spending more time in the zone of discomfort — the boundary region where the skill demanded exceeded the skill possessed — and that zone, by definition, does not feel good. It feels like struggle. Like failure. Like the specific grinding frustration of trying to do something you cannot quite do, failing, and trying again.

AI eliminates this frustration. The developer who uses Claude does not experience the grinding frustration of the boundary zone because the tool handles the boundary on her behalf. The experience is smoother, faster, more pleasant. Precisely the conditions that the Bjorks' desirable difficulties research predicts will produce the worst long-term developmental outcomes: smooth, fast, and pleasant. Immediate success rather than delayed mastery. Error-free performance rather than error-driven learning. The result that feels like progress and, from the perspective of cognitive development, is the absence of it.

The common objection at this point is that the frustration was always unnecessary — that it was a byproduct of inadequate tools, and that better tools should remove it the way better surgical instruments removed the need for the surgeon to struggle with crude access to the body cavity. Edo Segal addresses a version of this objection in The Orange Pill through the example of laparoscopic surgery, and the example is precisely right, but for a reason the objection does not anticipate. When laparoscopic techniques replaced open surgery, the tactile friction of hands-in-body was eliminated. What replaced it was a different kind of difficulty: the cognitive challenge of interpreting a two-dimensional image of a three-dimensional space, the motor challenge of coordinating instruments without direct proprioceptive feedback. The friction ascended, to use Segal's term. The new difficulty satisfied all four of Ericsson's conditions for deliberate practice: it was effortful, it targeted the boundary of capability, it provided immediate visual feedback, and it allowed iterative refinement through repeated procedures and simulation. The surgeons who mastered laparoscopic technique built genuine mental representations at the new level — different in content but equivalent in depth to the representations that open surgery had built at the old level.

The question Ericsson's framework forces is whether the AI transition follows this pattern. Does the friction ascend — relocating to a higher cognitive level that still satisfies the conditions for deliberate practice? Or does it disappear — leaving the practitioner operating at a level where the conditions for development are structurally absent?

The answer is conditional, and the condition is specific. When the new level of work — directing AI, evaluating its output, making judgment calls about what to build — satisfies Ericsson's four conditions, the friction has genuinely ascended and the development continues at the higher level. When it does not — when the direction is routine, when the evaluation is superficial, when the feedback is delayed and noisy and confounded — the friction has not ascended. It has evaporated. And the practitioner, operating at a level that feels higher but lacks the developmental conditions that make the higher level genuinely formative, accumulates experience without accumulating expertise.

The surgical analogy works because the new difficulty was intrinsic to the new method. Operating through a camera is inherently hard in ways that satisfy the conditions for deliberate practice without anyone needing to design those conditions into the workflow. The AI analogy may not work the same way, because directing an AI system — describing what you want, reviewing what you receive — may or may not be intrinsically hard in the specific ways that deliberate practice demands. Whether it is hard enough, whether the feedback is specific enough, whether the iterative cycle is tight enough — these are empirical questions, and the Ericsson framework provides the precise criteria against which the answers can be measured.

The framework does not predict doom. It predicts that difficulty is not optional for development, that the specific form of the difficulty must satisfy identifiable conditions, and that tools which eliminate difficulty eliminate development along with it unless the difficulty is deliberately preserved or relocated. This is not a claim about the value of suffering. It is a claim about the mechanism of growth, stated with the precision of a research program that has been testing it across domains for forty years.

---

Chapter 3: The Decoupling: When Output and Understanding Come Apart

There is a distinction that the entire discourse about artificial intelligence has failed to make clearly, and the failure has consequences. The distinction is between what a practitioner can produce and what a practitioner has become in the process of producing it. Between output and development. Between performance — what you can do right now, with the tools and resources currently available — and learning — the change in your internal cognitive structures that enables future performance without the tools, or under conditions the tools cannot handle.

In the learning sciences, this distinction is not merely conceptual. It is empirical, measurable, and has been documented through experiments that reveal a pattern so counterintuitive it took decades of replication before the field accepted it: conditions that maximize current performance frequently minimize long-term learning, and conditions that maximize learning frequently impair current performance. The two are not merely different. Under well-specified conditions, they are inversely related.

K. Anders Ericsson's research program provides the theoretical architecture that explains this inverse relationship. Expert mental representations — the deep, flexible, structurally rich internal models that enable the perception, judgment, and adaptive performance of genuine expertise — are built only through the effortful, boundary-testing, feedback-driven engagement that constitutes deliberate practice. The process is slow, uncomfortable, and largely invisible in its products. The representations it builds do not manifest as discrete skills that can be tested on demand. They manifest as the quality of perception, the depth of understanding, the flexibility of response — properties that are invisible in routine performance and reveal themselves only under conditions of novelty, ambiguity, or system failure. When a practitioner produces expert-level output through a tool that handles the difficulty, the output is real. The development that would have accompanied the production, had the production been done through deliberate engagement, is not.

This decoupling — production without development, output without understanding — is what makes the current moment unprecedented in the history of professional expertise. Every previous technology that enabled production simultaneously required the developmental engagement that built expert representations. The medieval scribe who copied manuscripts developed handwriting fluency, visual memory, and an intimate knowledge of the texts through the specific friction of letter-by-letter transcription. The programmer who wrote code developed computational understanding through the debugging cycles that implementation demanded. The carpenter who built furniture developed material intuition through the resistance of wood grain against the saw. In each case, the production and the development were coupled — inseparable features of the same activity. You could not have one without the other.

Artificial intelligence uncouples them. A developer can produce working code without undergoing the debugging cycles that build computational understanding. A lawyer can produce competent briefs without reading the cases with the attention that struggle demands. A medical student can produce correct diagnoses without developing the pattern-recognition architecture that independent diagnostic reasoning requires. The output is available without the development. And because organizations measure output — because clients pay for briefs, not for the lawyer's understanding; because users care whether the code works, not whether the developer knows why it works — the decoupling creates an incentive structure that systematically favors production over development.

The consequences of this decoupling are not immediately visible. They are deferred. They emerge only under specific conditions: when the AI system fails, when the situation falls outside the distribution of problems the system was trained on, when the practitioner must rely on independent judgment rather than tool-assisted production. These conditions may be infrequent in the normal course of professional work. But they are the conditions under which careers, projects, organizations, and in some domains human lives depend on the depth of the practitioner's understanding rather than the quality of the tool's output.

A 2026 paper in Frontiers in Medicine documents this dynamic in clinical education with unsettling specificity. The researchers describe what they call the "deskilling dilemma": when AI diagnostic tools are introduced early in clinical training, the trainees who use them most extensively show the highest diagnostic accuracy when the tools are available and the lowest accuracy when the tools are unavailable. The pattern is not that the tools degraded existing expertise. It is that the tools prevented new expertise from forming. The trainees never developed the pattern-recognition architecture that independent diagnosis requires, because the tool handled the pattern recognition on their behalf. Their clinical experience was extensive. Their mental representations were thin. "When clinicians become overly dependent on AI models," the researchers write, "they rely less on their own skills and more on these models... Overreliance on AI models will also lead to health practitioners being less confident in making independent decisions, potentially creating a cycle of dependence."

The cycle of dependence is the mechanism of the decoupling made visible. Each successful AI-assisted production reinforces the practitioner's reliance on the tool and reduces the probability that the practitioner will engage with the same difficulty independently. Each instance of tool-assisted production that avoids the struggle simultaneously avoids the developmental opportunity the struggle would have provided. Over time, the practitioner becomes more productive and less expert — not because the tool damaged existing representations, but because the tool prevented the construction of new ones by making the construction unnecessary for production.

There is a further dimension to the decoupling that is less obvious but equally consequential: the systematic miscalibration of the practitioner's self-assessment. Every successful AI-assisted production reinforces the practitioner's belief that she understands the domain at the level her output represents. The code works. The natural inference is that the developer understands the code well enough to produce it. The brief is well-structured. The natural inference is that the lawyer understands the cases well enough to argue them. But the inference is wrong in a specific way. The output reflects the tool's understanding, not the practitioner's. The practitioner directed the tool. The tool did the understanding. The gap between perceived competence and actual competence widens with each AI-assisted production, and because the evidence for perceived competence — the working code, the coherent brief, the correct diagnosis — is genuine, the miscalibration has no natural correction mechanism.

This miscalibration has been studied extensively in the expertise literature under the heading of the unskilled-and-unaware effect. People who lack expertise in a domain tend to overestimate their competence, precisely because the expertise they lack is the expertise they would need to recognize their lack. The AI-assisted practitioner faces a supercharged version of this problem. The unskilled practitioner who produces obviously inadequate output at least has the signal of inadequacy — the failed code, the rejected brief, the wrong diagnosis — to prompt the recalibration of self-assessment. The AI-assisted practitioner produces adequate output through a tool, and the adequacy of the output provides false evidence of the adequacy of the practitioner's own understanding.

MIT Sloan Management Review captured the organizational dimension of this in a 2025 analysis that recommended companies implement what the authors called "human thinking sprints" — periods during which teams solve problems without AI assistance, not because the AI is unreliable but because the exercise serves as a "test of cognitive fitness." The recommendation is essentially a prescription for deliberate practice embedded in organizational workflow: structured opportunities for practitioners to engage with difficulty that the tools have made optional, specifically to maintain the cognitive architecture that tool-dependent production does not build. "Organizations must deliberately preserve and strengthen human thinking capabilities," the authors write. "Just as physical training encourages muscle memory, these exercises could help employees maintain the cognitive capabilities that differentiate human intelligence."

The analogy to physical fitness is more precise than the authors may have intended. A person who can walk but relies on a motorized wheelchair because it is more efficient will, over time, lose the capacity to walk — not because the wheelchair damaged the muscles but because unused muscles atrophy. The atrophy is invisible as long as the wheelchair is available. It becomes catastrophically visible the moment the wheelchair is not. The muscles did not fail. They were not maintained. The conditions for their maintenance — the specific, effortful, gravity-resisting engagement of walking — were replaced by a tool that achieved the same locomotion without requiring the engagement.

Ericsson's framework predicts this atrophy with the precision of the research that established it. Mental representations, like muscles, require maintenance. The representations built through years of deliberate practice do not persist indefinitely without continued engagement at the level that built them. The surgeon who stops operating loses the proprioceptive acuity that operating developed. The chess master who stops studying positions loses the perceptual speed that study developed. The developer who stops debugging loses the diagnostic intuition that debugging developed. The representations decay — not catastrophically, not overnight, but gradually, in a process that mirrors the gradual construction through which they were built.

For practitioners who built deep representations through years of pre-AI deliberate practice, AI tools function as genuine amplifiers. The representations are already there. The tool extends their reach. The senior architect described in The Orange Pill — the one who could feel a codebase — can use AI to feel more codebases, faster, at greater scale. The representations provide the evaluative capacity to direct the tool wisely and detect its failures. For this population, AI is precisely what its advocates claim: an expansion of capability built on a foundation of genuine understanding.

For practitioners entering the field in the AI era — practitioners who have never undergone the pre-AI developmental process — the situation is fundamentally different. These practitioners have no deep representations to amplify. They have tool-assisted production capacity, which is a different thing. They can produce expert-level output through the tool, but they lack the evaluative capacity that deep representations provide — the capacity to detect when the tool has failed, to understand why it has failed, to correct its failures, and to direct its capabilities toward problems it has not been designed to handle. Their competence is real but borrowed. It is the tool's competence, accessed through the practitioner's direction, and it vanishes the moment the tool is unavailable or inadequate.

The decoupling of production from development creates two classes of practitioners who are indistinguishable by any measure that evaluates output and radically different by any measure that evaluates understanding. The first class — practitioners with deep representations amplified by AI — will thrive. The second class — practitioners with tool-dependent production and thin representations — will perform adequately under routine conditions and fail under the non-routine conditions where expertise matters most. And no one, including the practitioners themselves, will be able to tell the two classes apart until the moment of failure arrives.

The question is not whether this decoupling is happening. The evidence from medical education, from the Berkeley workplace study, from the testimony of practitioners across fields suggests that it is. The question is whether it is remediable — whether the conditions for deliberate practice can be preserved, or even enhanced, within the AI-assisted environment. That question requires a closer examination of the specific mechanism that the decoupling most directly threatens: the feedback loop.

---

Chapter 4: Feedback Loops: When Immediacy Undermines Development

There is a gap between an error and its correction that most practitioners experience as wasted time. The developer stares at a failing test. The musician replays a passage that refuses to sound right. The surgeon feels resistance where smoothness was expected. In each case, there is a moment — sometimes seconds, sometimes hours, sometimes days — during which the practitioner knows something has gone wrong but does not yet know what. The natural impulse is to close the gap as quickly as possible. Get the answer. Fix the error. Move on.

K. Anders Ericsson's research, and the broader learning sciences that his framework draws upon, suggest that this gap is not wasted time. It is the most developmentally productive phase of the entire practice cycle. It is the space in which the practitioner must engage her own cognitive resources to diagnose the problem, generate hypotheses about its source, test those hypotheses against what she knows, and construct a corrective response. This constructive process — the process of building understanding through the friction of not-yet-knowing — is the mechanism through which mental representations are assembled. The gap is where the learning lives.

The distinction between feedback that supports development and feedback that short-circuits it turns on what the feedback requires of the learner. When a violinist plays a wrong note, the auditory feedback is immediate — she hears the error as it occurs. But the feedback does not tell her which finger to adjust, how much pressure to reduce, or whether the error originated in her left hand, her bowing arm, or her interpretation of the passage's dynamic arc. The gap between hearing the error and understanding its cause is the space in which her cognitive and motor representations are forced to grow. She must diagnose the problem through her own understanding of the instrument, the piece, and her body. The diagnosis builds the representation. The representation is what she will carry to the next passage, the next piece, the next performance.

When a teacher immediately corrects the error — "move your second finger one millimeter toward the nut" — the student receives the correction and makes the adjustment. The note sounds right. But the student has not diagnosed the problem herself. She has not occupied the gap. The representation that the diagnostic process would have built was not built, because the process was short-circuited by the specificity of the correction. Research on the timing and specificity of feedback in motor learning confirms this pattern with uncomfortable consistency: immediate, specific correction produces faster performance improvement in the short term and worse retention and transfer in the long term than delayed, less specific feedback that forces the learner to participate in the diagnostic process.

AI provides feedback with unprecedented immediacy and unprecedented specificity. Describe a problem to Claude, and the solution arrives in seconds — not a hint, not a diagnostic clue, not a suggestion that narrows the search space while leaving the construction to the practitioner. A complete solution. The gap between error and correction is closed before the practitioner can occupy it. The diagnostic process — the generative, representation-building process of figuring out what went wrong and why — is rendered unnecessary by the speed and completeness of the tool's response.

From a productivity standpoint, this is optimal. The faster the gap closes, the faster the developer moves to the next task. From a developmental standpoint, it is precisely the pattern that the desirable difficulties research predicts will produce the worst long-term outcomes: immediate, specific, error-preventing guidance that maximizes current performance while minimizing the cognitive engagement that builds durable expertise.

The author of The Orange Pill describes this dynamic from the perspective of flow — the psychological state identified by Mihaly Csikszentmihalyi in which challenge and skill are matched, attention is absorbed, and the practitioner operates at the edge of capability with a sense of effortless control. AI provides the tight feedback loop that flow requires: describe the intention, see the result, adjust, iterate. The experience is compelling. It is genuinely satisfying. And it is developmentally problematic for a specific reason that the flow literature does not address: the feedback that flow optimizes is not the same as the feedback that development optimizes.

Flow optimizes for the experience of competence — the sense that your actions are producing results in real time, that the connection between intention and outcome is vivid and immediate. Development optimizes for the experience of productive incompetence — the sense that you are struggling with something slightly beyond your current ability, that the connection between your intention and the desired outcome contains a gap that your current understanding cannot close. Flow and development are not always opposed. The best deliberate practice produces flow when the challenge-skill balance is precisely calibrated. But in the AI-assisted environment, the feedback dynamics diverge. The AI closes the gap too quickly for the constructive process to occur. The practitioner experiences flow — the tight feedback loop, the sense of competent action — without the developmental substrate that genuine flow at the boundary of capability would provide.

There is a further dimension to the feedback problem that touches on a process cognitive science calls incubation. When a practitioner encounters a problem that does not yield to immediate effort — a bug that resists debugging, a passage that refuses to phrase correctly, a legal argument that will not cohere — and sets the problem aside, the cognitive system does not stop working on it. Below the level of conscious awareness, the brain continues to process the problem, forming connections between the unsolved problem and seemingly unrelated knowledge, testing hypotheses that the conscious mind has not yet formulated. The insight that arrives in the shower, during a walk, upon waking — the sudden crystallization of a solution that seems to come from nowhere — is the product of this incubation process. The insight feels like a gift. It is actually the output of extended unconscious processing that was initiated by the experience of being stuck and sustained by the absence of an immediate solution.

AI eliminates the conditions for incubation by eliminating the experience of being stuck. The problem arrives. The solution arrives seconds later. The practitioner never lives with the problem long enough for incubation to begin. The connections that would have formed during the period of sustained uncertainty — the unexpected links between the problem at hand and the practitioner's broader knowledge, the novel approaches that emerge from unconscious processing — never form, because the problem was solved before the processing could start.

The incubation loss is particularly insidious because it is invisible. The insights that would have emerged from living with unsolved problems are counterfactual — they belong to a timeline in which the practitioner struggled for hours or days before the solution appeared. In the AI-assisted timeline, the practitioner never knows what she missed, because the missed insights were never experienced. The developer who would have had a crucial architectural insight while walking the dog — the insight triggered by the incubation of a debugging problem she had been carrying for two days — never has that insight because Claude solved the problem in thirty seconds. The absence is invisible, but across hundreds of such absences, the cumulative developmental cost is substantial: a practitioner whose representational architecture is thinner than it would have been, in ways she cannot detect and the organization cannot measure.

There is one further feature of AI feedback that the expertise framework identifies as problematic: the elimination of error as diagnostic information. In Ericsson's framework, the practitioner's errors are not merely failures to be corrected. They are data. Specific errors reveal specific gaps in the practitioner's mental representations. A musician who consistently rushes the second beat of a passage has a representation of that passage's rhythmic structure that encodes the tempo incorrectly. A surgeon who repeatedly underestimates the depth of a structure has a spatial representation that is miscalibrated. A developer who gravitates toward a particular class of bugs has representations that are blind to a specific category of system behavior. A skilled teacher reads the error pattern the way a physician reads symptoms: not as a list of individual problems but as a diagnostic picture that reveals the structure of the practitioner's understanding and indicates what kind of practice will be most beneficial.

When AI produces the output, the practitioner does not make errors, because the practitioner does not produce the output. The errors that the tool makes are the tool's errors, reflecting the tool's limitations, not the practitioner's. The diagnostic information that the practitioner's errors would have provided — the window into the representational architecture that a skilled teacher or the practitioner herself could read — is lost. The practitioner's understanding becomes opaque, even to herself, because the outputs that would have revealed its structure are produced by the tool.

This opacity has organizational consequences that a 2025 analysis in MIT Sloan Management Review identified with clarity: "Losing tribal knowledge as employees stop developing deep expertise." When output quality can no longer serve as a proxy for practitioner competence — because the output reflects the tool's competence rather than the practitioner's — organizations lose the ability to evaluate what their people actually know. The developer who produces excellent code through AI assistance and the developer who produces excellent code through deep understanding are indistinguishable by any metric that evaluates the code. They are radically different by any metric that evaluates the developer. And the gap between the two is invisible until the moment it is not — until the system fails, the situation is novel, the stakes are highest, and the organization discovers that the competence it thought it possessed was the tool's competence, not its own.

The feedback problem is not an argument against AI tools. It is an argument for a specific mode of using them — a mode in which the practitioner deliberately preserves the gap between error and correction, in which the tool provides diagnostic information rather than complete solutions, in which the incubation period is protected rather than eliminated, and in which the practitioner's errors remain visible as diagnostic data rather than being preempted by the tool's competence. This mode exists. It is designable. It is implementable. But it requires reversing the tool's default relationship with difficulty — using AI to amplify the challenge rather than eliminate it, to widen the gap rather than close it, to create the conditions for the productive struggle that builds the cognitive architecture expertise demands.
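What that mode might look like at the level of a single tool can be sketched concretely. The fragment below is an illustration only, not a prescription from the research: the names (PracticeSession, HINT_LADDER, ask_model, min_attempts_before_solution), the three-rung hint ladder, and the attempt threshold are inventions for the purpose of the sketch, and ask_model stands in for whatever AI interface the practitioner already uses. What it shows is how the default relationship with difficulty can be inverted in a few dozen lines: the tool yields hints whose specificity rises only as the practitioner's own logged attempts accumulate, and it hands over a complete solution only after the gap between error and correction has actually been occupied.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# An illustrative hint ladder: each rung is more specific than the last,
# and a rung is unlocked only by another independent attempt.
HINT_LADDER = [
    "Name the general category of this problem. Do not locate it or fix it.",
    "Point to the region where the problem lives. Do not explain the cause.",
    "Explain the likely cause. Do not provide corrected code.",
]

@dataclass
class PracticeSession:
    ask_model: Callable[[str], str]           # stand-in for whatever AI call is already in use
    min_attempts_before_solution: int = 3     # how long the gap between error and correction is held open
    attempts: List[Tuple[str, str]] = field(default_factory=list)

    def log_attempt(self, hypothesis: str, result: str) -> None:
        """Record the practitioner's own diagnosis and what testing it revealed."""
        self.attempts.append((hypothesis, result))

    def request_hint(self, problem: str) -> str:
        """Return a hint whose specificity rises only as logged attempts accumulate."""
        if not self.attempts:
            return "No hint yet: make one attempt and log a hypothesis first."
        rung = min(len(self.attempts), len(HINT_LADDER)) - 1
        prompt = (
            HINT_LADDER[rung]
            + "\n\nProblem:\n" + problem
            + "\n\nMy attempts so far:\n"
            + "\n".join(f"- {h} -> {r}" for h, r in self.attempts)
        )
        return self.ask_model(prompt)

    def request_solution(self, problem: str) -> str:
        """Hand over a complete solution only after the gap has been occupied."""
        if len(self.attempts) < self.min_attempts_before_solution:
            remaining = self.min_attempts_before_solution - len(self.attempts)
            return f"Solution withheld: {remaining} more independent attempt(s) required."
        return self.ask_model(
            "Give a complete solution, then contrast it with these attempts:\n"
            + problem + "\n"
            + "\n".join(h for h, _ in self.attempts)
        )

# Usage with a stand-in model call (replace the lambda with a real client):
session = PracticeSession(ask_model=lambda prompt: "[model response to: " + prompt[:60] + "...]")
session.log_attempt("Suspected an off-by-one in the loop bound", "test still fails")
print(session.request_hint("binary_search returns the wrong index for single-element inputs"))
```

Whether such a constraint lives in code, in a team norm, or in the practitioner's own discipline matters less than the property it preserves: the solution arrives only after the diagnostic work has been attempted, so the practitioner's errors remain visible as data and the gap remains open long enough to do its developmental work.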

What that reversal looks like in practice — what it means to design AI-augmented deliberate practice rather than AI-assisted production — is the subject of the chapters that follow. But the feedback problem must be understood first in its full dimensions, because the solutions that work are the solutions that address the actual mechanism of the loss: not that the tool is too powerful, but that its power, deployed in its default mode, systematically removes the conditions under which the human capacity to use it wisely is developed.

---

Chapter 5: The Practice Taxonomy: Naive, Purposeful, and the AI Default

Not all practice is equal. This statement sounds obvious. It is, in fact, the least obvious finding in the science of expertise, because the inequality it describes is not a matter of degree but of kind. The difference between naive practice and deliberate practice is not that one is slightly more effective than the other. It is that one produces development and the other does not — that two practitioners can spend identical hours in identical domains and arrive at radically different levels of capability, not because of talent, not because of motivation, but because of the structure of their engagement.

K. Anders Ericsson distinguished three modes of practice across his research program, each defined not by the practitioner's intention but by the quality of the interaction between practitioner and domain. The distinctions are empirically grounded, replicable across fields, and consequential enough that they predict, with uncomfortable accuracy, which practitioners will continue to improve over the course of a career and which will plateau early and remain there indefinitely.

Naive practice is the most common mode and the least developmental. It is repetition without targeting. The pianist who plays through the same piece every evening, making the same errors in the same passages, reinforcing the same habits, never isolating the specific technical weakness that prevents the phrase from landing. The physician who sees patients year after year using the same diagnostic heuristics, never testing those heuristics against outcomes, never revising the internal model that produces them. The driver who has driven for twenty years and is no better — studies suggest in some cases measurably worse — than the driver with two years of experience, because twenty years of comfortable repetition deposits zero additional layers of representational structure. The effort may be sincere. The engagement may be genuine. The practitioner may love the work. None of this matters to the developmental outcome if the practice operates within the zone of established competence rather than at its boundary. Naive practice maintains. It does not build.

Ericsson documented this arrested development in domain after domain with a consistency that borders on the disturbing. Physicians with decades of experience performed no better — and in some diagnostic categories performed measurably worse — than physicians five years out of residency. The finding was not that experience is harmful. It was that experience without the specific conditions of deliberate practice does not reliably produce improvement. The physician's diagnostic heuristics solidified early, were reinforced by the confirmatory feedback of routine cases, and were never tested against the disconfirming evidence that would have forced their revision. The experience accumulated. The expertise did not.

Purposeful practice represents a significant step above naive practice. It is characterized by focused effort directed toward specific goals. The pianist who identifies the passage that defeats her, isolates it, practices it at reduced tempo, targets the specific technical difficulty the passage presents. The effort is not merely sincere. It is directed. It has targets, benchmarks, criteria for success. Purposeful practice produces reliable improvement, and many practitioners who adopt it after years of naive practice make significant, sometimes dramatic gains.

But purposeful practice has a structural limitation that Ericsson identified with precision: it is self-directed. The practitioner identifies her own weaknesses, designs her own practice activities, evaluates her own progress. This self-direction is constrained by what the practitioner can currently perceive. She can only target weaknesses she can see, and the most consequential weaknesses — the ones that limit her development most severely — are often the ones she cannot see, precisely because perceiving them requires the expertise the practice is supposed to develop. The pianist who does not understand the biomechanical principles of relaxed technique cannot design exercises that address the tension in her approach, because she does not know the tension is a problem, or that relaxed technique exists as an alternative, or that biomechanical principles are relevant to phrasing.

Deliberate practice, the highest mode, adds the external perspective that purposeful practice lacks. It requires a knowledgeable teacher or coach who can perceive what the practitioner cannot — the gap between current performance and desired performance, diagnosed with a specificity that exceeds the practitioner's own perceptual capacity. The teacher designs activities that target weaknesses the practitioner does not know she has. The teacher calibrates difficulty to the boundary of capability with a precision the practitioner cannot achieve through self-assessment. The teacher provides feedback that challenges the practitioner's self-model, revealing discrepancies between perceived and actual understanding that the practitioner's own evaluation cannot detect.

The distinction between purposeful and deliberate practice is not academic. In every domain Ericsson studied, the practitioners who reached the highest levels had access to skilled coaching throughout their developmental trajectory. The world-class violinists had studied with master teachers from childhood. The elite chess players had trained under grandmasters. The top surgeons had been guided through residency by mentors who could see what the residents could not. The coaching was not optional. It was the mechanism through which the conditions for deliberate practice were maintained at the level of specificity and calibration that produces the deepest representational development.

Now consider where AI-assisted work falls in this taxonomy.

By default — and the qualification "by default" is critical — AI-assisted work most closely resembles naive practice. The tool handles the difficult parts. The practitioner handles the easy parts. The boundary of capability is never tested, because the tool's capability vastly exceeds the practitioner's in the implementation dimension. The practitioner directs. The tool executes. The direction may be competent. It may be skillful. But the practitioner's representational architecture — the deep understanding of the domain that constitutes expertise — is not tested, not stretched, not forced to adapt. The four conditions for deliberate practice are absent from the default interaction: the work is not effortful in the developmental sense, the boundary is not targeted, the feedback is not diagnostic, and the iterative cycle of attempt-and-correction operates on the tool's output rather than on the practitioner's understanding.

This characterization will strike many AI practitioners as unfair. They will point to the genuine difficulty of directing AI well — of formulating clear descriptions, evaluating complex outputs, integrating machine-generated solutions into human-directed architectures. These difficulties are real. But the question from Ericsson's framework is not whether the work is difficult in some general sense. It is whether the difficulty satisfies the specific conditions that produce representational growth. Difficulty that demands effort but does not target the boundary of a specific skill, that provides ambiguous feedback on an unclear dimension of performance, that does not allow tight iterative refinement — this difficulty is experienced as hard work without producing the developmental gains that deliberate practice yields. It is possible to work very hard at something and not improve, if the structure of the work does not match the structure that improvement requires.

The recursive problem compounds this. Using AI in a genuinely developmental way — designing interactions that preserve the conditions for deliberate practice, resisting the tool's default helpfulness in favor of strategic struggle — is itself a skill that must be developed through practice. But what kind of practice? Deliberate practice, presumably — which requires the metacognitive awareness to recognize when the default mode has taken over, the discipline to resist the path of least resistance, and the knowledge of one's own representational gaps that purposeful self-assessment cannot reliably provide. The skill of using AI developmentally presupposes a level of metacognitive sophistication that most practitioners have not been trained to exercise, and the training would itself require the conditions that the tool's default mode undermines.

The recursion is not vicious. It can be broken. A practitioner who begins with small, self-conscious experiments in friction-preserving AI use can build the metacognitive capacity incrementally. But the gravitational pull of the default is strong. The default produces immediate output, visible productivity, the satisfaction of completion. The developmental alternative produces slower output, more frustration, and gains that are invisible in the near term and may not manifest for months or years. Every psychological force — the preference for immediate reward over delayed benefit, the discomfort of operating at the boundary of capability, the social pressure to demonstrate productivity — pushes toward the default and away from the developmental mode.

Edo Segal provides an honest account of oscillating between these modes in his description of writing The Orange Pill with Claude. When he set clear goals, studied the output critically, and used the tool's failures as opportunities to deepen his own understanding, the collaboration approached the deliberate end of the spectrum. When the prose "outran the thinking" — when he almost kept a passage because it sounded good rather than because it was true — the collaboration reverted to the naive end. The Deleuze incident he describes, in which Claude produced a philosophically inaccurate passage that sounded like genuine insight, was a moment of transition: the detection of the error forced a critical engagement with the output that the smooth acceptance of the output would have bypassed. The error, and its detection, was the developmental moment. The smooth acceptance would have been the naive one.

The implication is that the practice taxonomy is not a fixed property of the tool. It is a property of the interaction between the practitioner and the tool, and it can shift from moment to moment depending on the practitioner's orientation. The same tool that defaults to naive practice can, under the right conditions and with the right practitioner orientation, support something approaching deliberate practice. But the conditions do not emerge spontaneously. They must be designed — by the practitioner herself, by the organization that structures her workflow, or by the educational system that trains her to engage with AI tools in developmentally productive ways.

The organizational responsibility here is specific and largely unmet. Most organizations that have adopted AI tools have provided training in how to use them productively — how to write effective prompts, how to evaluate output quality, how to integrate AI into existing workflows. Almost none have provided training in how to use them developmentally — how to preserve the conditions for representational growth within the AI-assisted workflow, how to recognize when the default mode has taken over, how to design interactions that maintain the friction that development requires. The omission is not surprising. Productivity is what organizations measure and reward. Development is invisible in the output and manifests only in future capability — a return that is real but deferred, and deferred returns are systematically undervalued by institutions operating on quarterly timelines.

The result is a landscape in which millions of practitioners spend millions of hours in AI-assisted work that maintains their current level of capability without producing the representational growth that genuine expertise demands. The hours accumulate. The output accumulates. The development does not. And the gap between what the practitioner produces and what the practitioner understands widens with each cycle of tool-assisted production — invisibly, incrementally, and without any signal that would alert either the practitioner or the organization to the growing discrepancy.

The practice taxonomy provides the diagnostic vocabulary. Naive, purposeful, deliberate — three modes of engagement with a domain, each producing a different developmental trajectory, each identifiable by specific features of the interaction between practitioner and task. The question for every practitioner, every educator, and every organization navigating the AI transition is not whether AI tools are useful. They are. The question is which mode of practice the tools are enabling. And the honest answer, for most practitioners in most contexts, is the mode that produces the least development — not because the tools force this mode, but because the tools' default operation aligns with it, and the effort required to override the default is effort that neither the tools nor the incentive structures currently support.

---

Chapter 6: The Coach, the Teacher, and the Machine

In the 1993 Berlin violin study, K. Anders Ericsson and his colleagues documented a finding that received less popular attention than the ten-thousand-hour figure but was, from the perspective of the developmental mechanism, more important. The best violinists did not simply practice more. They had studied, from childhood, with better teachers. The quality of instruction was not incidental to the quality of practice. It was the condition that made the practice deliberate rather than merely purposeful.

The teacher's role in deliberate practice is often misunderstood as primarily instructional — the teacher tells the student what to do, and the student does it. Ericsson's framework describes something more complex. The teacher's primary function is not to instruct but to design. The teacher designs practice activities that create the conditions under which the specific representational growth the student needs will occur. The vocal coach who hears a singer straining on high notes does not simply say "relax your throat." She designs an exercise — a descending scale on an open vowel, a passage that cannot be sung with tension because the tessitura and dynamics demand openness — that makes relaxation a condition of success rather than an instruction to follow. The exercise does the teaching. The teacher's expertise lies not in knowing the right answer but in designing the right challenge.

This distinction between instruction and design is the fulcrum on which the comparison between human teachers and AI systems turns. AI can instruct. Current large language models can explain concepts with clarity, demonstrate techniques with breadth, correct errors with specificity, and provide domain knowledge with a scope that no individual teacher can match. What AI cannot do — in its current form and with its current design logic — is design challenges that target the specific developmental needs of the individual practitioner with the precision that effective deliberate practice requires.

The limitation is not computational. AI systems are capable of generating exercises, problems, and challenges across virtually any domain. The limitation is evaluative. Designing practice activities that produce genuine representational growth requires understanding not just what the practitioner got wrong but why — what specific gap in the representational architecture produced the error, and what specific kind of practice would close that gap most efficiently. This diagnostic process requires a model of the practitioner's understanding that is independent of the practitioner's own self-assessment, because the most consequential representational gaps are precisely the ones the practitioner cannot perceive.

The teacher's independent model of the student's understanding is the feature that distinguishes deliberate practice from every other mode of engagement. The student may believe she understands a concept when she actually holds a superficial or distorted version of it. The student may believe she has mastered a technique when she has actually developed a compensatory habit that masks an underlying weakness. The teacher's model includes these discrepancies — the gaps between what the student thinks she knows and what she actually knows, between the performance the student perceives and the performance the teacher observes. This independent perspective is what allows the teacher to design activities that address needs the student does not recognize, to provide feedback that challenges the student's self-assessment, and to maintain the developmental trajectory through the phases of frustration and apparent stagnation that the student, left to her own devices, would interpret as signals to change course.

Claude does not maintain an independent model of the user's understanding. Claude models the user's requests. These are different things. A user can request a solution to a problem she does not understand, and the system cannot currently distinguish this request from one made by a user who understands the problem deeply and is using the tool to accelerate an implementation she has already designed in her mind. If the user's self-assessment is inaccurate — if she believes she understands something she does not, or believes she needs help in area A when her actual developmental need is in area B — the tool responds to the inaccurate self-assessment with the same helpfulness it brings to an accurate one. The system has no mechanism for detecting the discrepancy.

Consider the four functions that a teacher performs in Ericsson's framework, and how each maps onto what AI currently provides.

First, the teacher takes developmental initiative. She decides what the student needs to work on, designs activities that address those needs, and pushes the student into territory the student would not have entered voluntarily. Left to their own preferences, practitioners practice what they are already good at. The research is consistent on this point across domains: self-directed practitioners allocate practice time in proportion to their comfort with each skill, not in proportion to their need for improvement. The teacher overrides this tendency. She directs attention toward weakness rather than strength, toward discomfort rather than fluency.

AI waits for the user's direction. The developmental initiative is entirely with the practitioner, which means it is subject to the same comfort-seeking bias that self-directed practice has always exhibited. The practitioner who needs to work on the hard thing but would rather work on the easy thing will, in the AI-assisted environment, work on the easy thing — and the tool will help her do it beautifully.

Second, the teacher identifies weaknesses the student cannot see. The violin teacher who detects a subtle inconsistency in bow pressure that is producing a tonal unevenness the student cannot hear. The chess coach who recognizes that a student's opening preparation masks a middlegame weakness in positional understanding. The surgical mentor who observes that a resident's technique, while producing adequate outcomes, relies on a compensatory movement that will limit her range in more complex procedures. In each case, the teacher perceives a gap that the student's own perceptual apparatus is not yet sophisticated enough to detect. The perception requires not only domain expertise but a specific evaluative expertise: the ability to observe performance, identify the specific deficiencies that limit it, and trace those deficiencies to their cognitive or physical roots.

AI amplifies whatever direction the user provides. If the user correctly identifies her weakness and asks for help addressing it, the tool can provide excellent assistance. But the most consequential developmental needs are the ones the user has not identified — and those remain invisible to the tool because they are invisible to the user.

Third, the teacher introduces strategic difficulty when the student is coasting. A student who performs a passage fluently has not necessarily mastered it. She may have developed a surface fluency that conceals a shallow representational foundation — the passage sounds right in its familiar context but will collapse under variation, tempo change, or the demands of performance. The teacher who recognizes this adds difficulty: transposes the passage to an unfamiliar key, increases the tempo beyond the comfort zone, asks the student to perform it with altered dynamics. The added difficulty is not punitive. It is diagnostic and developmental — it reveals whether the student's representations are deep enough to transfer across variations, and if they are not, it forces the deepening.

AI makes everything easier. This is its design principle, and it is the correct design principle for a tool whose purpose is to assist production. A teacher operating on the same design principle — making everything easier, removing all obstacles, ensuring smooth performance — would be a catastrophically bad teacher. A good teacher calibrates difficulty to the developmental need: supporting when the student is overwhelmed, challenging when the student is comfortable, and maintaining the engagement at the boundary of capability that deliberate practice demands.

Fourth, and perhaps most critically, the teacher reads errors as diagnostic data. In Ericsson's framework, the pattern of errors a practitioner makes is a window into the structure of her representational architecture. Consistent errors in a specific category reveal a specific representational gap. The teacher who can read the error pattern can design practice activities that target the gap with surgical precision. This diagnostic reading requires seeing beyond the error to its cognitive source — understanding not just that the student got it wrong but what specific feature of her internal model produced the wrong answer.

When AI produces the output, the practitioner's errors are invisible. She does not make errors because she does not produce the output. The tool's errors are the tool's — they reveal the tool's limitations, not the practitioner's. The diagnostic window closes. The practitioner's representational architecture becomes opaque, and the information that would have allowed a teacher — or the practitioner herself — to identify and address its weaknesses is no longer generated.

A growing body of work in educational AI is attempting to bridge this gap. A 2025 paper in Education Sciences describes a generative AI platform designed to simulate dynamic classroom environments for teacher training — virtual student agents with varied learning styles and behavioral profiles, paired with mentor agents that provide continuous feedback. The architecture explicitly draws on Ericsson's deliberate practice framework: the platform creates situations of calibrated difficulty, provides immediate performance feedback, and allows the kind of repeated, goal-directed practice that the framework identifies as necessary for representational growth. Practica Learning, a commercial platform, has similarly integrated what it calls "deliberate practice methodology" with an AI-powered avatar that simulates difficult professional conversations, citing Ericsson's work directly in its design rationale.

These systems represent a genuine attempt to move AI from the production-assistance paradigm toward the developmental-coaching paradigm. They are designed not to handle difficulty on the user's behalf but to generate difficulty calibrated to the user's developmental needs — to function less like a helpful tool and more like a demanding teacher. The approach is technically feasible and, in its early implementations, promising.

But the fundamental challenge remains. These systems are designed for specific, bounded domains — teacher training, professional conversation practice — where the skills to be developed are identifiable and the feedback loops are relatively tight. The broader question — whether AI can serve the coaching function across the open-ended, judgment-intensive, ambiguity-rich domains where the most consequential expertise resides — remains unanswered. The product leader deciding what to build, the architect evaluating whether a system will scale, the physician integrating multiple uncertain signals into a diagnostic judgment — these practitioners operate in domains where the skills are harder to specify, the feedback is slower and noisier, and the gap between the practitioner's current understanding and the understanding the situation demands is harder for any system, human or artificial, to perceive.

Edo Segal describes a teacher who stopped grading her students' essays and started grading their questions — a teacher who recognized that the most important cognitive skill in the AI era is not the production of answers but the formulation of questions that reveal what the student does and does not understand. This teacher is operating squarely within the deliberate practice tradition: designing an activity that requires the student to exercise the specific metacognitive skill that development demands, using the quality of the questions as diagnostic data about the student's representational depth. The machine could support this teacher's work. It could help students explore the implications of their questions, test whether their questions are genuinely probing or merely superficially provocative. But the machine cannot replace the teacher's judgment about what kind of cognitive challenge this particular student, at this particular stage, needs. That judgment is the teacher's expertise, and it is the expertise that makes the practice deliberate.

The teacher and the machine serve different functions. The machine extends the practitioner's productive reach. The teacher extends the practitioner's developmental trajectory. The most effective integration involves assigning each to the function it performs best — the machine for production, the teacher for development — and maintaining clarity about the distinction. When the machine is mistaken for a teacher, the default mode takes over, and the practitioner's development is sacrificed to the tool's helpfulness. When the teacher is supported by the machine, the developmental conditions are preserved and the scope of the challenge can be amplified beyond what either the teacher or the practitioner could have managed alone.

The distinction is not optional. It is structural. And the organizations, educational systems, and individual practitioners who fail to maintain it will discover the cost in the same way the cost of every developmental shortcut has been discovered throughout the history of expertise: not in the routine, where the shortcut is invisible, but in the crisis, where the representational architecture that the shortcut failed to build is the only thing that matters.

---

Chapter 7: Designing AI-Augmented Deliberate Practice

The path forward is not refusal. The tools are too powerful, too deeply integrated, too consequential in their productive capacity to be rejected. Nor is the path forward uncritical adoption — the default mode that produces output without development, performance without learning, the decoupled condition in which practitioners become more productive and less expert simultaneously. The path forward is design: the deliberate construction of practice environments in which AI tools amplify the conditions for representational growth rather than eliminating them.

This requires reversing the default relationship between the practitioner and the tool. In the default mode, AI handles the difficulty and the human handles the direction. In the developmental mode, AI generates the difficulty and the human handles the struggle. The reversal is not merely a nice idea. It is a specific, designable, implementable approach to AI-augmented practice whose principles can be derived directly from the conditions that K. Anders Ericsson's research identifies as necessary for deliberate practice.

The first principle is challenge amplification. Instead of using AI to produce solutions, the practitioner uses AI to generate problems — problems calibrated to the boundary of her current capability, problems that target the specific representational gaps her development requires. A developer who has mastered basic system design might ask Claude not to build a distributed architecture but to describe the specific failure modes that distributed architectures encounter under load, then attempt to design an architecture that addresses those failure modes without the tool's implementation assistance. The tool amplifies the scope of the challenge — presenting complexity the practitioner could not have accessed without it — while the practitioner engages with the challenge through her own cognitive resources. The effort is the practitioner's. The representational growth is the practitioner's. The tool serves not as a producer but as a generator of productive difficulty.

In practice, this looks like a developer working on a payment processing system who, instead of asking Claude to implement the error-handling logic, asks it to generate a set of twenty scenarios — network timeouts mid-transaction, partial database writes, race conditions between concurrent payments, message queue failures during retry logic — and then attempts to design the error-handling architecture herself. The tool provides the variation. The practitioner provides the struggle. Each scenario forces her representational model to accommodate a new failure mode, and the systematic variation across scenarios builds the abstract, transferable understanding that enables her to handle failure modes the scenarios did not include. This is deliberate practice: effortful engagement at the boundary of capability, with specific feedback available (she can subsequently ask the tool to evaluate her design), and the iterative refinement that representational growth demands.
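What this looks like as a few lines of scaffolding is easy to sketch. The sketch below is illustrative only, not a prescribed workflow: the ask helper, the function names, and the prompt wording are assumptions introduced for illustration, a stand-in for whatever chat interface the practitioner actually uses. The constraint that matters lives entirely in the wording of the requests, which ask for scenarios without fixes and for critique without rewrites.

```python
# Sketch only: a "challenge amplification" session in which the tool generates
# calibrated difficulty and the practitioner supplies the design work.
# ask() is a hypothetical placeholder; nothing below depends on a specific API.

def ask(prompt: str) -> str:
    raise NotImplementedError("wire this to the chat client of your choice")

def generate_failure_scenarios(system_description: str, n: int = 20) -> str:
    # Request problems, not solutions: failure modes and symptoms only.
    return ask(
        f"List {n} realistic failure scenarios for {system_description}. "
        "For each, describe the failure mode and its observable symptoms. "
        "Do not propose fixes or an architecture. Scenarios only."
    )

def practice_session(system_description: str) -> None:
    scenarios = generate_failure_scenarios(system_description)
    print(scenarios)
    # The struggle stays with the practitioner: she designs her own
    # error-handling architecture before requesting any critique.
    my_design = input("Paste your own error-handling design:\n")
    critique = ask(
        "Failure scenarios:\n" + scenarios
        + "\n\nMy proposed error-handling design:\n" + my_design
        + "\n\nIdentify which scenarios this design fails to cover and why. "
        "Do not rewrite or complete the design."
    )
    print(critique)

# Example: practice_session("a payment service with concurrent retries")
```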

The second principle is strategic withholding. Instead of requesting solutions, the practitioner requests constraints, hints, or partial information that narrow the search space without closing the gap. The developer who encounters a bug asks Claude not to fix it but to identify which subsystem the bug originates in — information that reduces the debugging space while leaving the diagnostic process to the practitioner. The lawyer who is constructing an argument asks Claude not to draft the argument but to identify the three strongest counterarguments — information that sharpens the challenge while leaving the constructive work to the lawyer. In each case, the tool provides scaffolding without providing the structure. The practitioner builds the structure herself, through the effortful, representation-building process that AI-assisted production would bypass.

Strategic withholding requires discipline, because the complete solution is available. The developer can ask for the fix. The lawyer can ask for the draft. The temptation is constant, and the psychological literature on delay of gratification suggests that most practitioners, most of the time, will succumb to it. The discipline is easier to maintain when it is supported by organizational structures — designated practice periods in which tool use is constrained to the scaffolding mode, team norms that value the developmental process alongside the productive outcome, mentors who can calibrate the withholding to the practitioner's developmental level.
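One way to externalize that discipline is to build the constraint into the request itself, so the withholding is enforced by the framing rather than by willpower in the moment. A minimal sketch follows, reusing the hypothetical ask helper from the previous example; the preamble text and the locate_bug function are illustrative assumptions, not a prescribed interface.

```python
# Sketch only: "strategic withholding" as a standing constraint on the request,
# not a moment-to-moment act of willpower. Reuses the hypothetical ask() helper
# defined in the previous sketch.

WITHHOLDING_PREAMBLE = (
    "You are assisting a practitioner who is practicing deliberately. "
    "Provide diagnostic information only. Never supply the fix, the patch, "
    "or the finished draft, even if asked for it later in this conversation."
)

def locate_bug(symptom: str, subsystems: list[str]) -> str:
    # A hint that narrows the search space without closing the gap:
    # name the implicated subsystem and the evidence, nothing more.
    return ask(
        WITHHOLDING_PREAMBLE + "\n\n"
        + f"Symptom: {symptom}\n"
        + "Subsystems: " + ", ".join(subsystems) + "\n"
        "Name the single most likely subsystem and the evidence pointing "
        "there. Do not identify the offending line or propose a change."
    )

# Example:
# locate_bug("intermittent double charge under load",
#            ["checkout API", "retry queue", "gateway adapter", "ledger writer"])
```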

The third principle is comparative evaluation. After the practitioner has struggled with a problem and produced her own solution, she solicits the tool's solution and compares the two. The comparison is not to check correctness. It is to identify specific differences and understand what those differences reveal about the gaps in her representational architecture. Where the tool's solution diverges from hers, the divergence is diagnostic data — it reveals dimensions of the problem her current representations do not encode. Where her solution diverges from the tool's in ways that are actually superior — and this happens, because human practitioners encode contextual knowledge the tool may lack — the divergence builds confidence in the specific aspects of her expertise that remain genuinely valuable.

A junior surgeon, for instance, might plan an operative approach to a complex case and then compare her plan with the AI's suggested approach. The differences — in incision placement, in the sequence of steps, in the handling of a particular anatomical variation — become the curriculum. Each difference is a window into either a gap in her understanding or a limitation of the tool's, and the process of distinguishing between the two builds evaluative expertise that neither solo practice nor tool-assisted production could develop independently.
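For a developer, the same pattern can be made routine with a small comparison harness. The sketch below is again illustrative and assumes the hypothetical ask helper; the only real machinery is the standard-library diff, and the closing prompt deliberately asks for a diagnosis of the divergences rather than a verdict on them.

```python
import difflib

# Sketch only: "comparative evaluation" after the struggle, not instead of it.
# The practitioner solves the problem first; the diff between her solution and
# the tool's becomes the curriculum. Reuses the hypothetical ask() helper.

def compare_solutions(problem: str, my_solution: str) -> str:
    tool_solution = ask("Solve the following problem:\n" + problem)
    diff = "\n".join(difflib.unified_diff(
        my_solution.splitlines(), tool_solution.splitlines(),
        fromfile="practitioner", tofile="tool", lineterm=""))
    # Each divergence is treated as a question about the practitioner's
    # mental model, not as a correction to be pasted over her work.
    return ask(
        "Below is a diff between my solution and yours.\n" + diff + "\n\n"
        "For each divergence, explain what someone would need to understand "
        "about this domain for the divergence to exist. Do not say which "
        "version is better and do not merge them."
    )
```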

The fourth principle is deliberate failure analysis. When the AI produces output the practitioner suspects is wrong — or output whose correctness the practitioner cannot independently verify — the practitioner should not simply reject it and regenerate. She should diagnose the failure. What did the tool get wrong? Why? What features of the problem led the tool astray? What would the practitioner need to understand about the domain to have predicted the failure? This diagnostic process builds what may be the single most valuable form of expertise in the AI age: the capacity to evaluate machine output with critical sophistication, to understand the boundary conditions under which AI systems become unreliable, and to direct those systems away from the problems they cannot handle and toward the problems they can.

The 2026 Frontiers in Medicine paper on the deskilling dilemma gestures toward this principle when it recommends that "AI systems can be intentionally designed to augment, rather than automate, clinical reasoning by prompting learners to articulate rationales, engage in contextual interpretation, and reflect on decision-making processes." The recommendation is precisely aligned with Ericsson's framework: the AI should prompt the learner to construct understanding rather than providing the understanding directly. The diagnostic engagement with the AI's output — the process of evaluating, questioning, and testing the machine's reasoning — becomes the practice activity, and the practice activity builds representations that neither traditional practice nor default AI-assisted production can develop.

The fifth principle is progressive complexity. The practitioner uses AI to expand the scope of the problems she attempts, not by having the tool handle the expanded scope but by having the tool reveal the expanded complexity so the practitioner can engage with it directly. A developer who has mastered single-service architecture might use AI to explore the specific challenges of microservice orchestration — not by asking the tool to implement the orchestration but by asking it to describe the coordination problems, the consistency challenges, the failure cascades that microservice architectures introduce, and then attempting to design solutions through her own representational resources. The tool extends the practitioner's access to complexity. The practitioner's struggle with that complexity builds the representations that the next level of expertise requires.

The sixth principle is structured reflection. After each session of AI-augmented practice, the practitioner examines what she learned, what she struggled with, what gaps the session revealed, and what she needs to work on next. This metacognitive practice builds the self-awareness and self-regulation that expert performance requires — the capacity to monitor one's own understanding, detect its limits, and direct one's development toward the areas of greatest need. The tool can support reflection by summarizing the session, highlighting moments of difficulty, identifying patterns in the practitioner's errors across sessions. But the reflection itself — the integrative, evaluative, self-directed process of making sense of one's own development — must remain the practitioner's own. Outsourcing metacognition to the tool is the final stage of the decoupling: the point at which the practitioner no longer even monitors her own understanding, trusting the tool to do that too.
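A reflection record can be as simple as a handful of fields, most of which the tool is not permitted to fill. The sketch below is one possible shape, introduced here as an assumption rather than a prescribed format; the division of labor is the point, with the machine allowed to pre-fill the summary and everything else written by the practitioner.

```python
from dataclasses import dataclass
from datetime import date

# Sketch only: a structured-reflection record for the end of a practice
# session. The tool may pre-fill the summary field; the remaining fields
# are authored by the practitioner, which is the point of the exercise.

@dataclass
class PracticeReflection:
    session_date: date
    tool_generated_summary: str             # optional machine assist
    what_i_struggled_with: str = ""         # practitioner-authored
    gaps_the_session_revealed: str = ""     # practitioner-authored
    next_boundary_to_target: str = ""       # practitioner-authored
    minutes_in_developmental_mode: int = 0  # time protected from production
```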

These six principles — challenge amplification, strategic withholding, comparative evaluation, deliberate failure analysis, progressive complexity, and structured reflection — are not a curriculum. They are a framework for designing curricula, for structuring workflows, for shaping the interaction between practitioner and tool so that the interaction builds rather than bypasses the cognitive architecture of expertise. The framework is demanding. It produces less immediate output per unit of time than the default mode. It requires more effort, more discomfort, more metacognitive engagement. It is, by design, harder.

The hardness is the point. Ericsson's research established, across decades and domains, that the difficulty of practice is not a cost to be minimized but a condition to be maintained. The specific quality of the difficulty matters — it must be targeted, calibrated, feedback-rich, and progressively challenging. But the presence of difficulty is non-negotiable. Remove it, and the developmental mechanism stops operating, regardless of how much output continues to flow.

The argument is sometimes made that this framework is unrealistic — that practitioners under real-world time pressure cannot afford to slow down for developmental practice, that the competitive environment demands maximum output, that the luxury of struggle is available only to those who are not accountable for deadlines. The argument has force. It also has a precise historical parallel. The same argument was made against every investment in training, education, and professional development that has ever competed with immediate production for organizational resources. And the organizations that made the investment — that built the time for development into their workflows, that valued the long-term growth of their practitioners alongside their immediate output — consistently outperformed the organizations that did not. Not in every quarter. Over decades.

The framework does not require that every interaction with AI be developmental. Practitioners will use tools for production, and they should — the productive capacity of AI is genuine and valuable. What the framework requires is that some fraction of the practitioner's engagement with the domain be structured for development rather than production, and that this fraction be protected against the constant pressure to convert developmental time into productive time. The fraction need not be large. The Berlin violinists spent approximately four hours per day in deliberate practice, a fraction of their total musical engagement. But the fraction was non-negotiable. It was maintained daily, through years of development, and it was the fraction that produced the representational architecture that distinguished them from the merely competent.

The same non-negotiable fraction, structured into the AI-assisted workflow through organizational commitment and individual discipline, is what the current moment demands. Not the refusal of the tools. Not the uncritical adoption of the tools. The deliberate, designed, structurally supported integration of developmental practice into a productive environment — the construction of conditions for growth within a landscape that has optimized every other condition for output.

---

Chapter 8: Maintaining Mastery When the Floor Rises

For most of human history, the floor of professional capability was set by the difficulty of acquiring the skills that professional work demanded. The surgeon needed years of training before she could operate. The lawyer needed years of study before she could argue a case. The developer needed years of practice before she could build a system. The floor was high, and the height of the floor performed two functions simultaneously: it restricted access to professional work, and it ensured that the practitioners who cleared the floor possessed the representational depth that the clearing process built. The restriction and the development were coupled. You could not enter the profession without undergoing the developmental process that produced expertise.

AI has raised the floor and decoupled it from the developmental process. The developer in Lagos with Claude can build a working application in a weekend. The medical student with a diagnostic AI can produce correct assessments across a range of conditions. The junior lawyer with a document-generation tool can produce competent briefs on her first day of practice. The floor has risen dramatically — the minimum viable output is now achievable without the years of developmental investment that previously constituted the price of admission.

The rise is, in significant ways, a moral achievement. Edo Segal argues this in The Orange Pill with a conviction that the expertise framework cannot and should not dismiss: when the floor rises, people who were previously excluded from professional capability by lack of training, lack of capital, lack of institutional access gain the ability to participate. The developer in Lagos, the student in Dhaka, the engineer in Trivandrum — these practitioners possess the intelligence and the ideas but have lacked the infrastructure that translates intelligence into artifact. The rising floor gives them access. The expansion of who gets to build is genuine, consequential, and, as Segal argues, the most morally significant feature of the current technological moment.

The expertise framework does not dispute this. What it adds is a diagnostic complication that the celebration of the rising floor tends to obscure: when the floor rises, the distinction between competent performance and expert performance becomes invisible in the output. The AI-assisted novice and the deep expert produce artifacts that are, in many visible dimensions, indistinguishable. The code compiles. The brief is well-structured. The diagnosis is correct. The surface features that previously differentiated expert work from competent work have been equalized by the tool's contribution.

The expert possesses something the novice does not: mental representations that enable critical evaluation, adaptive response, and independent judgment in situations the tool cannot handle. These representations are invisible in routine output. They manifest only under specific conditions — when the output contains a subtle error the novice cannot detect and the expert can, when the situation deviates from the distribution the tool was trained on, when the problem requires the deep, flexible, transferable understanding that only deliberate practice builds. These conditions may be infrequent. They are also the conditions under which the most consequential decisions are made, the most serious errors are caught or missed, and the difference between adequate and catastrophic outcomes is determined.

The economics of the situation create a perverse incentive. If the expert's advantage manifests only in rare situations, and the tool handles most situations adequately, the expected value of investing in expert development — years of deliberate practice, expensive mentorship, structured developmental experiences — may appear to be less than the expected value of equipping a larger number of non-experts with tool proficiency. The arithmetic is seductive. It is also dangerously incomplete.

The incompleteness lies in the distribution of consequences. The expert's advantage does not manifest frequently, but when it manifests, the magnitude of its impact is disproportionate to its frequency. The surgeon whose representational depth allows her to handle an unexpected complication during a routine procedure — that single moment may determine whether the patient lives or dies. The architect whose deep understanding of system behavior allows him to detect a subtle scalability flaw before deployment — that single detection may prevent a failure that would have cost the organization millions. The physician whose pattern-recognition capacity allows her to diagnose a rare disease that the AI classified as a common one — that single diagnosis may save a life that tool-dependent practice would have lost.

The expected value calculation must weight these moments by their magnitude, not just their frequency. And when the magnitude is catastrophic — when the cost of lacking expertise in the critical moment is measured in human lives, organizational failure, or systemic collapse — the expected value of expert development remains high even if the frequency of its deployment is low.

There is a further complication that the rising-floor analysis reveals: the erosion of the mechanisms through which expertise has traditionally been transmitted. Mentorship — the primary vehicle for deliberate practice in most professions — depends on a specific intergenerational structure. The experienced practitioner mentors the developing practitioner, providing the external perspective, the challenge design, and the independent evaluation that deliberate practice requires. This structure presupposes that the experienced practitioner possesses expertise worth transmitting and that the developing practitioner undergoes a developmental process that the mentor can observe, diagnose, and guide.

When the floor rises and the developmental process is bypassed, both presuppositions are threatened. The experienced practitioner's expertise remains real but becomes harder to demonstrate — because the output that would have distinguished her from a tool-assisted novice is now indistinguishable. The developing practitioner's developmental process becomes harder to observe — because the tool-assisted output that serves as the primary evidence of development reflects the tool's competence rather than the practitioner's. The mentor cannot diagnose what she cannot see, and the representational gaps that the developing practitioner is accumulating behind the screen of adequate output remain invisible to the mentoring relationship.

The strategies for maintaining expertise in this environment are specific and, in most organizations, conspicuously absent.

The first is structured practice without AI assistance. Practitioners need regular, protected opportunities to engage with their domain through their own cognitive resources — debugging without the tool, diagnosing without the decision support, writing without the generation assistant. These sessions are not nostalgic exercises. They are the mechanism through which mental representations are maintained and expanded. The analogy is to the surgeon who maintains manual skills through periodic simulation lab sessions, not because she expects to perform open surgery but because the manual engagement sustains the representational substrate that all surgical judgment, including laparoscopic judgment, draws upon.

The second is challenging assignments that push practitioners beyond their AI-assisted comfort zone. Organizations should deliberately create situations in which the tool is unavailable or deliberately constrained — not as punishment but as developmental opportunity. The practitioner who has never encountered a problem the tool cannot handle has never been forced to rely on her own representational architecture, and her architecture has never been tested under the conditions that reveal its actual depth. Testing under pressure is not cruelty. It is the condition under which arrested development is detected and genuine development is catalyzed.

The third is investment in mentorship structures that can operate in the new environment. Mentors need training in how to evaluate practitioners whose output is mediated by tools — how to probe for understanding behind the output, how to design challenges that reveal the practitioner's representations rather than the tool's, how to provide the developmental feedback that tool-assisted production obscures. This is a new mentoring competency, and it is not one that most experienced practitioners currently possess.

The fourth is explicit institutional recognition that expertise has value independent of output. When the floor rises and the visible distinction between competent and expert performance diminishes, organizations must find other ways to signal that expertise is valued — through compensation, through authority, through the allocation of the most challenging and consequential work, through cultural norms that celebrate deep understanding alongside productive efficiency. Without this signaling, the incentive to invest in the demanding, uncomfortable, time-consuming process of building expert representations through deliberate practice erodes. The practitioner who sees that tool-assisted competence is valued identically to hard-won expertise — that the same output commands the same recognition regardless of the understanding behind it — will rationally choose the easier path. And the organization's collective expertise will decay, invisibly, until the moment when its absence becomes catastrophic.

The rising floor is real. Its moral significance is genuine. The expansion of access to professional capability that AI enables is an achievement worth celebrating and protecting. But the celebration should not obscure the diagnostic complication that Ericsson's framework identifies: that the same process that raises the floor of production threatens to lower the ceiling of expertise, by eliminating the developmental conditions that have historically produced the practitioners capable of handling what the floor, however high, cannot reach.

The organizations, educational institutions, and societies that navigate this transition successfully will be the ones that hold both truths simultaneously — that the rising floor is good and that the expertise built through deliberate practice remains irreplaceable — and that design their structures accordingly. The ones that hold only the first truth, that celebrate the rising floor without maintaining the conditions for the depth that the floor cannot provide, will discover the cost in the specific, high-stakes, non-routine moments where the tool fails and the representational architecture that should have been there is not.

The floor has risen. The ceiling must be maintained. The gap between them is where the future of human expertise will be decided.

Chapter 9: The Future of Mastery

K. Anders Ericsson died on June 17, 2020, in Tallahassee, Florida. He was seventy-two years old. Two and a half years later, ChatGPT launched and the world began to reckon with a technology that demonstrates what Ericsson's framework calls "reproducibly superior performance" — in domain after domain, across task after task — without anything resembling the deliberate practice his life's work identified as its necessary condition.

The timing is not merely poignant. It is analytically significant. Ericsson never had to reconcile his framework with the existence of machines that achieve expert-level output through statistical pattern-matching over training data rather than through the effortful, feedback-rich, boundary-testing engagement that his research documented as the mechanism of human expertise. He never had to answer the question that his framework, applied to the current moment, most urgently poses: if expert performance can be produced without deliberate practice, what is deliberate practice for?

The question sounds like a refutation. It is not. It is the question that clarifies what the framework actually claims.

Ericsson's framework does not claim that deliberate practice is the only path to expert-level output. It claims that deliberate practice is the only path to the construction of expert mental representations in human beings. These are different claims. The first would be refuted by AI. The second is not, because AI does not construct human mental representations. It produces outputs. The outputs may be indistinguishable from expert human outputs. The cognitive architecture behind them is categorically different — and the difference matters not in the abstract but in the specific, practical, consequential circumstances where the human must operate without the tool, evaluate the tool's output, direct the tool toward problems it was not designed to handle, or recognize that the tool has failed in ways the tool itself cannot signal.

The distinction between producing expert output and possessing expert understanding is the load-bearing insight of this entire book, and it resolves the apparent paradox that Ericsson's death left unaddressed. AI has not disproved the necessity of deliberate practice. It has revealed what deliberate practice was always building: not the capacity to produce outputs — that capacity is now cheap — but the capacity to understand, evaluate, direct, and judge. The cognitive architecture that deliberate practice constructs is not an output-generating mechanism. It is an understanding-generating mechanism. And understanding, in a world saturated with machine-generated output, has become simultaneously less necessary for production and more necessary for everything that production serves.

The Simon-Ericsson lineage crystallizes this with a precision that neither man could have anticipated. Herbert Simon, Ericsson's mentor at Carnegie Mellon, studied human expertise and used what he learned to build machines that could replicate it. His insight — that expert performance depends on vast stores of domain-specific patterns organized for rapid retrieval — became the intellectual foundation for both the study of human expertise and the design of artificial systems that mimic it. Simon took the mechanism apart. He gave half to Ericsson, who spent his career understanding how humans build the patterns through developmental struggle. He gave the other half to the AI research program, which spent the subsequent decades building machines that could acquire the patterns through computational training on data.

The two halves have now met. The machines have achieved what Simon predicted — reproducibly superior performance in domain after domain. The humans face what Ericsson documented — the requirement that expert understanding be built through specific, effortful, structured engagement that no shortcut can replace. And the question that the convergence forces is not which half was right. Both were right. The question is what the relationship between the two halves should be — how the human capacity for deep understanding and the machine capacity for broad production can be integrated in ways that preserve the developmental conditions the human half requires.

The emerging literature suggests that this integration is possible but not automatic. The 2025 paper in Education Sciences that describes AI-powered deliberate practice platforms for teacher training, the commercial systems that use AI avatars to simulate high-stakes professional conversations, the medical education programs that are experimenting with AI-augmented clinical reasoning exercises — these represent genuine attempts to use the machine's capabilities in service of the human's development rather than as a substitute for it. The Frontiers in Medicine paper's recommendation that AI be "intentionally designed to augment, rather than automate, clinical reasoning by prompting learners to articulate rationales, engage in contextual interpretation, and reflect on decision-making processes" is a precise description of what AI-augmented deliberate practice looks like in a domain where the stakes of the decoupling are measured in human lives.

But these experiments are marginal. The dominant mode of AI integration across professions remains production-oriented. The tools are deployed to maximize output, minimize friction, and accelerate the delivery of results. The developmental consequences — the atrophy of representational depth, the miscalibration of self-assessment, the erosion of the conditions under which the next generation of experts would be formed — are treated as externalities, when they are noticed at all.

Angela Duckworth, in her tribute to Ericsson after his death, captured what his research had demonstrated across a lifetime of study: that excellence is "human, not superhuman." Ericsson had shown, with the cumulative weight of decades of evidence, that the extraordinary performances we attribute to genius, to natural talent, to the ineffable spark of the gifted — these performances are the products of a specific, identifiable, replicable developmental process. The process is not mysterious. It is demanding. It requires sustained effort, targeted difficulty, informed feedback, and the repeated willingness to engage with problems that exceed current capability. The representations it builds are not gifts from nature. They are constructions of practice — assembled layer by layer through the geological accumulation of effortful engagement that Edo Segal describes in The Orange Pill with the intuitive accuracy of a builder who has felt the layers forming under his own hands.

If excellence is human, not superhuman, then what becomes of it when the superhuman arrives? The question has a precise answer from within Ericsson's framework. The superhuman — the AI that produces expert-level output without human developmental effort — does not make human excellence obsolete. It makes human excellence more specific. It relocates the domain of excellence from production, where the machine now dominates, to understanding, evaluation, direction, and judgment — the cognitive capacities that mental representations enable and that no quantity of machine output can substitute for.

The relocation is real. It is also incomplete. The new domain of excellence — the judgment level, the evaluative level, the level at which the practitioner decides what the machine should do and assesses whether it has done it well — requires its own form of deliberate practice, its own developmental trajectory, its own conditions for representational growth. These conditions are not the same as the conditions that built implementation-level expertise. They require different feedback structures, different challenge designs, different mentoring approaches. They are, in many cases, harder to provide, because the feedback at the judgment level is slower, noisier, and more confounded than the feedback at the implementation level.

But the principle is the same. Difficulty, targeted at the boundary of current capability, with feedback specific enough to guide improvement and the opportunity for iterative refinement. The difficulty has relocated. The principle has not. The practitioner who wants to develop judgment-level expertise must engage with judgment-level difficulty — must make strategic decisions and see their consequences, must evaluate AI output and discover where her evaluation was wrong, must direct complex projects and learn from the outcomes. The engagement must be effortful, targeted, feedback-rich, and sustained over time. The same conditions, applied to a different level.

Mastery has not become obsolete. It has become more specific in what it demands, more consequential in what it enables, and harder to develop, because the conditions for its development are no longer imposed by the demands of production but must be deliberately maintained by the practitioners, the organizations, and the educational systems that recognize those conditions as irreplaceable.

The science of deliberate practice provides the mechanism. What it cannot provide is the will — the institutional commitment, the individual discipline, the cultural recognition that the struggle which builds expertise is not an inefficiency to be optimized away but a condition to be protected. That will is a choice. It has always been a choice. Ericsson's contribution was to show what the choice produces when it is made and what it costs when it is not. The AI transition has raised the stakes of the choice without changing its nature. The mechanism is the same. The difficulty is the same. The representations are built the same way they have always been built — through the specific, effortful, uncomfortable friction of engaging with problems that exceed current understanding and constructing the new understanding that the problems demand.

The machines have arrived. They produce. They perform. They achieve reproducibly superior output across an expanding range of domains. What they do not do, and what no machine architecture currently does, is develop. They do not build richer representations through struggle. They do not grow in understanding through the friction of failure. They do not become wiser through the accumulation of productive difficulty over years of patient engagement.

Humans do. That capacity — the capacity to develop, to grow, to build understanding through the specific mechanism of deliberate practice — is the thing that distinguishes human expertise from machine performance. It is the thing that Ericsson spent his life studying. And it is the thing that, in the age of artificial intelligence, must be deliberately, structurally, and uncompromisingly preserved.

Not because the machines are not good enough.

Because we must remain good enough to direct them.

---

Epilogue

Ten thousand hours. That was the number everybody remembered, and it was the least important thing Anders Ericsson ever discovered.

I keep thinking about this — the way a finding designed to illuminate the structure of human development was compressed into a slogan that erased the structure entirely. "Put in the hours and you'll get there." As if the hours were the point. As if duration were the mechanism. As if sitting in a room with a violin for ten thousand hours and sitting in a room with a violin struggling at the boundary of what you cannot yet do for ten thousand hours were the same activity. They are not. They have never been. The difference between them is the difference between experience and expertise, between a career and a craft, between the person who has done a thing for twenty years and the person who has gotten better at it for twenty years.

And now I build with Claude every day, and the distinction Ericsson spent his life making is the one that haunts me most.

Because the tool is extraordinary. It gives me capabilities I never had. It lets me work across domains I could not have entered alone. The distance between imagination and artifact, between what I can see in my mind and what I can make real in the world, has compressed to nearly nothing, and the compression is as thrilling now as it was the first morning I felt it.

But Ericsson's framework asks a question the thrill does not answer: What am I becoming in the process of producing?

Not what am I making. What am I becoming. The representations I am building, or failing to build. The layers being deposited, or not deposited. The architecture of understanding that forms only through the friction of engaging with problems that resist me — problems I cannot yet solve, outputs I cannot yet evaluate, judgments I cannot yet make with the confidence that comes only from having made them wrong, learned why, and made them better.

When I described in The Orange Pill the engineer who lost ten minutes of formative struggle embedded in four hours of plumbing — ten minutes she did not know she was losing, because the loss was invisible inside the efficiency — I was describing what Ericsson's framework makes precise. The loss has a mechanism. The layers have a physics. The representations are built through specific conditions, and when those conditions are removed, the representations stop forming, no matter how much output continues to flow.

I caught myself, writing this book, in exactly the pattern Ericsson's research predicts. Claude would produce a passage that read like insight. Polished, structurally sound, hitting every mark. And I would almost keep it — almost let the quality of the prose substitute for the quality of the thinking. The decoupling in real time: output without understanding, performance without learning, the smooth surface concealing the absence of the struggle that would have forced me to discover what I actually believed.

The moments I deleted those passages and sat with a notebook until I found the rougher, harder, more honest version — those were the moments of deliberate practice. Not comfortable. Not efficient. Not optimized for output. But developmental in the specific way Ericsson's framework describes: effortful engagement at the boundary of my current capability, with the discomfort of not-yet-knowing as the signal that the cognitive architecture was being forced to grow.

The question Ericsson never got to ask — what happens to the mechanism of human mastery when the machines can produce the outputs that mastery used to be required for — is the question I carry now. Not as an abstract concern. As a daily, practical, urgent reality. Every time I open Claude, the choice is there: use the tool for production, or use the tool for development. Accept the output, or struggle with it. Let the machine close the gap, or sit in the gap long enough for the understanding to form.

Most days, I make the choice imperfectly. Some days I accept the smooth output when I should have fought for the rough truth. Some days I sit with the difficulty when I could have moved faster. The calibration is hard, and it changes with the problem, the deadline, the stakes, the hour.

But the choice is visible now. That is what this book gave me. Before Ericsson's framework, I could feel that something was being lost in the efficiency and could not name it. Now I can name it. The loss has a mechanism. The mechanism has conditions. The conditions can be maintained — not by refusing the tools, not by pretending the river can be reversed, but by building into my practice, my team's practice, the specific friction that development demands.

The machines produce. Humans develop. The production is valuable. The development is irreplaceable. And the future belongs to the practitioners who understand that the most important thing they build, in every session with every tool, is not the output.

It is themselves.

-- Edo Segal

AI can now generate expert-level output in seconds. But K. Anders Ericsson spent forty years proving that producing expert output and becoming an expert are entirely different things — and that the difference reveals itself precisely when the stakes are highest.

The most dangerous assumption of the AI age is that capability and understanding are the same thing. K. Anders Ericsson's research on deliberate practice — the landmark science behind how humans actually build expertise — reveals a mechanism that no tool can replicate and no shortcut can replace: the specific, effortful, friction-rich struggle through which the cognitive architecture of mastery is constructed. When AI removes that struggle, the output keeps flowing. The development stops. This book applies Ericsson's framework to the most consequential skill question of our time: in a world where machines handle the doing, how do humans keep becoming the kind of people who can direct, evaluate, and judge what the machines produce? The answer is not refusal. It is the deliberate preservation of difficulty — designed, structured, and protected against the relentless pull of frictionless production.

— K. Anders Ericsson

“The differences between expert performers and normal adults reflect a life-long period of deliberate effort to improve performance.”
— K. Anders Ericsson