Performance is what a practitioner can do now, under current conditions, with available tools. Learning is the change in underlying cognitive structures that enables future performance under different conditions, in novel situations, or without the tools currently available. The two are distinct and often inversely related — a finding established across decades of research in the learning sciences and elevated to central importance by the AI transition. AI tools optimize for performance: they produce the best possible output given the user's input. They do not optimize for learning, and in their default mode of operation they systematically remove the conditions under which learning occurs. This produces a population of practitioners whose visible output is high and whose underlying capability is eroding — a mismatch that remains invisible until circumstances reveal it, typically at the worst possible moment.
The distinction was formalized in the learning sciences by Nicholas Soderstrom and Robert Bjork in 2015 but had been operative in the desirable difficulties tradition for decades before that. The empirical pattern is remarkably consistent: conditions that produce high immediate performance often produce poor retention, poor transfer to novel situations, and poor resistance to changes in task conditions. Conditions that make immediate performance effortful and error-prone often produce durable learning that manifests weeks or months later under changed circumstances.
The AI application of the distinction is uncomfortable. Every piece of AI-assisted work that looks like expertise from the outside may be either an expression of genuine expertise (performance resting on durable learning) or a tool-dependent output that will collapse when the tool is removed or the situation changes (performance without learning). From the output alone, the two cannot be distinguished. Only testing under changed conditions — removing the tool, presenting a novel problem, requiring independent justification — reveals whether learning has occurred or only performance has been enabled.
The 2025 Carnegie Mellon study on knowledge workers using generative AI documented this decoupling directly. Practitioners reported that AI made tasks feel cognitively easier. Researchers observed that they were ceding problem-solving expertise to the system while focusing on integration and gathering tasks. Both observations were correct. They were measuring different things — performance (made easier, maintained at acceptable levels) and learning (foreclosed, because the conditions for it were removed). The workers experienced empowerment. The researchers observed deskilling.
The organizational consequence is that standard proxies for practitioner capability — productivity metrics, output quality, deadline performance — have decoupled from the underlying capability they once reliably indicated. A practitioner's output quality is now a measure of the tool's competence as much as the practitioner's, and organizations that conflate the two systematically overestimate the capability of their AI-reliant workers. This miscalibration remains benign until situations arrive that require independent capability, and in those situations its consequences can be severe.
The performance-learning distinction runs through the entire history of educational psychology but was articulated with particular clarity by the Bjork laboratory at UCLA beginning in the 1970s. The 2015 Soderstrom and Bjork review formalized the distinction for the learning sciences and introduced the specific claim that laboratory and classroom measures of learning have routinely confused the two constructs.
The AI application of the distinction is largely a product of 2023-2025, with researchers including Kartik Hosanagar at Wharton, the Carnegie Mellon team studying knowledge workers, and others demonstrating empirically that the theoretical decoupling predicted by learning science is occurring in AI-mediated professional work.
Current output is a poor proxy for capability. AI-assisted production reflects the joint capability of tool and user, not the user alone, and organizations that treat it as a measure of user capability systematically miscalibrate their assessments.
The decoupling is invisible in the short term. Performance-only practice produces no visible signs of erosion; the deficit appears only when conditions change.
Mastery survives changed conditions. The hallmark of genuine learning is that the capability persists when the tools, contexts, or situations change — a test most AI-reliant work never faces until it faces it catastrophically.
Assessment must be redesigned. Evaluation of practitioner capability now requires explicit testing of performance under conditions that the tool cannot mediate — a methodological change most organizations have not yet made.
The self-assessment feedback loop is broken. Practitioners who produce excellent AI-assisted output naturally infer that they possess the capability the output implies, and the inference is increasingly wrong.
A minority position holds that the performance-learning distinction may itself be obsolete in a world where AI is reliably available — that capability should be redefined as what a practitioner can do in her realistic operating environment, which now includes AI. Critics of this position, including Ericsson-framework scholars, argue that the realistic operating environment also includes the novel situations, tool failures, and edge cases where tool-independent capability is what matters.