The generate-first protocol is the simplest and most direct application of desirable-difficulties research to AI-augmented work. The procedure is straightforward: before consulting an AI tool, the user must generate their own response from their own cognitive resources. The quality of the generated attempt is not the criterion—it may be partial, rough, or wrong. The cognitive operation is the criterion: did the user activate their knowledge network, attempt retrieval from memory, engage in the reconstructive effort through which encoding occurs? If so, the subsequent AI-provided answer lands on prepared ground, processed more deeply than it would have been if received without the prior generation attempt. The protocol preserves the generation effect—the finding that produced information is remembered better than received information—in an environment designed to eliminate production through instant provision. Implementation requires overriding the natural preference for immediate answers and the institutional pressure for efficient output, making it psychologically difficult and organizationally costly despite being technically trivial.
The protocol's theoretical foundation lies in the generation effect and the testing effect, both of which demonstrate that cognitive effort during encoding (specifically, the effort of producing rather than receiving) builds storage strength that passive reception does not. When a developer spends fifteen minutes attempting to debug before asking Claude, those fifteen minutes are a learning event even if the debugging attempt fails. The search through her knowledge network for relevant patterns, the activation of related bug types, and the construction of a hypothesis from fragmentary understanding all deposit a layer of encoding. When Claude's solution subsequently arrives, the developer processes it not as a standalone piece of information but as a confirmation, correction, or elaboration of the hypothesis she has already formed. The processing is deeper, the encoding richer, and the retention more durable than if Claude had answered immediately.
The cost is time: the fifteen minutes that could have been spent on the next task. In productivity cultures that measure output per hour, fifteen minutes of struggle that produce an inferior preliminary result register as pure overhead. The quarterly review does not reward the developer for her private generative effort, and no standard performance metric captures the storage strength that the generation built. The incentive structure of an output-driven organization therefore punishes the protocol, because the protocol reduces measured output by roughly the time spent generating independently. This is the structural tension: the intervention that produces better long-term learning outcomes produces worse short-term productivity metrics. Resolution requires either individual discipline (choosing difficulty despite the cost) or institutional commitment (valuing learning alongside output).
Implementation can take several forms. Mandatory delay: the AI tool introduces a waiting period (thirty seconds to five minutes) between query and response, during which the user is prompted to write their own attempt. Minimum-length requirement: the tool requires a user-generated response of specified length before providing its own answer. Prompted generation: the interface asks 'What is your initial approach?' and refuses to proceed until the user articulates one. Each implementation makes the tool feel less responsive and more demanding. Each preserves the cognitive operation through which learning occurs. The choice between responsiveness and learning is the choice between maximizing performance and building capability—a choice the market makes one way and the evidence prescribes the other.
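The three forms described above can be combined in a single wrapper around an AI call. The following Python sketch is illustrative only: the class name `GenerateFirstGate`, the `ask_ai` callable, and the default thresholds are all assumptions made for this example, not part of any existing tool. It checks only that an attempt was made and that enough time has passed; it deliberately does not judge the quality of the attempt.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class AttemptRequired(Exception):
    """Raised when the generate-first conditions have not been met yet."""


@dataclass
class GenerateFirstGate:
    # Hypothetical wrapper: `ask_ai` stands in for any function that
    # takes a question and returns an AI-generated answer.
    ask_ai: Callable[[str], str]
    min_delay_seconds: float = 30.0   # mandatory delay (assumed default)
    min_attempt_chars: int = 80       # minimum-length requirement (assumed default)
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _question: Optional[str] = field(default=None, init=False)
    _opened_at: Optional[float] = field(default=None, init=False)

    def open(self, question: str) -> str:
        """Record the question, start the delay, and return the prompt to show."""
        self._question = question
        self._opened_at = self.clock()
        return "What is your initial approach?"  # prompted generation

    def submit(self, attempt: str) -> str:
        """Release the AI answer only after a sufficiently long, delayed attempt."""
        if self._question is None or self._opened_at is None:
            raise AttemptRequired("Call open() with a question first.")
        # The attempt may be partial or wrong; only its existence is checked.
        if len(attempt.strip()) < self.min_attempt_chars:
            raise AttemptRequired(
                f"Write at least {self.min_attempt_chars} characters of your own attempt."
            )
        waited = self.clock() - self._opened_at
        if waited < self.min_delay_seconds:
            remaining = self.min_delay_seconds - waited
            raise AttemptRequired(f"Keep working for another {remaining:.0f} seconds.")
        return self.ask_ai(self._question)
```

A caller would invoke `open()` when the user poses a question, display the returned prompt, and pass whatever the user writes to `submit()`. Setting `min_delay_seconds` or `min_attempt_chars` to zero disables that particular requirement, so the same sketch covers each form on its own.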
The protocol is not merely for novices. Experts benefit from generation as well, because effortful retrieval maintains the storage strength and network density that expertise requires. The senior engineer who has used AI for months without generating her own attempts first may find her diagnostic intuition dulling—not because she has forgotten individual facts, but because the associative pathways connecting those facts have weakened through disuse. The pathways are maintained through traversal, and traversal occurs during effortful retrieval. When AI substitutes instant provision for effortful search, the traversals stop, and the network connectivity that expertise depends on begins to degrade. The generate-first protocol maintains the network by ensuring the traversals continue, even when the destination can be reached instantly through the tool.
The protocol is implied by the generation effect (Slamecka and Graf, 1978) and retrieval practice research (Roediger and Karpicke, 2006) but was formalized as an AI-specific intervention only in the 2020s, as large language models made instant assistance ubiquitous. The prescription is direct: if the evidence shows that generating before receiving produces better learning, and if AI tools are eliminating generation by providing answers first, then the intervention is to restore generation by requiring it before reception. The simplicity of the logic contrasts sharply with the difficulty of adoption.
Generate before receiving. Produce your own attempt from your own cognitive resources before consulting AI, regardless of the attempt's quality—the cognitive work of generation is the learning event, not the correctness of the output.
Failed generation still beneficial. Even wrong attempts produce network activation and encoding advantages over immediate correct reception, because the search and reconstruction process is what builds storage strength.
Time cost versus capability benefit. The protocol reduces short-term output (time spent generating could have been spent on the next task) while building long-term capability (storage strength that persists when the tool is unavailable). Organizations must make this trade-off deliberately.
Implementation makes tools feel worse. Requiring delays, minimum-length attempts, or prompted generation before AI assistance reduces perceived responsiveness, creating a commercial disadvantage relative to tools that provide instant answers.