By Edo Segal
The switch I never noticed was the one that cost the most.
Not the big transitions — leaving one company to start another, pivoting a product, rewriting a roadmap. Those I felt. Those I prepared for. The switch I'm talking about is the small one. The one that happens forty times a day. The one where I'm deep inside a problem with Claude, the architecture is crystallizing, the pieces are connecting, and then a notification pulls me to another thread. Thirty seconds. Maybe a minute. I handle it. I come back.
Except I don't come back. Not fully. Something has smeared.
Sophie Leroy gave that smear a name: attention residue. The cognitive trace of the thing you just left, clinging to your working memory, degrading what you do next. Not because you're tired. Not because the new task is hard. Because your mind does not toggle like a machine. It carries forward. And what it carries contaminates.
I spent months writing *The Orange Pill* arguing that AI is an amplifier — that it amplifies whatever signal you feed it. Leroy forced me to ask a harder question: What signal am I actually feeding it? Not my best thinking. Not my deepest judgment. The thinking I have available after the fifteenth context switch of the morning, with the residue of fourteen prior evaluations sitting in the workspace where my sharpest judgment is supposed to live.
The amplifier doesn't filter. It doesn't compensate. It takes what you give it right now — this cognitive state, this depleted working memory, this half-reconstructed context — and it scales that.
This book matters because every other conversation about AI focuses on what the tools can do. Leroy focuses on what happens to the mind that directs them. And what happens is measurable, replicable, and invisible from the inside. You feel productive. The data says your judgment has degraded. That gap between feeling and reality is where quality dies.
I built things during those residue-laden mornings. Real things. Products that shipped. But I can no longer tell you with confidence that the judgment I brought to my fifth evaluation was the judgment those decisions deserved. Leroy's research says it wasn't. And the difference — the gap between what I approved and what I would have approved with a cleared mind — is compounding somewhere in every system I've touched.
The river of intelligence keeps flowing. The tools keep accelerating. But the mind that steers them has constraints that no model update will remove. Leroy mapped those constraints. This book is about what they mean for everything we're building.
— Edo Segal × Opus 4.6

---
Sophie Leroy is a French-American organizational psychologist and professor of management, currently serving as dean of the School of Business at the University of Washington Bothell. She earned her PhD from the Stern School of Business at New York University. Leroy is best known for her foundational research on attention residue, the phenomenon she identified and named in her landmark 2009 paper "Why is it so hard to do my work?" — which demonstrated that switching between tasks leaves a measurable cognitive trace that degrades performance on subsequent work, even when the person is unaware of the impairment. Her research program has extended into the effects of interruptions, regulatory focus, and the conditions under which task transitions are most and least costly to cognitive performance. Her work, amplified by Cal Newport and others, has become one of the most widely cited frameworks in discussions of deep work, knowledge-worker productivity, and workplace design. In 2025, Leroy launched an AI-focused speaker series at UW Bothell, turning her attention to how artificial intelligence is reshaping the cognitive demands of modern work.
---
In 2009, Sophie Leroy published a paper with a title so plainly stated it could have been mistaken for a complaint overheard in any office hallway: "Why is it so hard to do my work?" The question was not rhetorical. It was experimental. And the answer she found — that switching between tasks leaves a measurable cognitive residue that degrades performance on whatever comes next — would become one of the most consequential findings in organizational psychology, though neither Leroy nor anyone else could have predicted, at the time, just how consequential.
The experiments were straightforward in design and devastating in implication. Participants were asked to work on a task — engaging, demanding, requiring sustained cognitive effort — and then, before they had finished, to switch to a different task. Their performance on the second task was measured against a control group that had not been interrupted. The degradation was consistent, significant, and specific. It was not caused by the difficulty of the second task. It was not caused by fatigue, or boredom, or motivational decline. It was caused by the first task's refusal to leave.
Leroy called this phenomenon attention residue: the persistence of cognitive engagement with a prior task after the switch to a new one. The term is precise in a way that matters. Residue is not a metaphor for distraction. It is a description of what happens inside working memory when a task is abandoned before completion. The unresolved elements of the first task — its open decisions, its unfinished calculations, its emotional valence — continue to occupy cognitive resources that the second task requires. The mind, in other words, does not toggle. It smears. And the smear degrades everything it touches.
The finding was robust. It replicated across experimental conditions, across task types, across populations. It held whether the first task was interesting or dull, whether the switch was voluntary or imposed, whether the participant was aware of the residue or not. In fact, the last condition proved to be the most important: attention residue operates below the threshold of subjective awareness. The person carrying it does not feel impaired. She feels busy. She feels productive. She feels like she is managing multiple demands competently. The degradation is invisible from the inside, which is precisely what makes it structurally dangerous.
Leroy herself described the core insight with characteristic directness: "We assume the brain will focus wherever we want it to focus. But the brain doesn't function that way." The statement sounds like common sense. It is not. It is a refutation of the operating assumption that governs virtually every modern workplace: that attention is a resource that can be redirected instantaneously and completely from one object to another, like a spotlight swinging across a stage. Leroy's data demonstrates that attention is nothing like a spotlight. It is more like a viscous liquid that clings to the container it last occupied, leaving traces that contaminate whatever vessel receives it next.
The contamination is not trivial. In Leroy's experiments, participants carrying attention residue showed measurably worse performance on the kinds of cognitive operations that matter most in knowledge work: evaluating complex information, making judgments under uncertainty, integrating multiple sources of evidence into a coherent assessment. These are not peripheral skills. They are the core of what knowledge workers do — and what AI-augmented knowledge workers are increasingly asked to do at scale, across multiple projects, in rapid succession. The residue does not impair the mechanical execution of routine tasks. It impairs judgment. It impairs precisely the cognitive faculty that the AI-augmented future places the highest premium on.
This distinction between mechanical execution and judgment is worth pausing on, because it illuminates why Leroy's 2009 finding has become unexpectedly central to the most important question of 2026.
For the first fifty years of the computing era, the primary cognitive demand on knowledge workers was execution: writing code, drafting documents, building spreadsheets, performing analyses. The quality of the output depended heavily on the worker's technical skill — her ability to translate intention into artifact through the specific grammar of her tools. Execution was the bottleneck, and organizations designed themselves around it. Teams were structured to maximize execution throughput. Metrics measured execution output. Careers rewarded execution speed and accuracy.
When AI tools began to handle execution — when Claude Code could write the function, when a language model could draft the brief, when an AI assistant could build the financial model — the bottleneck migrated. It moved upstream, from execution to judgment: the capacity to decide what should be built, to evaluate whether what was built serves its purpose, to choose among competing possibilities on the basis of criteria that cannot be fully specified in advance. *The Orange Pill* describes this migration as "ascending friction" — the principle that technological abstraction removes difficulty at one level and relocates it to a higher cognitive floor. The difficulty does not vanish. It climbs.
Leroy's research reveals the uncomfortable corollary: the mind that must meet the ascending friction is not necessarily operating at full capacity. The builder who has been switching between five AI-directed projects carries the residue of four into her judgment about the fifth. The friction ascended. The cognitive resources available to meet it may not have.
The organizational psychology literature before Leroy had studied multitasking extensively, and the consensus was already unfavorable: people who multitask perform worse than people who don't. But that finding was general, almost tautological in its lack of specificity. Everyone knew multitasking was suboptimal. Nobody had identified the specific mechanism that made it so. Was it fatigue? Divided attention? Motivational dilution? The general finding was useful for confirming intuitions but useless for designing interventions, because you cannot intervene on a mechanism you have not identified.
Leroy identified the mechanism. Attention residue is not fatigue — the participants in her experiments were not tired. It is not divided attention — they were not attempting to do two things simultaneously. It is not motivational dilution — they were fully motivated to perform well on the second task. It is the specific, measurable, replicable persistence of the first task's cognitive demands in the working memory system that the second task requires. The mechanism is precise enough to generate predictions, and the predictions are specific enough to test.
One prediction in particular has become urgently relevant: the more engaging the first task, the greater the residue it produces upon interruption. This is the finding that separates Leroy's work from the general multitasking literature and makes it uniquely important for understanding the cognitive landscape of AI-augmented work. Engaging tasks capture more cognitive resources — more working memory slots, more emotional investment, more executive control. When an engaging task is interrupted, all of those captured resources resist release. The mind is holding more, so the mind has more to carry into the next task, and the residue is correspondingly deeper.
AI-augmented creative work is, by design, highly engaging. The tight feedback loops — describe what you want, see it realized in seconds, refine and iterate — produce exactly the kind of deep cognitive and emotional engagement that Leroy's research identifies as the most potent generator of residue. The builder working with Claude on a product feature she cares about is not casually attending to a routine task. She is in the grip of something closer to what Csikszentmihalyi called flow: fully absorbed, deeply invested, operating at the outer edge of her capability. And when she is pulled away from that engagement to monitor another project, evaluate another output, switch to another context, the residue she carries is maximal.
This is not a theoretical extrapolation. It is what Leroy's experimental paradigm directly predicts. The most productive, most engaged, most creative AI-augmented workers are the ones who will pay the highest cognitive tax when the organizational structure demands that they switch. The best work generates the worst residue.
In the years following the initial publication, Leroy and her collaborators extended the findings in several directions. Work with Aaron Schmidt examined how regulatory focus — the difference between a promotion-oriented mindset and a prevention-oriented mindset — modulates the residue effect. Work with Nora Madjar examined interruptions specifically, showing that even brief interruptions generate residue that persists long after the interruption itself has ended. Each extension confirmed the robustness of the core phenomenon and narrowed the conditions under which it might be mitigated.
The mitigations turned out to be narrow indeed. Residue is reduced when the first task is brought to a satisfying completion before the switch. It is reduced when the person has a clear plan for returning to the first task. It is not reduced by willpower, by practice, by motivation, or by the belief that one is good at multitasking. As one review of the research noted: "Current research does not support the idea that people can be trained to eliminate attention residue through practice or cognitive training. The effect appears to be a fundamental feature of how working memory and executive attention operate, not a skill deficit."
A fundamental feature. Not a skill deficit. The distinction is critical. A skill deficit can be trained away. A fundamental feature of cognitive architecture cannot. It can only be designed around — accommodated, respected, built into the structure of work rather than overridden by the demands of the work environment.
The implications for the AI-augmented workplace are immediate and specific. A work environment that treats attention as a spotlight — infinitely redirectable, instantly available, costlessly deployable across multiple simultaneous demands — is a work environment designed around a falsehood. Every architecture built on that falsehood will produce systematic cognitive degradation in the people who inhabit it: degradation that the people themselves cannot detect, that the productivity metrics cannot capture, and that the organizational leadership may not even know to look for.
Ye and Ranganathan's finding from their Berkeley study — that AI does not reduce work but intensifies it — documented the organizational symptom. Workers took on more tasks, expanded into adjacent domains, filled every gap with AI-assisted production. The symptom was intensification. The mechanism, unidentified in their study because it fell outside their methodological frame, is attention residue: the cognitive tax levied at every transition between tasks, accumulating across a workday, across a workweek, across the months of sustained AI-augmented production that the most ambitious builders describe.
Leroy did not set out to describe the AI age. She studied meetings and email. She measured what happens when a person is pulled from one task to another in a conventional office setting. But the phenomenon she identified — a fundamental constraint on how the human mind manages transitions between cognitive demands — turns out to be the load-bearing wall of the AI-augmented future. Remove it from the analysis, and the edifice of AI productivity looks magnificent. Include it, and cracks appear in every floor.
The year 2025 saw Leroy step into this intersection directly. As dean of the School of Business at the University of Washington Bothell, she launched a speaker series titled "AI and the Future of Business," stating: "As a business school, we are committed to not only understanding these AI-driven shifts, but also leading the conversation about what they mean for business, innovation and impact." The scholar whose narrow empirical work on task-switching had become one of the most cited frameworks for understanding why the modern workplace is cognitively unsustainable was now turning to face the technology that threatened to make it vastly more so.
The question her research forces is not whether AI tools are valuable. They are. The question is whether the human mind that directs them is operating at the capacity the tools demand — or whether, by the time the builder sits down to make the judgment call that no machine can make for her, she is carrying so much residue from the day's accumulated switches that the judgment itself is compromised.
That question, invisible to the builder and unmeasured by the organization, is the subject of every chapter that follows.
---
The popular model of task-switching treats the mind as a computer with one monitor and multiple open applications. Minimize one window, maximize another. The previous application freezes in the background, consuming no resources, waiting patiently for its turn. Click back to it later and find it exactly where you left it, unchanged, requiring no effort to resume.
The model is intuitive, widely held, and almost entirely wrong.
Human cognition does not minimize applications. It does not freeze background processes. It does not maintain paused states in an inert queue. What it does, and what Leroy's research documents with uncomfortable precision, is carry the previous task into the present one — not as a complete, retrievable state but as fragments: unresolved decisions that continue to demand processing, emotional investments that resist disinvestment, contextual associations that compete with the new task's associations for the limited bandwidth of working memory.
To understand why the mind does not switch cleanly, it is necessary to understand, at least at the level of functional architecture, what a "task" looks like inside the cognitive system. A task is not a single mental operation. It is a constellation of activated representations in working memory, organized by executive control processes, sustained by attentional resources, and colored by emotional investment. When a knowledge worker is engaged with a problem — designing a product feature, evaluating a strategic decision, debugging a system — her working memory is populated with that problem's specific context: the variables she is tracking, the constraints she is respecting, the options she is weighing, the criteria she is applying. Her executive control system is configured around that problem's demands: which information to attend to, which to suppress, which operations to perform in which order. Her emotional system is invested in that problem's outcome: the satisfaction of a good solution, the frustration of an obstacle, the anxiety of a deadline.
This constellation is not assembled instantaneously. It builds over minutes, sometimes over the first half-hour of engagement, as working memory populates, executive processes configure, and emotional investment deepens. The assembly process is itself cognitively expensive — it requires attentional resources, consumes working memory capacity, and produces the subjective experience of "getting into" a task, the gradual narrowing of attention from the broad awareness of a person between tasks to the focused engagement of a person inside one.
Switching tasks requires disassembling this constellation and assembling a new one. The disassembly is the problem. The cognitive system does not have a "clear all" function. Working memory representations do not deactivate on command. They decay, and the rate of decay is governed by factors the person does not control: the strength of the representation (stronger representations, built through deeper engagement, decay more slowly), its emotional valence (emotionally charged representations persist longer), and the presence or absence of closure (completed tasks decay faster than incomplete ones, a phenomenon related to what Bluma Zeigarnik documented in 1927).
Leroy's contribution was to demonstrate that this decay process is consequential — that the representations that persist from the previous task actively interfere with performance on the current one. The interference is not general cognitive noise. It is specific: the residue from Task A competes with the demands of Task B for the same limited working memory resources, producing measurable decrements in the speed, accuracy, and quality of Task B performance.
Alan Baddeley's influential model of working memory — the framework that has dominated cognitive psychology since the 1970s — helps explain why the competition is so damaging. Baddeley's model posits a central executive that coordinates information from multiple subsystems: a phonological loop that processes verbal and acoustic information, a visuospatial sketchpad that handles visual and spatial information, and an episodic buffer that integrates information across subsystems and links it to long-term memory. The central executive has limited capacity. It can manage only so many active representations at once. When residue from Task A occupies slots in the central executive that Task B needs, Task B's processing is degraded — not because Task B is too hard, but because the cognitive workspace is already partially occupied.
Stephen Monsell's research on task-set reconfiguration provides the complementary perspective from the executive control side. Monsell demonstrated that switching between task-sets — the configurations of cognitive control that organize processing for a specific task — incurs a measurable time cost even when the tasks are simple and well-practiced. The cost reflects the time required to deactivate one task-set and activate another: to reconfigure which stimulus features are attended to, which response mappings are active, which cognitive operations are primed. This reconfiguration is not instantaneous, and it is rarely complete on the first trial after a switch. Performance continues to improve over the initial trials of the new task as the new task-set gradually displaces the old one.
Leroy's attention residue extends Monsell's task-set reconfiguration in a crucial direction: even after the task-set has reconfigured — even after the person is nominally "in" the new task and performing it with apparent competence — fragments of the old task-set persist. The reconfiguration is superficial. Below the surface, the old task's concerns continue to occupy working memory, draining resources from the new task's demands. The person appears to have switched. The cognitive system has not fully followed.
Erik Altmann and J. Gregory Trafton's memory-for-goals framework adds yet another layer. Their model proposes that goals associated with a suspended task do not simply disappear from memory. They remain activated, competing with the current task's goals for retrieval and processing. The activation level of suspended goals decays over time, but the decay is gradual, and during the period of elevated activation, the suspended goals intrude on current processing — producing the experience of "thinking about the other thing" while trying to focus on this one.
What emerges from the intersection of these research programs — Leroy's attention residue, Baddeley's working memory constraints, Monsell's task-set reconfiguration costs, Altmann and Trafton's goal persistence — is a picture of the human cognitive system that is fundamentally incompatible with the demands that AI-augmented work environments are beginning to place on it.
The AI-augmented builder who directs multiple projects is asked to assemble a cognitive constellation for each project — populate working memory with that project's context, configure executive control for that project's demands, invest emotionally in that project's outcome — and then, when the AI agent on another project requires attention, to disassemble that constellation and assemble a new one. Then to do it again. And again. Across five, eight, twelve transitions in a day.
Each transition incurs every cost the cognitive science literature describes. Working memory must be repopulated. Executive control must be reconfigured. Emotional investment must be redirected. Goals from the previous project persist and compete with the current project's goals. And the residue — the fragments of each previous project that resist decay — accumulates, layer upon layer, across the day.
The accumulation matters because the costs are not independent. They compound. The residue from Project A degrades performance on Project B. When the builder switches to Project C, she carries residue from both A and B. By Project D, she is operating with the accumulated residue of three prior projects, each contributing its unresolved decisions, its persistent goals, its emotional trace to the cognitive load she carries into her evaluation of Project D's AI-generated output.
The compounding is what makes the AI-augmented workplace qualitatively different from the pre-AI workplace in terms of cognitive demand. The pre-AI knowledge worker switched between tasks, too. She moved from meeting to meeting, from email to project to email. But the pace of switching was limited by the pace of production. Writing code took time. Drafting documents took time. Building analyses took time. The production bottleneck created natural periods of sustained engagement between switches — periods during which residue could decay, working memory could clear, and the cognitive system could recover.
AI tools eliminated the production bottleneck. Code that once took hours now takes minutes. Documents that required days require an afternoon. The time between switches compressed, and with it the recovery windows that the cognitive system depends on. The builder is not switching less often; she is switching more, because the AI's speed means there is always another output to evaluate, another project demanding attention, another agent delivering results that require judgment.
What Leroy's framework reveals is that this acceleration does not merely speed up the builder's day. It degrades the quality of the cognitive operations she performs at each stop. The judgment she brings to Project D is not the same judgment she would bring to Project D if it were her first and only project of the day. It is judgment impaired by the accumulated residue of every project that preceded it — and impaired in ways she cannot detect, because the impairment operates below the threshold of subjective awareness.
The invisibility of the impairment is arguably the most dangerous feature of the phenomenon. Leroy's experimental participants did not report feeling impaired. They reported feeling busy, productive, competent. The degradation was visible only in the performance data — in the measurable decline in accuracy, judgment quality, and response to complex information. From the inside, carrying residue feels like working. From the outside, measured with precision, it looks like working worse.
This disjunction between subjective experience and objective performance has a specific implication for organizational design: you cannot rely on knowledge workers to self-report the cost of context-switching. They will not report it, because they cannot feel it. They will report busyness, which the organizational culture rewards. They will report high output, which the productivity metrics capture. They will not report — because they cannot know — that the judgment they brought to their third project was meaningfully worse than the judgment they would have brought had it been their first.
The organizational response cannot be to ask people to switch more carefully or to pay more attention when they switch. Attention residue is not a skill deficit that yields to training. It is a feature of cognitive architecture — as fundamental as the capacity limits of working memory, as non-negotiable as the time required for executive control to reconfigure. The response must be structural: designing work environments, workflows, and AI-augmented processes that respect the architecture rather than overriding it.
What this structural response looks like — the specific design principles that Leroy's research supports — will occupy the later chapters of this book. But the foundation is laid here, in the understanding that the mind does not switch cleanly. It carries its past into its present, involuntarily and invisibly, and the cost of this carrying is paid in the currency that the AI-augmented future values most: the quality of human judgment.
The constraint is biological. It is non-negotiable. And no amount of processing power on the other side of the conversation can compensate for the degradation it produces on this side.
---
There is a specific kind of cognitive labor that has no name in most organizations, appears on no job description, and is measured by no productivity metric. It is the labor of watching.
Not watching passively — the way a security guard watches a bank of monitors, waiting for something to happen. Watching actively: evaluating, judging, deciding, correcting. The labor of maintaining cognitive engagement with a process you did not execute, whose internal logic you must reconstruct from its outputs, and whose quality you must assess against criteria that live in your head rather than in any specification document. This is the labor that the AI-augmented builder performs when she monitors the outputs of an AI agent, and Leroy's framework suggests it is among the most cognitively expensive forms of work the modern economy has produced.
The term proposed here — the monitoring tax — names a cognitive cost that is distinct from traditional multitasking and potentially more severe. Traditional multitasking involves alternating between tasks that the worker herself is performing. She writes code for twenty minutes, switches to answering email for ten, returns to code. The switch is between two activities she controls, on a schedule she largely determines, with natural breakpoints she can anticipate. The attention residue generated by these switches is significant, as Leroy's research demonstrates. But the worker retains a degree of agency over the timing and circumstances of the switch that partially mitigates its cost. She can choose to finish a function before checking email. She can complete a thought before attending to a notification.
AI-augmented monitoring removes that agency. The builder who directs multiple AI agents does not control when each agent's output arrives. She does not determine the pace of the switching. The agents produce on their schedule — which is to say, on the schedule of computational processes that operate at speeds incommensurate with human cognitive rhythms. An agent completing a task sends its output when the task is done, not when the builder is ready to receive it. The builder's cognitive state at the moment of delivery — whether she is deep in another problem, mid-evaluation on a different project, or in the fragile early stages of assembling a new cognitive constellation — is invisible to the system and irrelevant to its operation.
The result is a pattern of switching that is externally paced, unpredictable in its timing, and compulsory in its demands. Each switch requires the builder to perform a specific sequence of cognitive operations that Leroy's framework predicts will be costly.
First, context reconstruction. The builder must reload the monitored project's context into working memory: its goals, its current state, its constraints, the specific criteria against which the output must be evaluated. This context was displaced by whatever she was working on when the output arrived. Reconstructing it is not instantaneous. It requires retrieval from long-term memory, re-activation of the project's task-set, and the re-establishment of the evaluative criteria that will guide her judgment. Each of these operations consumes working memory capacity and executive control resources.
Second, output evaluation. The builder must assess the AI's output against the reconstructed context. This is not a trivial comparison. AI-generated code, text, design, or analysis arrives in a form that looks competent — the surface quality is typically high, the structure is typically sound, the obvious errors are typically absent. The evaluation that matters is whether the output serves the project's deeper requirements: whether the architectural decisions are sound, whether the approach scales, whether the design communicates the intended meaning, whether the analysis captures the nuances that the specification could not fully articulate. This evaluation demands precisely the kind of complex judgment that attention residue degrades most.
Third, decision-making. Based on her evaluation, the builder must decide: accept the output, modify it, redirect the agent, or start over. Each decision has consequences that ripple through the project. Each decision must be made while carrying residue from whatever she was doing before the output arrived. And each decision, once made, generates its own cognitive trace — its own contribution to the residue she will carry into the next evaluation.
The sequence — reconstruct, evaluate, decide — repeats at each monitoring event. And the residue from each repetition accumulates, layering the cognitive traces of multiple projects, multiple evaluations, and multiple decisions into a sediment that progressively degrades the quality of judgment the builder brings to each subsequent evaluation.
What makes the monitoring tax particularly insidious is that it is hidden by the productivity it enables. The builder who monitors five AI agents is producing five times the output of a builder who works on one project without AI. The productivity metrics — lines of code generated, features shipped, documents produced — record the multiplication. They do not record the cognitive state of the person directing the multiplication. They do not capture whether her evaluation of the fifth agent's output was as sharp as her evaluation of the first. They do not measure whether her architectural judgment at four in the afternoon, after thirty monitoring switches, was as reliable as her judgment at nine in the morning, before the first switch.
The Berkeley study that *The Orange Pill* examines in detail documented the external symptoms of what the monitoring tax produces internally. Workers reported feeling that AI made their work more intense. They described a sense of "always juggling." They experienced what the researchers called "task seepage" — AI-accelerated work colonizing lunch breaks, elevator rides, the small pauses that had previously served as cognitive recovery windows. These are the observable behavioral manifestations of a cognitive phenomenon the researchers' methodology was not designed to isolate.
Leroy's framework provides the isolation. The juggling sensation is not metaphorical. It is the subjective correlate of carrying multiple active goal sets in working memory simultaneously — each project's suspended goals competing with the current project's active goals for retrieval and processing, exactly as Altmann and Trafton's memory-for-goals framework predicts. The task seepage is not merely a boundary problem. It is a residue problem: each micro-session of AI interaction during a nominally protected pause generates cognitive traces that the pause was supposed to clear. The builder who checks an agent's output during lunch returns to her afternoon carrying not just the calories of her meal but the residue of an evaluation she made while eating it.
There is a further dimension to the monitoring tax that distinguishes it from other forms of context-switching: the quality of the output the builder monitors is, by default, seductively good. Large language models produce text that reads well. AI coding assistants produce code that compiles. AI design tools produce layouts that look professional. The surface quality is consistently high, which means that the cognitive work of evaluation is not detecting obvious failures — a task that is relatively easy and requires relatively little judgment — but detecting subtle inadequacies beneath a competent surface.
This is a particularly demanding form of cognitive work. It requires the evaluator to hold two representations simultaneously: the output as presented, and the output as it should be according to criteria that may be only partially articulable. The gap between the two — the gap between "this looks right" and "this is right" — is where the builder's judgment lives, and it is precisely the gap that attention residue narrows. A builder carrying residue from three previous evaluations is less likely to detect the subtle inadequacy because her cognitive resources for maintaining the "should be" representation are partially occupied by the lingering concerns of previous projects.
The practical consequence is that monitoring accuracy degrades across the day in a pattern that tracks the accumulation of residue. The first evaluation of the morning, performed with relatively clear working memory, is likely to be the most accurate. Each subsequent evaluation is performed with incrementally more residue, and each is incrementally less reliable. By late afternoon, the builder who has been monitoring five agents all day is evaluating outputs with cognitive resources that are substantially depleted — not by the difficulty of the evaluation itself, but by the accumulated cost of every switch that preceded it.
This degradation gradient has organizational implications that most leadership teams have not yet grasped. When an organization assigns a single builder to monitor multiple AI agents across multiple projects — a structure that is becoming standard as organizations seek to leverage the productivity multiplication that AI enables — it is implicitly assuming that the builder's judgment will be consistent across monitoring events. Leroy's research predicts that it will not be. The judgment will degrade systematically across the day, and the degradation will be invisible to both the builder and her managers, because the outputs she approves will look competent (the AI's surface quality ensures this) even when her evaluation of them was compromised.
The degradation is further amplified by a feature of AI-augmented work that has no direct analogue in the pre-AI workplace: the asymmetry of cognitive investment between the builder and the tool. When the builder was also the executor — when she wrote the code herself, drafted the document herself, built the analysis herself — her evaluation of the output was informed by the depth of understanding she had built during the execution process. She knew the code because she had written it. She understood the document because she had constructed its argument. The execution was itself a form of evaluation: each decision made during construction was a judgment about quality, coherence, and fitness for purpose.
When the AI executes and the builder monitors, this informational advantage disappears. The builder must evaluate an artifact she did not construct, whose internal logic she must infer from its external form, and whose quality she must assess without the embodied understanding that construction provides. This is not impossible — experienced practitioners develop the capacity to evaluate work they did not produce — but it is more cognitively expensive than evaluating one's own work, and the additional expense adds to the monitoring tax.
The aggregate cost of the monitoring tax — across a day, across a team, across an organization — is the single largest unmeasured cognitive expense of the AI-augmented workplace. It does not appear on any balance sheet. It is not captured by any productivity dashboard. It is felt, dimly and without a name, in the sensation of exhaustion that AI-augmented builders report at the end of days that their metrics suggest were extraordinarily productive. The metrics are not wrong about the production. They are simply silent about what the production cost.
What would a work environment designed to minimize the monitoring tax look like? It would look different from most AI-augmented workplaces being designed today. It would batch monitoring rather than distribute it across the day. It would assign builders to fewer projects rather than more. It would build recovery windows into the workflow — not as optional breaks but as structural features of the monitoring schedule, calibrated to the empirical rate of residue accumulation. It would measure not just what was produced but the cognitive state of the person who evaluated what was produced.
These design principles are not speculative. They follow directly from Leroy's experimental findings and from the broader cognitive architecture that her findings illuminate. They require no new science. They require only the organizational will to take seriously what the science already says: that the mind that monitors is a mind that degrades, predictably and invisibly, and that the quality of what an organization produces in the age of AI depends less on the capability of its tools than on the cognitive integrity of the humans who direct them.
---
There is a test you can perform on yourself right now. Think about the last time you worked on multiple things in rapid succession — switching between projects, monitoring outputs, evaluating results, making decisions. Now ask: How do you know you did a good job?
The honest answer, for most knowledge workers, is that you felt like you did a good job. You were busy. You moved between tasks with apparent fluency. You produced outputs. Things got done. The day felt full, and fullness feels like accomplishment.
Leroy's research suggests that this feeling is unreliable — not because people are bad judges of their own performance in general, but because attention residue specifically distorts the self-assessment mechanism. The person carrying residue does not experience herself as impaired. She experiences herself as engaged, productive, in motion. The subjective sensation of multitasking is momentum. The objective reality, measured in Leroy's experiments with the precision that self-report cannot provide, is degradation.
The illusion operates through a specific cognitive pathway. When a person switches rapidly between tasks, the switching itself generates a kind of cognitive arousal — a heightened state of activation that feels like productivity. The brain is working hard. Resources are being deployed. Decisions are being made. The sensation of effort is real. What is illusory is the inference that effort equals quality. A significant portion of that work is overhead: the cost of disassembling one cognitive constellation and assembling another, the cost of suppressing residue from the previous task, the cost of reconstructing context for the current one. Much of the effort is wasted — spent not on the productive work of evaluation and judgment but on the mechanical work of switching.
This distinction — between the effort of switching and the effort of thinking — is invisible from the inside. The brain does not label its operations. It does not tag some cognitive expenditures as "productive" and others as "overhead." It registers all of them as work. The multitasker who has spent her day switching between five projects has worked as hard as the single-tasker who spent her day on one. She may have worked harder, in terms of raw cognitive expenditure. But a substantial fraction of that expenditure was consumed by switching costs, and the fraction that remained for actual evaluation and judgment was correspondingly reduced.
Leroy's experiments measured this reduction directly. Participants who switched between tasks showed not just slower performance but lower-quality performance: worse accuracy on complex judgments, more errors in information integration, less sensitivity to subtle distinctions in the material they were evaluating. The degradation was not uniform across task types. It was most pronounced on exactly the kinds of cognitive operations that AI-augmented work demands most: evaluating complex outputs, making decisions under uncertainty, integrating information from multiple sources into a coherent judgment.
The illusion is self-reinforcing through a mechanism that deserves careful examination. The multitasker has no baseline for comparison. She cannot know what her judgment would have been in the absence of residue, because she is always carrying residue. Her impaired performance is her only performance. She has never — in the context of a typical AI-augmented workday — experienced herself evaluating an AI output with a fully cleared working memory, because the structure of her work ensures that no evaluation occurs in that state. Every evaluation is preceded by a switch, and every switch generates residue, and the residue is the water she swims in.
This absence of a baseline produces a cognitive distortion familiar from the broader psychological literature on the Dunning-Kruger effect, here operating at the level of metacognition: the less accurate your self-assessment, the less equipped you are to recognize its inaccuracy. The multitasker who consistently makes judgments under the influence of residue does not know that her judgments are compromised, because she has no uncompromised judgments to compare them to. She calibrates her sense of normal performance to her residue-impaired performance, and when asked whether she is doing good work, she says yes — because relative to her baseline, she is. The baseline is just lower than she thinks.
Organizations reinforce the illusion through their measurement systems. Productivity metrics in most knowledge work environments measure output: features shipped, documents produced, tickets resolved, projects completed. These metrics capture the quantity of production. They do not capture the quality of the judgment that guided production. A feature that shipped with a subtle architectural flaw that will cause problems six months hence is counted the same as a feature that shipped cleanly. A document that made the wrong argument compellingly is counted the same as a document that made the right argument precisely. A project that was completed on time but directed toward the wrong objective is counted the same as a project that was completed on time and served its intended purpose.
In the pre-AI era, the quality gap between residue-impaired and residue-free judgment was somewhat constrained by the pace of production. The builder who made a flawed judgment during code execution would encounter the consequences relatively quickly — the code would fail, the test would break, the user would complain — and the feedback would force a correction. The execution process itself served as a quality check, not because it was designed for that purpose but because the friction of implementation surfaced problems that flawed judgment had introduced.
AI tools compress this feedback cycle in ways that can either help or harm. On one hand, the speed of AI execution means that the consequences of flawed judgment can be detected sooner — a prototype built in hours can be tested in hours, rather than the weeks required for manual construction. On the other hand, the volume of output that AI enables means that more judgment calls are being made per unit time, each with less cognitive resource behind it, and the aggregate error rate — even if the per-decision error rate remains constant — increases simply because there are more decisions.
If the per-decision error rate also increases, as Leroy's framework predicts it does under conditions of accumulated residue, then the compounding is multiplicative: more decisions, each made with less cognitive resource, producing a total error load that grows faster than either factor alone would suggest.
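The multiplicative claim can be made concrete with a back-of-envelope sketch. Every number below is a hypothetical illustration, not a figure from Leroy's experiments or the Berkeley study; the point is only the shape of the arithmetic — total error load is decisions times per-decision error rate, so when both factors rise, the product grows faster than either alone:

```python
# Illustrative sketch of multiplicative error compounding.
# All rates and counts here are hypothetical, chosen only to
# show how the two factors interact.

def total_error_load(decisions_per_day: int, error_rate: float) -> float:
    """Expected number of flawed judgments in a day."""
    return decisions_per_day * error_rate

# Pre-AI baseline: fewer judgment calls, made with rested judgment.
baseline = total_error_load(decisions_per_day=20, error_rate=0.05)

# AI-augmented day: triple the decisions at the SAME per-decision rate.
# The error load triples simply because there are more decisions.
more_decisions = total_error_load(decisions_per_day=60, error_rate=0.05)

# If accumulated residue also raises the per-decision rate (say to 8%),
# the two factors multiply: the load grows 4.8x the baseline, not 3x.
compounded = total_error_load(decisions_per_day=60, error_rate=0.08)
```

Under these toy numbers the baseline day produces one expected flawed judgment, the higher-volume day three, and the higher-volume-plus-residue day nearly five — the compounding Leroy's framework predicts.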
There is an additional mechanism through which the illusion of productivity distorts organizational decision-making: the halo of AI-generated quality. Because AI tools produce outputs with high surface quality — clean code, well-structured prose, professional design — the builder's evaluation task is not detecting obvious failures but detecting subtle inadequacies beneath a competent surface. When her judgment is impaired by residue, the most likely failure mode is not that she will approve something obviously bad. It is that she will approve something subtly wrong: code that works but does not scale, prose that reads well but argues incorrectly, design that looks professional but communicates the wrong message.
These subtle failures are the most dangerous kind because they propagate. A subtle architectural flaw in code that passes residue-impaired review becomes the foundation for subsequent development. A strategic recommendation that is subtly misdirected shapes decisions downstream. The error compounds through the system in ways that are difficult to trace back to the moment of impaired judgment that introduced it, because the moment looked — to everyone involved, including the builder who made the judgment — like a perfectly competent evaluation.
The organizational cost of these propagating subtle errors is, almost by definition, unmeasured. No dashboard tracks the number of architectural flaws introduced by evaluations performed under cognitive load. No metric captures the strategic missteps attributable to monitoring judgments made at 4 p.m. after thirty context switches. The costs appear eventually — as technical debt, as strategic drift, as products that don't quite work the way they should — but they appear diffusely, attributed to complexity or market dynamics or bad luck rather than to the systematic degradation of human judgment under conditions that the organization itself created.
Cal Newport, whose book *Deep Work* brought Leroy's attention residue concept to a wide audience, identified the cultural dimension of the problem with characteristic clarity. Newport observed that knowledge work culture has no robust theory of productivity — no equivalent of the manufacturing engineer's time-and-motion study that could distinguish between productive effort and overhead. In the absence of such a theory, the culture defaults to what Newport called "busyness as a proxy for productivity": the assumption that visible activity indicates valuable output. The multitasker who is always in motion — switching, monitoring, evaluating, deciding — looks productive because she is always busy. The single-tasker who stares at one problem for three hours looks unproductive because, from the outside, she appears to be doing nothing.
AI tools intensify this cultural distortion by increasing the rate at which visible activity can occur. The builder monitoring five agents is visibly busier than the builder focused on one project. She produces more observable outputs per hour. She switches more frequently between screens. She generates a constant stream of evaluations, decisions, and corrections that register as activity on any reasonable metric.
What the metrics do not capture is that her final evaluation of the day — the one that might determine whether a product feature serves its users or merely ships on time — was made with a cognitive system carrying the accumulated residue of every switch that preceded it. The evaluation took the same amount of time as her morning evaluation. It produced the same feeling of competence. But Leroy's research predicts, with the confidence that replication across multiple experimental paradigms affords, that it was performed with meaningfully less cognitive resource. The judgment was degraded. The degradation was invisible. And the output it approved will propagate through the system, carrying the imprint of that invisible degradation into everything it touches.
The illusion of productivity is not a minor distortion at the margins of organizational performance. It is a systematic misalignment between what organizations measure and what determines the quality of their output. In an era when AI handles execution and human judgment handles direction, the misalignment becomes potentially catastrophic: the thing that matters most — the quality of judgment — is the thing that is measured least and degraded most by the very tools that were supposed to enhance it.
Breaking the illusion requires two interventions that operate at different levels. At the organizational level, it requires new metrics: measures of judgment quality, decision accuracy, and evaluation reliability that can detect the degradation before its consequences propagate. At the individual level, it requires something harder — the willingness to question one's own subjective experience of competence. To ask, honestly, whether the feeling of productive momentum is an accurate signal of productive quality or whether it is the specific, documented, experimentally validated illusion that Leroy's research describes.
The question is uncomfortable. It should be. The most dangerous illusions are the ones that feel exactly like reality.
---
The cruelest finding in Sophie Leroy's research program is not that task-switching degrades performance. That finding is uncomfortable but unsurprising — a confirmation, with experimental rigor, of something most knowledge workers have suspected about their own experience. The cruelest finding is the one that inverts the intuitive relationship between quality of engagement and quality of outcome: the deeper the cognitive engagement with a task, the more persistent the residue when that engagement is interrupted.
This is not a minor qualification appended to the main result. It is a structural feature of the phenomenon that reverses the expected relationship between effort and reward. In most domains, deeper engagement produces better outcomes. The surgeon who is more absorbed in the procedure performs more precisely. The writer who is more engaged with the argument produces more coherent prose. The engineer who is more invested in the problem finds more elegant solutions. Depth of engagement is, in the general case, the single best predictor of quality.
Leroy's research does not dispute this. What it reveals is the cost that depth exacts at the moment of transition. The same cognitive and emotional resources that make deep engagement productive — the populated working memory, the configured executive control, the invested emotional system — are the resources that resist disengagement when a switch is imposed. The deeper the engagement, the more resources are captured. The more resources are captured, the more persistent the residue when those resources must be redirected. The builder who was most deeply absorbed in Project A carries the most residue into Project B.
The mechanism is not difficult to understand once the cognitive architecture is clear. Deep engagement means that working memory is densely populated with the task's representations — its variables, constraints, intermediate results, evaluative criteria, unresolved questions. These representations are not stored passively. They are maintained through active rehearsal by the central executive, which allocates attentional resources to keep them available for processing. The denser the population, the more attentional resources are allocated, and the more resistant those representations are to displacement when a new task demands the same resources.
Executive control is similarly invested. Deep engagement configures the control system for the specific demands of the task: which features of the environment to attend to, which associations to activate, which response tendencies to prime, which competing representations to suppress. This configuration is not a single setting that can be flipped like a switch. It is a pattern of biases and facilitations distributed across the control system, each bias the result of sustained practice with the current task's demands. The more deeply configured the control system becomes, the more effortful the reconfiguration required by a new task, and the more incomplete that reconfiguration is likely to be in the first minutes after the switch.
Emotional investment compounds both effects. A task that engages the builder's sense of purpose — that she cares about, that connects to her professional identity, that she finds genuinely interesting — activates emotional circuits that sustain attention and resist distraction. These circuits do not deactivate on command. Emotional disengagement from a meaningful task is a process that unfolds over minutes, not seconds, and during the disengagement period, the emotional trace of the abandoned task colors the cognitive processing of the new one. The builder who was passionately engaged with a design problem thirty seconds ago cannot bring emotionally neutral attention to the monitoring task that pulled her away. She brings attention that is still warm with the previous engagement, and the warmth interferes.
This compound persistence — cognitive, executive, and emotional — is what makes deep engagement the most potent generator of residue in Leroy's framework. And it is what makes the finding so cruel in the context of AI-augmented work.
Consider the specific phenomenology of building with AI tools. Edo Segal's account in *The Orange Pill* describes the experience with a candor that Leroy's experimental participants, constrained by the format of controlled studies, could not provide. The tight feedback loops — describe what you want, see it realized in seconds, refine and iterate — produce precisely the kind of deep cognitive and emotional engagement that Leroy's research identifies as maximally residue-generating. The builder is not casually attending to a routine task. She is absorbed. Working memory is densely populated with the project's context. Executive control is finely configured for the specific demands of the creative interaction. Emotional investment is high because the work feels meaningful — the gap between imagination and artifact has collapsed, and the builder is operating in territory that was previously inaccessible to her.
Csikszentmihalyi would recognize this state immediately. It meets every criterion of flow: clear goals, immediate feedback, challenge-skill balance, a sense of control. The builder knows what she is trying to create. The AI's rapid response lets her see instantly whether her direction was right. The challenge of directing the collaboration demands her full capability without overwhelming it. She feels that her decisions matter, that she is steering the process rather than being carried by it. This is the optimal human experience. It is also the state that produces the maximum cognitive tax when interrupted.
The irony is structural, not incidental. The very features of AI-augmented work that make it flow-conducive are the features that make interruption maximally costly. The tight feedback loops that maintain engagement also deepen the cognitive constellation that must be disassembled at each switch. The immediate responsiveness that keeps the builder in the zone also ensures that the zone is deeply populated when she is pulled out of it. The sense of creative control that makes the work meaningful also generates the emotional investment that resists disengagement.
The AI-augmented workplace, in other words, has engineered a near-perfect flow-generating environment — and then embedded it in an organizational structure that demands constant interruption of that flow.
The demand for interruption comes from the same source as the flow itself: the tool's capability. Because AI agents can operate on multiple projects simultaneously, and because each agent produces output that requires human evaluation, the builder who directs multiple agents must interrupt her own flow state to monitor their production. The more capable the tool, the more output it generates. The more output it generates, the more frequently the builder must switch contexts. The more frequently she switches contexts, the more residue she accumulates. And because each switch pulls her out of a state of deep engagement — the state the tool itself enabled — the residue at each switch is at or near the maximum her cognitive system can produce.
Leroy's experimental data on the engagement-residue relationship was generated in conventional workplace settings: participants switching between office tasks of varying engagement levels. The tasks were engaging in the way that interesting professional work is engaging — absorbing, requiring concentration, producing satisfaction. They were not engineered for maximum engagement in the way that AI-augmented creative work is engineered for maximum engagement. The residue effects Leroy measured were significant in those conventional settings. In the AI-augmented setting, where the engagement is deeper, more sustained, and more emotionally invested, the effects are predicted to be correspondingly more severe.
This prediction has not yet been tested with Leroy's specific experimental paradigm in an AI-augmented context — a gap in the literature that is significant and urgent. But the prediction follows directly from the established relationship between engagement depth and residue magnitude, and there is no theoretical reason to expect that the relationship would reverse or attenuate in the AI context. If anything, the unprecedented depth of engagement that AI tools enable — the "I have NEVER worked this hard, nor had this much fun with work" that Nat Eliason described — suggests that the residue generated by interruptions of AI-augmented flow states may exceed anything Leroy's original experiments observed.
There is a further dimension to this finding that connects to the organizational dynamics of the AI-augmented workplace. In most organizations, the people who are most deeply engaged with their work — the senior engineers, the lead designers, the principal architects — are also the people who are most valuable as monitors of AI-generated output. Their depth of expertise makes their evaluative judgment the most reliable. Their taste, their architectural intuition, their accumulated understanding of what works and what breaks — these are the qualities that make them indispensable as the human-in-the-loop for AI-directed projects.
These are also the people whose engagement with their own work is deepest. The senior engineer working on the core architecture is not casually attending to a routine task. She is operating at the full extent of her accumulated expertise, in a state of engagement that draws on decades of experience and produces the most sophisticated judgments her cognitive system is capable of generating.
When the organization asks her to interrupt that engagement to monitor the output of an AI agent working on a different project — which it does, routinely, because her expertise makes her monitoring indispensable — the residue she carries into the monitoring task is maximal. The very quality that makes her the best evaluator — her depth of engagement — is the quality that guarantees her evaluation will be performed under the heaviest cognitive load. The organization is systematically exposing its most valuable judgments to the most severe form of cognitive degradation.
This is not a problem that can be solved by hiring more monitors or distributing the monitoring load more evenly. Junior staff who carry less residue because their engagement is shallower also bring less expertise to the evaluation. The monitoring tax is not merely a load problem. It is a quality problem: the people who can best evaluate AI output are the people whose cognitive state is most severely compromised by the act of switching to perform that evaluation.
The finding extends beyond the individual to the temporal structure of the workday. Deep engagement builds over time. The builder does not achieve her deepest flow state in the first five minutes of a work session. She achieves it after a period of immersion — twenty minutes, thirty, sometimes longer — during which working memory populates, executive control configures, emotional investment deepens, and the full cognitive constellation required for her best work assembles itself. This assembly time is not wasted. It is the investment that makes the subsequent period of deep work possible.
When that investment is interrupted by a monitoring demand, the cost is not just the residue carried into the monitoring task. It is the loss of the assembled constellation itself. After the monitoring is complete and the builder returns to her original project, she must reassemble the constellation from scratch. The twenty or thirty minutes of immersion that preceded the interruption must be reinvested. And the reassembled constellation is unlikely to be identical to the one that was interrupted — some of the associations will have decayed, some of the context will need to be re-derived, and the emotional momentum that sustained the original engagement must be regenerated.
The total cost of a single interruption, then, is the sum of three components: the residue carried into the monitoring task (degrading the quality of the evaluation), the disassembly of the original constellation (destroying the conditions for deep work), and the reassembly cost after the interruption (consuming time and cognitive resources that could have been spent on productive work). For a builder who is deeply engaged, these costs are substantial. For a builder who is interrupted multiple times per day, they are cumulative and potentially overwhelming.
Leroy's research quantified the first component — the residue effect — with experimental precision. The second and third components are documented in the broader interruption literature, including Gloria Mark's finding that it takes an average of twenty-three minutes and fifteen seconds to return to a task after an interruption. Mark's research, conducted in naturalistic office settings, found that the recovery time was remarkably consistent across contexts and individuals, suggesting that it reflects a fundamental property of cognitive reassembly rather than an individual difference in concentration ability.
Twenty-three minutes. That is the cost, in time alone, of a single interruption of a deeply engaged builder. Multiply by the number of monitoring events in a typical AI-augmented workday — five, ten, twenty switches between projects — and the aggregate recovery time alone consumes a substantial fraction of the working day. Add the residue effects that degrade each evaluation, and the picture that emerges is one of systematic cognitive waste: a workplace that invests heavily in tools that enable deep, creative, flow-conducive work, and then systematically prevents the humans who direct those tools from achieving or sustaining the cognitive states the work requires.
The most productive use of AI tools, Leroy's framework suggests, is not the multiplication of projects across which a builder spreads her attention. It is the deepening of engagement with individual projects — using the tool's speed and capability to explore more possibilities, test more variations, and refine more iterations within a single domain of focused attention. The builder who uses AI to go deeper into one problem, maintaining the engagement that produces her best judgment, generates less output by volume but higher quality per unit of output. She produces one thing well rather than five things adequately.
The organizational pressure runs in exactly the opposite direction. The tool can handle five projects. The economics demand five projects. The metrics reward five projects. And the cognitive science predicts, with the specificity that replication affords, that five projects will produce five evaluations of diminished quality, each degraded by the residue of the engagement that made the builder's judgment worth having in the first place.
The best work generates the worst residue. The deepest engagement produces the highest tax. The organizational response to this finding will determine whether AI-augmented work realizes its promise of enhanced human judgment or delivers, instead, a systematically impaired version of the judgment it was supposed to augment.
---
What happens inside one mind is a cognitive phenomenon. What happens inside a thousand minds simultaneously is an organizational one. And the organizational consequences of attention residue, scaled across the AI-augmented workforce, represent a form of systematic quality degradation that no individual can detect and no current metric can capture.
Leroy's experiments measured residue in individuals. The participants were isolated subjects, switching between tasks in controlled conditions, their performance measured with the precision that laboratory settings afford. The findings were clear, replicable, and bounded: this much residue, from this kind of switch, producing this much degradation in this kind of judgment. The individual-level story is, by now, well established.
But organizations do not consist of individuals performing in isolation. They consist of interconnected networks of judgment, where one person's evaluation becomes another person's input, where one team's output becomes another team's constraint, where decisions ripple through layers of dependency with compounding consequences. In this networked structure, the individual-level degradation that Leroy documents does not simply add up across people. It multiplies through the connections between them.
Consider the simplest case: a two-person dependency chain. Builder A evaluates an AI agent's output and approves it. Builder B receives A's approved output as input for her own project. If A's evaluation was performed under the influence of attention residue — if her judgment was subtly degraded by the accumulated switches of her day — then the output she approved may contain a subtle inadequacy that a fully resourced evaluation would have caught. Builder B receives this subtly inadequate input and builds on it. If B is also carrying residue, her capacity to detect the inadequacy in A's approved output is also diminished. The subtle flaw propagates. It becomes embedded in B's work, which becomes input for C, whose residue-impaired evaluation fails to catch it, and the flaw migrates deeper into the system with each handoff.
In a pre-AI organization, the pace of this propagation was constrained by the pace of production. Handoffs between builders occurred on the timescale of days or weeks, and each handoff typically involved a period of focused review during which the receiving builder could examine the input with relatively cleared working memory. The production bottleneck — the time required to actually build something — created natural firebreaks against the propagation of subtle errors.
AI tools remove the production bottleneck. Handoffs occur on the timescale of hours or minutes. The volume of output flowing through the dependency chain is dramatically higher. And each builder in the chain is performing more evaluations per day, each under a heavier residue load, than her pre-AI predecessor. The firebreaks are gone. The subtle errors propagate faster, through more connections, evaluated by more impaired judgment, producing an organizational quality debt that accumulates with a speed that has no precedent in the history of knowledge work.
The concept of quality debt, borrowed from the software engineering notion of technical debt, deserves careful definition. Technical debt is the accumulated cost of expedient decisions — code that works but is poorly structured, architecture that solves today's problem but creates tomorrow's. Quality debt, as attention residue generates it, is subtly different. It is the accumulated cost of decisions made with adequate but degraded judgment — evaluations that approved outputs that were good enough to pass but not good enough to be right. Technical debt is the cost of knowing what is correct and choosing the shortcut. Quality debt is the cost of not knowing that the shortcut was taken, because the cognitive impairment that produced the error was invisible to the person who made it.
This invisibility is what makes quality debt structurally more dangerous than technical debt. Technical debt is, at least in principle, knowable. The engineer who wrote the expedient code knows she wrote it. She may even have left a comment: "TODO: refactor this." The organization can audit for technical debt, quantify it, prioritize its repayment. Quality debt has no such markers. The builder who approved a subtly flawed output under the influence of residue did not know she was impaired. She left no comment. There is no TODO. There is only an output that looks right and is subtly wrong, embedded in a system where dozens of subsequent decisions were built on the assumption that it was correct.
The organizational manifestation of accumulated quality debt is what experienced practitioners describe as drift — the gradual divergence between what a system was supposed to do and what it actually does, between the intended behavior and the emergent behavior, between the strategic direction and the operational reality. Drift is familiar to every organization. It is typically attributed to complexity, to changing requirements, to the inevitable entropy of large systems. Leroy's framework suggests an additional contributing factor that is rarely diagnosed: the systematic degradation of evaluative judgment under conditions of accumulated residue, producing a steady trickle of subtle errors that individually are insignificant but collectively bend the system's trajectory away from its intended course.
The scale at which AI-augmented organizations now operate makes the drift potentially more severe and more rapid than anything the pre-AI workplace produced. Every builder is monitoring multiple AI agents, every builder is carrying residue, and the cumulative effect is an organization that produces more output with less attention to any individual piece of it. The organization is busy — visibly, measurably, impressively busy. The question is whether it is effective, and the answer depends on the quality of the thousands of evaluative judgments made each day by people whose cognitive resources have been systematically depleted by the very workflow that produced the need for those judgments.
There is a spatial dimension to the problem as well. In most organizations, the builders who carry the heaviest monitoring load — who direct the most AI agents and make the most evaluative judgments per day — are concentrated at specific nodes in the organizational network. They are the senior engineers, the lead architects, the principal designers whose expertise makes their judgment most valuable. These are the nodes through which the most consequential outputs flow. And these are the nodes where the monitoring tax is highest, because the volume of evaluation demands is proportional to the person's centrality in the organizational network.
The result is a pattern that organizational network analysis would recognize as a vulnerability: the highest-consequence nodes in the network are the nodes most likely to be operating under residue-impaired conditions. The most important judgments in the organization are being made by the most cognitively taxed people. This is not because the organization is poorly managed. It is because the organizational logic that assigns the most important evaluations to the most expert people is the same logic that concentrates the monitoring tax on those same people.
The organizational countermeasure is not, as some have proposed, to distribute evaluations more evenly across the workforce. Even distribution reduces the monitoring tax per person but does not eliminate it, and it introduces a different problem: less expert evaluators making consequential judgments about outputs they are less qualified to assess. The tradeoff is between concentrated expertise with heavy residue and distributed evaluation with lighter residue but thinner judgment. Neither option fully addresses the problem, because the problem is not distributional. It is structural. The structure of AI-augmented work generates more evaluative demands than the human cognitive system can meet at the quality level the demands require.
The structural solution, which later chapters will examine in detail, involves reducing the number of evaluative demands rather than distributing them differently. This means assigning builders to fewer projects rather than more. It means using AI's capability to deepen engagement with individual projects rather than to multiply the number of projects under a single builder's supervision. It means accepting that the twenty-fold productivity multiplier does not mean twenty projects where there was one. It means one project pursued with twenty times the depth, twenty times the iteration, twenty times the refinement — and evaluated by a builder whose cognitive resources have not been fragmented across nineteen other contexts.
This is a difficult organizational choice because it sacrifices visible breadth for invisible depth. The organization that assigns each builder to one deep project looks, on every standard metric, less productive than the organization that assigns each builder to five shallow ones. It ships fewer features. It closes fewer tickets. It produces less visible output per unit of labor cost. What it produces — though this is much harder to measure — is higher-quality judgment at each evaluative node, lower quality debt per output, less drift per quarter, and a system whose actual behavior more closely matches its intended behavior over time.
The difficulty of measuring the quality advantage is what makes the organizational choice so hard to make. Quality debt, unlike financial debt, does not appear on a balance sheet. Drift, unlike revenue decline, does not trigger an alert. The consequences of residue-impaired judgment accumulate slowly, manifesting as vague symptoms — "the product doesn't feel right," "we keep fixing the same problems," "our technical debt is growing faster than we can address it" — that are attributed to complexity or execution failure rather than to the cognitive state of the evaluators whose judgments shaped the system's trajectory.
Leroy's framework does not resolve this measurement problem. It does something arguably more valuable: it identifies what needs to be measured, even if the instruments for measuring it at organizational scale do not yet exist. The variables are specific: residue load per evaluator, judgment quality per evaluation as a function of residue load, quality debt per output as a function of judgment quality, and drift per period as a function of accumulated quality debt. Each variable is, in principle, measurable. None is currently measured in any organization the research literature describes.
The gap between what is measurable in principle and what is measured in practice is where most organizational failures live. The organizations that close this gap — that develop the instrumentation to track the cognitive state of their evaluators and the quality consequences of evaluation under load — will be the organizations that discover whether the AI-augmented productivity revolution is producing genuine value or merely producing more of something slightly worse. The data from those organizations will determine whether the twenty-fold multiplier was a multiplication of capability or a multiplication of adequacy. Whether the AI-augmented future delivered better judgment at scale or merely more judgment, made by more impaired minds, producing more outputs that looked right and were subtly, invisibly, consequentially wrong.
The answer is not yet known. What is known, from Leroy's experiments and their direct implications, is that the default trajectory — more projects per builder, more switches per day, more residue per switch — points toward adequacy rather than excellence. Changing the trajectory requires changing the structure. And changing the structure requires seeing the cost that the current structure hides.
---
There is a fantasy embedded in the architecture of the AI-augmented workplace, and it goes like this: the builder depletes her cognitive resources during the workday, and then she rests, and then the resources regenerate, and then she returns the next morning at full capacity, ready to deploy them again. The fantasy treats attention like a battery. Discharge during the day. Recharge overnight. Start fresh tomorrow.
Leroy's research, and the broader cognitive science on which it draws, suggests that this model is dangerously incomplete. Attention is not a battery. It is closer to a muscle — one that can be strengthened through deliberate practice and weakened through misuse, one that fatigues under load and recovers through rest, but one whose recovery follows specific biological constraints that the work environment cannot override by wishing them away.
The muscle analogy is imperfect, but it captures the essential asymmetry that the battery model misses: the rate of depletion and the rate of recovery are not symmetric. A muscle that is worked to exhaustion does not recover in the time it took to exhaust it. The recovery takes longer, requires specific conditions — rest, nutrition, sleep — and is degraded by subsequent exertion before recovery is complete. An athlete who trains to failure every day without adequate recovery does not get stronger. She breaks down. The clinical term is overtraining syndrome, and its cognitive analogue is what the AI-augmented workplace is beginning to produce at scale.
The cognitive resources that attention residue depletes — working memory capacity, executive control bandwidth, emotional regulation — regenerate through specific biological processes that operate on their own timescale. Working memory clears through the decay of activated representations, a process that requires not just time but the absence of new activations during the decay period. Executive control recovers through the relaxation of the biases and facilitations that configure it for specific tasks, a process that requires not just time but disengagement from task-oriented processing. Emotional regulation replenishes through the deactivation of the stress-response systems that sustained engagement mobilizes, a process that requires not just time but the subjective experience of safety, of having nothing that demands immediate action.
Each of these recovery processes has a characteristic timescale. Working memory clearance operates on the order of minutes to tens of minutes, provided no new task demands intervene. Executive control relaxation operates on the order of tens of minutes to hours. Emotional regulation recovery varies widely but typically requires longer still, particularly after sustained periods of high-engagement work. Sleep performs a specific and irreplaceable consolidation function, clearing metabolic waste products from the brain, consolidating learning, and resetting the neural systems that support sustained attention during waking hours.
The critical feature of these recovery processes is their conditionality. They do not occur automatically with the passage of time. They occur when specific conditions are met: the absence of task demands, the presence of genuine cognitive rest, adequate sleep of sufficient quality and duration. Time spent in the presence of task demands — even low-level demands, even the mere availability of a device that might generate demands — is not recovery time. It is time during which the recovery processes are partially or wholly suppressed by the cognitive system's maintenance of readiness.
This conditionality has direct implications for the AI-augmented workplace, where the availability of task demands is essentially infinite. The builder who carries a device capable of connecting her to her AI agents at any moment is a builder whose cognitive system is maintaining readiness at all times. The readiness is not costless. It suppresses the recovery processes that would otherwise clear residue, restore executive control, and replenish the emotional regulation resources that sustained engagement depletes. The device in her pocket is not just a potential source of interruption. It is an active suppressor of recovery.
Ye and Ranganathan's observation of "task seepage" — AI-accelerated work colonizing lunch breaks, elevator rides, and waiting rooms — is the behavioral manifestation of this recovery suppression. Each micro-session of AI interaction during a nominally protected pause generates new cognitive demands that prevent the recovery processes from operating. But even when the builder does not interact with the device — when she merely carries it, aware that an agent might produce output that requires her evaluation — the readiness maintenance exacts a recovery cost. The cognitive system cannot fully disengage from task-oriented processing while the possibility of a task demand remains salient.
The result is a pattern of chronic partial recovery that compounds across days and weeks. The builder ends Monday carrying some residue that Monday's rest did not fully clear. Tuesday's switching adds new residue to the uncleared remainder. By Friday, the accumulated deficit — the gap between the recovery the builder's cognitive system needed and the recovery it actually achieved — is substantial. The weekend provides partial restoration, but if the builder checks her agents over the weekend — and the data suggests that most do — the restoration is incomplete, and the following Monday begins from a lower baseline than the previous Monday.
The pattern is familiar to anyone who has studied chronic sleep debt, where the consequences of insufficient sleep accumulate over weeks and months in ways that are not apparent on any single day but are measurable in degraded performance, impaired judgment, and increased error rates over time. Chronic cognitive debt operates through an analogous mechanism: each day's residue is not fully cleared, the uncleared residue accumulates, and the accumulated debt manifests as a gradual, persistent, and often unrecognized decline in the quality of the cognitive operations the builder performs.
The decline is unrecognized for the same reason that individual attention residue is unrecognized: the builder has no residue-free baseline against which to compare her current performance. She recalibrates her sense of normal to her impaired state, and the recalibration is seamless — she does not feel slower, less sharp, less reliable than she was six months ago. She feels like herself. But the self she feels like is the self carrying six months of accumulated cognitive debt, and the judgment that self produces is not the judgment her fully rested cognitive system would produce.
Leroy's published work does not directly address chronic accumulation — her experimental paradigm measures acute residue effects within a single session. But the extrapolation from acute to chronic follows from the recovery dynamics the broader literature documents. If residue from a single switch requires minutes to clear, and the builder performs dozens of switches per day with insufficient recovery time between them, then the daily residue load exceeds the daily recovery capacity. The excess carries over. The carryover accumulates. And the accumulated carryover degrades performance in ways that the acute studies can predict but only longitudinal research can confirm.
The longitudinal research does not yet exist in the specific context of AI-augmented work. This is a gap in the literature that is not merely academic but urgent, because the population of builders working under conditions of chronic cognitive debt is growing rapidly, and the consequences of that debt — on product quality, on strategic decision-making, on the personal well-being of the builders themselves — are accumulating in real time whether or not anyone is measuring them.
What does exist is the self-report data from the builders themselves, and it is remarkably consistent. The pattern described in *The Orange Pill* — the "productive addiction" that one Substack post called "Help! My Husband Is Addicted to Claude Code," the inability to stop, the erosion of boundaries between work and rest — is the behavioral signature of a cognitive system that is depleting faster than it recovers. The builders report exhaustion that is not proportional to the difficulty of their work. They report a specific quality of fatigue that is different from the fatigue of hard physical labor or the fatigue of boring work. It is the fatigue of a system that has been running at a high cognitive load for an extended period without adequate recovery — a grey, diffuse exhaustion that does not respond well to conventional rest because the conditions for cognitive recovery are not being met.
The conventional responses to this fatigue are, from a cognitive science perspective, inadequate. "Take breaks" is the standard advice, and it is not wrong, but it is insufficient if the breaks do not meet the conditions for cognitive recovery. A break during which the builder checks her phone is not a recovery break. A break during which she thinks about the problem she was working on is not a recovery break. A break during which she is aware that her agents are producing output that will require her evaluation when she returns is not a recovery break. Genuine cognitive recovery requires genuine cognitive disengagement — a period during which no task demands are active, no readiness is maintained, and the recovery processes can operate without suppression.
The organizational implication is that rest is not a scheduling problem. It is a design problem. The recovery that the cognitive system requires cannot be produced by inserting time blocks into a calendar and labeling them "break." It can only be produced by designing work environments in which the conditions for genuine disengagement are structurally available — where the builder can, during designated recovery periods, actually cease maintaining readiness for task demands, because the organizational structure ensures that no demands will arrive during those periods.
This is harder to build than it sounds. It requires not just individual discipline — the builder's willingness to put down her phone — but organizational commitment: the guarantee that no agent's output will require her evaluation during recovery periods, that no colleague will escalate a decision to her during protected time, that the system itself will not generate demands that breach the recovery boundary. The boundary must be structural, not voluntary, because voluntary boundaries fail under the pressure of internalized achievement norms — the same norms that Byung-Chul Han describes as the engine of self-exploitation in the achievement society.
The builder who feels she should be checking her agents during her break is responding to a real organizational pressure. The check that reveals a problem she can fix quickly feels productive. The fix that prevents a downstream delay feels responsible. The responsiveness that her colleagues and her managers observe feels like good performance. Every incentive in the system points toward breaching the recovery boundary, and the cost of breaching it — a few more minutes of residue-generating interaction, a few more minutes of suppressed recovery — is invisible to everyone, including the builder herself.
The costs become visible only in aggregate, over time, as chronic cognitive debt manifests in the symptoms that organizations struggle to diagnose: declining judgment quality, increasing error rates, rising burnout, the specific grey exhaustion that high-performing builders report with increasing frequency. These symptoms are attributed to the demands of the work — "this is a demanding job, and demanding jobs produce burnout." What the cognitive science suggests is that the demands themselves are not the primary cause. The primary cause is the mismatch between the demands and the recovery — the systematic depletion of cognitive resources at a rate that the available recovery cannot match.
AI tools did not create this mismatch. Email created it. Smartphones deepened it. The always-on culture institutionalized it. What AI tools did was accelerate it, by increasing the rate at which productive cognitive work can occur and thereby increasing the rate at which cognitive resources are consumed. The acceleration is genuine — the twenty-fold productivity multiplier is real, the outputs are real, the capability expansion is real. But the acceleration of production without a corresponding redesign of recovery is an acceleration toward depletion. The resource that is being multiplied — the builder's cognitive output — depends on a resource that is not being multiplied: the builder's cognitive capacity to recover.
Sustainability in the AI-augmented workplace is not, fundamentally, a wellness initiative. It is an engineering constraint, as real and as non-negotiable as the constraints on any physical system. A machine that runs beyond its thermal limits overheats and fails. A cognitive system that runs beyond its recovery limits degrades and fails. The failure mode is different — slower, subtler, more easily attributed to causes other than the actual one — but the underlying principle is the same. The system's output depends on the system's capacity, and capacity depends on recovery, and recovery depends on conditions that the work environment must provide.
The organizations that treat recovery as an engineering constraint — that design it into the workflow with the same rigor they apply to server uptime and network redundancy — will be the organizations whose builders can sustain the quality of judgment that the AI-augmented future demands. The organizations that treat recovery as a personal responsibility — that tell their builders to "take care of themselves" while designing work environments that prevent them from doing so — will discover, over quarters and years, that the productivity multiplication they celebrated was purchased on cognitive credit, and that the debt has come due.
---
Everything in the preceding chapters has been diagnosis. The mechanism of attention residue. The monitoring tax it produces. The illusion it creates. The cruelty of its relationship to engagement. The compounding of its costs at organizational scale. The depletion it drives when recovery is insufficient. The diagnosis is complete, and it is, by design, uncomfortable. What follows is the prescription.
The prescription is not "use less AI." The tools are too capable, too valuable, and too deeply integrated into the practice of knowledge work for that recommendation to be either realistic or desirable. Leroy's research does not argue against AI tools. It argues for a specific relationship with them — one that respects the cognitive constraints her experiments document and designs the work environment to minimize their cost.
The central principle is deceptively simple: use AI to deepen engagement with fewer projects rather than to spread attention across more. The twenty-fold productivity multiplier is not a license to multiply the number of projects a builder monitors. It is an opportunity to invest dramatically more cognitive depth in the projects she undertakes.
The distinction between depth and breadth is not merely a preference. It is a prediction derived from Leroy's experimental findings. Depth — sustained engagement with a single project, using AI to explore more possibilities, test more variations, refine more iterations within a single domain of focused attention — produces evaluation under conditions of minimal residue. The builder's working memory is populated with one project's context. Her executive control is configured for one project's demands. Her emotional investment is concentrated in one project's outcome. She evaluates AI output with the full cognitive resources her system can deploy.
Breadth — distributing attention across multiple projects, using AI to manage several streams simultaneously — produces evaluation under conditions of maximal residue. Each switch between projects generates the cognitive tax the preceding chapters describe. Each evaluation is performed with resources partially occupied by the concerns of other projects. Each judgment is degraded by the accumulated switches of the day. The builder is productive by every standard metric. She is impaired by every cognitive one.
The organizational design that supports depth has several specific features, each grounded in the experimental evidence and each representing a departure from the emerging norms of AI-augmented work.
The first feature is sequenced rather than parallel workflows. Instead of assigning a builder to monitor five AI agents simultaneously, assign her to one project at a time, with clear completion milestones that define when she transitions to the next. The sequencing preserves the conditions for deep engagement — sustained attention, minimal switching, low residue load — while still leveraging AI's capability to accelerate production within each project. The builder moves through projects serially rather than managing them in parallel. Each project receives her full cognitive resources for its duration. The total number of projects completed may be the same; the quality of judgment applied to each will be dramatically higher.
The sequencing requires organizational discipline because it conflicts with the apparent efficiency of parallelism. A builder working on one project while four others wait appears less productive than a builder monitoring all five simultaneously. The appearance is misleading — the sequential builder's single project receives higher-quality judgment, produces less quality debt, and generates fewer downstream errors — but the misleading appearance is what the organizational culture sees and rewards. Implementing sequenced workflows requires leadership that understands the difference between visible productivity and actual quality, and that is willing to accept the former's reduction in exchange for the latter's improvement.
The second feature is completion rituals. Leroy's finding that completed tasks produce less residue than incomplete ones suggests a specific intervention: designing work interactions with AI tools around natural completion points rather than around time blocks. Instead of working with an AI agent for a fixed period and then switching — which almost guarantees that the switch will occur mid-task, generating maximum residue — the builder works until a defined sub-task is complete, marks the completion explicitly, and then transitions to the next context.
The ritual need not be elaborate. It might be as simple as writing a brief note summarizing where the project stands and what the next step will be — what Leroy's research calls a "ready-to-resume" plan, which reduces the residue associated with leaving a task. The act of articulating the current state and the next step performs two functions: it provides closure on the current work session, reducing the cognitive persistence of the task's unresolved elements, and it creates an external memory aid that reduces the reconstruction cost when the builder returns to the project.
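A purely illustrative sketch of what such a note might look like as a tiny data structure — the project names and field choices below are invented, not prescribed by Leroy's research, but the two fields mirror the two functions the text describes: closure on the current session and a cheap reconstruction aid for the return.

```python
# Illustrative "ready-to-resume" note. The structure is an assumption:
# one field for where the work stands (closure), one for the first
# action on return (reconstruction aid).
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReadyToResumeNote:
    project: str
    current_state: str   # where the work stands — provides closure
    next_step: str       # first action on return — reduces reconstruction cost

    def render(self) -> str:
        stamp = datetime.now(timezone.utc).isoformat(timespec="minutes")
        return (f"[{stamp}] {self.project}\n"
                f"  State: {self.current_state}\n"
                f"  Next:  {self.next_step}")


# Hypothetical example of marking a completion point before switching.
note = ReadyToResumeNote(
    project="auth-service refactor",
    current_state="token rotation implemented; integration tests green",
    next_step="wire rotation into the session middleware",
)
print(note.render())
```

The point of the structure is the discipline it enforces, not the tooling: the note is written at the moment of transition, before the next context is loaded.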
The third feature is transition protocols between AI-assisted and unassisted work. Leroy's framework, interpreted for the AI context, suggests that switching between AI-collaborative work and solo cognitive work generates a specific form of residue: the patterns, associations, and evaluative stances activated during AI interaction persist into subsequent non-AI cognitive work. The builder who has spent an hour evaluating AI outputs — judging, correcting, directing — carries the evaluative stance into her next activity, even if that activity requires a different cognitive posture: creative ideation, strategic thinking, interpersonal communication.
A transition protocol is a brief activity — five minutes, perhaps ten — designed to clear the evaluative stance before the next cognitive mode is required. The activity should be genuinely disengaging: a short walk, a few minutes of unstructured conversation, a physical task that occupies the motor system without demanding the cognitive resources that need to clear. The protocol is not a break in the conventional sense. It is a cognitive reset — a deliberate intervention at the transition point between modes of work, designed to minimize the carryover that Leroy's research predicts.
The fourth feature is protected focus periods during which no monitoring is required. These are not optional breaks that the builder may or may not take depending on workload. They are structural features of the workflow — periods during which the organizational system guarantees that no AI agent will produce output requiring the builder's evaluation, no colleague will escalate a decision to her attention, and no notification will breach the focus boundary.
The guarantee must be structural because voluntary focus protection fails under the pressure of organizational norms and individual achievement motivation. Leroy's own practical recommendations, drawn from her research, emphasize this point: the research-supported approach to attention residue is not training people to resist switching but reducing the frequency and severity of switches the environment demands. You cannot will yourself out of a cognitive constraint. You can design an environment that respects it.
The fifth feature addresses the organizational level directly: measuring quality of judgment rather than quantity of output. This is the hardest recommendation because it requires new metrics in a domain where the existing metrics are deeply entrenched. Quantity of output — features shipped, documents produced, tickets resolved — is easy to count. Quality of judgment — the accuracy of evaluations, the soundness of architectural decisions, the alignment of strategic choices with intended outcomes — is hard to measure and harder to attribute to specific evaluative events.
But difficulty of measurement is not impossibility. Organizations already measure some proxies for judgment quality: defect rates in shipped code, the frequency of strategic pivots that indicate earlier decisions were wrong, the technical debt ratio that reflects accumulated architectural compromises. These proxies can be correlated with workflow characteristics — the number of context switches per builder per day, the ratio of parallel to sequential project assignment, the duration of uninterrupted focus periods — to build an empirical picture of how workflow structure affects the quality of the judgments the workflow produces.
The correlational evidence will not be as clean as Leroy's experimental findings. Organizational research never is. But it will be suggestive enough to guide design decisions, and the organizations that collect it will have a structural advantage over organizations that continue to optimize solely for output quantity.
There is a final principle that underlies all of these features and deserves explicit statement: the organizational recognition that the human cognitive system is the binding constraint on AI-augmented productivity. Not the speed of the AI. Not the capability of the model. Not the cost of inference or the availability of compute. The binding constraint is the human mind that directs the tool, evaluates its output, and makes the judgment calls that determine whether the output serves its purpose. Everything that degrades that mind's capacity — every unnecessary switch, every unprotected interruption, every recovery-suppressing notification — is a constraint on the organization's actual capability, regardless of what its theoretical capability might be.
This recognition is uncomfortable because it limits the narrative of unlimited productivity that AI tools seem to promise. The tools can produce at unlimited scale. The humans who direct them cannot evaluate at unlimited scale. The bottleneck is not computational. It is biological. And the biological constraint — working memory limits, residue accumulation, recovery requirements — is not addressable through better tools, faster models, or more capable agents. It is addressable only through the design of work environments that treat the human cognitive system with the same engineering rigor that organizations apply to their computational infrastructure.
Segal's argument in *The Orange Pill* that AI is an amplifier — that it amplifies whatever signal it receives — finds its most precise operational meaning here. The signal the amplifier receives is not the builder's intent, her expertise, or her potential. It is her current cognitive state: the working memory she has available, the executive control resources she can deploy, the emotional regulation capacity she retains at the moment of evaluation. Everything in this book has been directed toward a single practical conclusion: design the work so that the cognitive state at the moment of evaluation is the best the builder can achieve. Minimize the residue. Protect the recovery. Sequence the workflow. Complete before switching. Create the conditions under which the human mind can bring its full resources to the judgment that no machine can make on its behalf.
The amplifier does not care what signal it receives. It amplifies with equal fidelity the judgment of a builder in flow and the judgment of a builder carrying the residue of fifteen context switches. The output in both cases looks competent — the AI's surface quality ensures that. But the subtle differences between output directed by full cognitive resources and output directed by depleted ones compound through the system, accumulate over time, and determine whether the AI-augmented organization is building something excellent or something that merely appears adequate.
The difference is invisible on any given day. It becomes visible over quarters, over years, in the quality of products, the soundness of strategy, the sustainability of the people who do the work. The organizations that design for depth will see it in one direction. The organizations that design for breadth will see it in the other. Leroy's research does not prescribe which outcome an organization should choose. It predicts, with the confidence that controlled experimentation affords, what each choice will cost.
Mihaly Csikszentmihalyi spent four decades documenting a state of consciousness that most people have experienced and few can reliably produce: the condition in which challenge and skill are matched, attention is fully absorbed, self-consciousness drops away, time distorts, and the person operates at the outer edge of their capability. He called it flow, and his research demonstrated that it is not merely pleasant but optimal — the state in which human beings produce their best work, experience their deepest satisfaction, and operate with the greatest cognitive efficiency their systems can achieve.
Sophie Leroy's research on attention residue is, in a precise and consequential sense, the study of what destroys flow.
This is not how Leroy frames her work. Her experimental paradigm examines task-switching, not flow interruption. Her dependent variables are performance decrements, not phenomenological states. Her language is the careful, operationalized language of organizational psychology, not the experiential language of Csikszentmihalyi's interviews with rock climbers and surgeons and chess players. But the phenomena are connected at the level of cognitive mechanism, and the connection illuminates something that neither research program, taken alone, can fully reveal.
Flow requires sustained, uninterrupted engagement with a single task. This is not one of several conditions for flow. It is the condition without which the other conditions cannot operate. Clear goals, immediate feedback, challenge-skill balance, a sense of control — Csikszentmihalyi's four requirements — are necessary but not sufficient. They create the potential for flow. Sustained engagement actualizes it. Without sustained engagement, the cognitive constellation that flow requires — the densely populated working memory, the finely configured executive control, the deep emotional investment — never fully assembles. The builder remains in the anteroom of flow, experiencing its preconditions without achieving its state.
Attention residue is the specific mechanism through which interrupted engagement prevents the return to flow after a disruption. This is the connection the literature has not sufficiently explored. Gloria Mark's well-known finding — that it takes an average of twenty-three minutes and fifteen seconds to return to a task after an interruption — documents the time cost. Leroy's framework explains the cognitive cost: the residue of the interrupting task persists in working memory, competing with the returning task's demands for the cognitive resources that flow requires. The builder who returns to her project after a monitoring interruption does not simply resume where she left off. She resumes carrying cognitive freight from the interruption — the evaluative judgments she made, the concerns she processed, the decisions she rendered — and that freight prevents the full re-engagement that flow demands.
The re-engagement is not merely delayed. It is degraded. The cognitive constellation that reassembles after an interruption is not identical to the one that was disrupted. Some representations have decayed during the interruption and must be reconstructed. Some associations have been overwritten by the interrupting task's demands and must be re-established. The emotional momentum that sustained the original engagement has been redirected toward the interrupting task and must be regenerated. The reassembled constellation is thinner, less richly connected, less emotionally charged than the one it replaces. The builder is back on task. She is not back in flow.
The distinction matters because the quality of cognitive work performed in flow and the quality performed outside of flow are not marginally different. They are categorically different. Csikszentmihalyi's research documented the difference in subjective experience — the absorption, the satisfaction, the sense of effortless control. But the difference extends to objective performance as well. Studies of programmers, writers, and other knowledge workers consistently find that the quality of output produced during sustained flow episodes is substantially higher than output produced during fragmented work sessions, even when the total time spent is equivalent. The flow-state programmer does not merely write code faster. She writes better code — more elegant, more robust, more attuned to the system's deeper requirements.
If flow produces the highest-quality cognitive work, and if attention residue is the primary mechanism that prevents flow from being achieved or sustained in the AI-augmented workplace, then managing residue is not merely a matter of individual comfort or even individual productivity. It is a matter of whether the organization's most consequential cognitive operations — the evaluations, the judgments, the directional decisions that determine whether AI-generated output serves its purpose — are performed in the cognitive state that produces the best outcomes or the state that produces adequate ones.
AI tools are, paradoxically, among the most powerful flow-enabling technologies ever created. The characteristics that make AI-augmented work so engaging — the tight feedback loops, the immediate realization of intention, the removal of implementation friction that previously separated idea from artifact — are precisely the characteristics that Csikszentmihalyi identified as conditions for flow. The builder working with Claude on a problem she cares about receives immediate feedback (the response arrives in seconds), experiences clear goals (she knows what she is trying to create), faces a challenge matched to her skill (the tool extends her capability without eliminating the need for her judgment), and maintains a sense of control (she directs the collaboration). Every condition for flow is met, and met more completely than in most pre-AI work environments.
The paradox is that the same organizational logic that makes AI tools available to the builder also demands that she use them across multiple projects simultaneously. The tool enables flow. The organization interrupts it. And every interruption, as the preceding chapters have established, generates residue that degrades the builder's capacity to re-enter the flow state the tool enabled.
The builder's subjective experience of this paradox is distinctive and, based on the accounts that proliferated in late 2025 and early 2026, widely shared. She describes sessions of extraordinary creative intensity — hours in which the work flows with a quality and speed she has never previously experienced — punctuated by monitoring demands that shatter the state and leave her struggling to recover it. The sessions between interruptions may still be productive in conventional terms. But the builder knows, with the specific certainty that comes from having experienced both states, that the interrupted work is not the same as the uninterrupted work. Something has been lost. The quality is adequate but not excellent. The judgment is competent but not inspired. The experience is work but not flow.
This experiential report is consistent with Leroy's framework in every particular. The interrupted builder is carrying residue. The residue occupies working memory resources that flow requires. The cognitive constellation is reassembled but diminished. The evaluation she performs is performed with less than her full capacity, and the output she approves is output that her flow-state self might have refined further, redirected, or rejected in favor of something better.
The organizational challenge is that the difference between flow-state output and residue-impaired output is often subtle — subtle enough to pass review, subtle enough to ship, subtle enough to be counted as a success by every metric the organization tracks. The subtlety is what makes the loss invisible and, therefore, what makes it persistent. No alarm sounds when a builder approves output that is good rather than excellent. No metric declines when a judgment is competent rather than inspired. The loss accumulates silently, in the gap between what the organization produced and what it could have produced if the cognitive conditions for its builders' best work had been protected.
Protecting those conditions requires a specific organizational posture that is, at present, rare: the willingness to sacrifice apparent efficiency for actual quality. An organization that assigns builders to one deep project at a time, with protected focus periods and minimal monitoring demands, will appear less efficient than an organization that assigns builders to five parallel projects. The first organization produces less visible output per unit of time. But the output it produces is evaluated in flow or near-flow conditions, by builders whose cognitive resources are intact, whose working memory is dedicated to the single project at hand, and whose judgment is operating at the ceiling their capability allows.
Csikszentmihalyi's research provides the justification for this sacrifice: flow is not merely a pleasant subjective state. It is the condition under which human cognitive systems produce their best work. Leroy's research provides the mechanism: attention residue is the specific, measurable cognitive cost that prevents flow from being achieved or sustained in the typical AI-augmented workflow. Together, the two research programs generate a practical conclusion that is simple to state and difficult to implement: protect the conditions for flow, and the quality of everything the organization produces will improve. Fragment those conditions, and no amount of AI capability can compensate for the degraded human judgment that results.
The practical protections follow from the diagnosis. Batched monitoring — consolidating evaluative tasks into defined periods rather than distributing them across the day — preserves extended windows for flow-conducive work. Completion milestones — defining natural stopping points within projects that provide closure before transitions — reduce the residue generated at each switch. Transition protocols — brief, genuinely disengaging activities between monitoring sessions and creative work — allow the residue from evaluation to clear before the creative mode is required.
These protections are modest in scope and substantial in implication. They do not require new technology. They do not require organizational restructuring. They require only the recognition that the human mind is not a resource to be maximized but a system to be maintained — and that the quality of its maintenance determines the quality of everything it produces.
The builder who is protected from unnecessary interruption, who is given the structural conditions for sustained engagement, who is allowed to enter and maintain the flow state that AI tools are uniquely capable of enabling — that builder produces work of a quality that the residue-laden, multiply-monitored, context-switching alternative cannot match. The difference may not show up on this quarter's productivity dashboard. It will show up in the products the organization ships, the strategies it pursues, the reputation it builds, and the sustainability of the people who do the work.
Flow is not a luxury. It is the condition under which human judgment operates at its best. Residue is not an inconvenience. It is the condition under which human judgment operates at less than its best. The choice between protecting flow and accepting residue is the choice between the organization's actual capability and a diminished version of it — and in the AI-augmented future, where human judgment is the binding constraint on everything, the choice between those two versions may be the most consequential organizational decision there is.
---
The final question is not whether attention residue exists. The experimental evidence is robust, replicable, and not in dispute. The question is not whether it matters. The preceding chapters have traced its consequences from the individual cognitive system through organizational networks to the quality of output that AI-augmented workplaces produce. The final question is where this phenomenon fits in the larger picture — what it means for the relationship between human minds and the artificial intelligence systems they are now asked to direct.
Edo Segal proposes, in *The Orange Pill*, a framework he calls attentional ecology — the study of what AI-saturated environments do to the minds that live inside them. The framework borrows from ecological science not as a metaphor but as a methodological commitment: the recognition that the relationship between an organism and its environment cannot be understood by studying either one in isolation. The organism shapes the environment. The environment shapes the organism. The interaction is continuous, reciprocal, and produces emergent properties that neither party, studied alone, would predict.
Leroy's attention residue is, within this framework, a specific and measurable pollutant. Not a toxin that should be eliminated entirely — some degree of task-switching is inherent in any complex work environment and generates residue that the cognitive system is designed to manage — but a pollutant whose concentration matters. At low concentrations, managed through adequate recovery and sensible workflow design, residue is a background cost of coordinated work. At the concentrations the AI-augmented workplace produces — dozens of context switches per day, monitoring demands that arrive on the agents' schedule rather than the builder's, recovery windows colonized by task seepage — residue becomes an environmental stressor that degrades the cognitive ecosystem it inhabits.
The ecological frame is useful because it identifies the right level of intervention. The individual-level recommendations — take breaks, practice mindfulness, develop focus skills — are the equivalent of telling a person living downstream of a chemical plant to buy a water filter. The filter helps. It does not address the source of contamination. The source, in the case of attention residue, is the structure of work — the organizational decisions about how many projects a builder monitors, how frequently she must switch, how much recovery time the workflow provides, and whether the monitoring demands respect or override her cognitive rhythms.
Structural interventions operate upstream of the individual. They change the concentration of the pollutant in the environment rather than asking the individual to develop tolerance for it. This is ecologically sounder, because the evidence — Leroy's own research and the broader cognitive science it draws on — indicates that tolerance for attention residue cannot be developed. The effect is a fundamental feature of cognitive architecture, not a skill deficit that yields to training. Asking builders to become better at context-switching is like asking a river to become better at flowing uphill. The constraint is architectural. The intervention must be architectural too.
The ecological frame also reveals a dynamic that the individual-level analysis cannot capture: the feedback loop between the environment and the organism's capacity to manage it. A builder operating in an environment with high residue concentration makes judgments of lower quality. Those lower-quality judgments produce outputs that are subtly flawed. The flawed outputs generate downstream problems — bugs, misalignments, strategic drift — that require additional work to address. The additional work generates additional context-switching as the builder is pulled into remediation tasks she had not planned for. The additional switching generates additional residue. The additional residue further degrades judgment quality. The cycle is self-reinforcing: degraded judgment produces flawed outputs that produce additional demands that produce additional degradation.
This feedback loop is the mechanism through which environmental residue concentration, if left unmanaged, escalates. The system does not stabilize at a manageable level of impairment. It tends toward progressive degradation, because each increment of quality debt generates the conditions for the next increment. The trajectory is not visible on any single day. It is visible across quarters, in the metrics that organizations typically attribute to complexity or scale rather than to the cognitive state of their evaluators.
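The shape of that feedback loop can be sketched as a toy simulation — every coefficient below is invented for illustration, not derived from Leroy's measurements. The qualitative behavior is the point: below some threshold of planned switching, the loop settles at a manageable impairment; above it, remediation demand compounds until judgment quality sits at its floor.

```python
# Toy model of the residue feedback loop. All coefficients are invented;
# only the qualitative shape (stabilization vs. runaway) is meaningful.

def simulate(planned_switches: int, remediation_factor: float = 1.0,
             days: int = 10) -> list[float]:
    """Each day's flawed outputs add remediation switches to the next day."""
    switches = float(planned_switches)
    flaw_history = []
    for _ in range(days):
        residue = 0.05 * switches        # residue load grows with switching
        flaw_rate = min(1.0, residue)    # degraded judgment -> flawed outputs
        # Flawed outputs pull the builder into unplanned remediation,
        # which raises tomorrow's switching load.
        switches = planned_switches + remediation_factor * flaw_rate * planned_switches
        flaw_history.append(round(flaw_rate, 3))
    return flaw_history

low = simulate(planned_switches=5)    # settles near a fixed point
high = simulate(planned_switches=15)  # saturates: runaway degradation
print("low-switching flaw rate after 10 days: ", low[-1])
print("high-switching flaw rate after 10 days:", high[-1])
```

The low-switching regime converges toward a stable impairment level; the high-switching regime escalates until the flaw rate pins at its maximum — a crude rendering of the text's claim that the system, left unmanaged, tends toward progressive degradation rather than a stable equilibrium.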
Breaking the feedback loop requires intervening at the point of highest leverage — and Leroy's research identifies that point with specificity. The highest-leverage intervention is not at the end of the cycle, where quality problems manifest and remediation is required. It is at the beginning, where the switching frequency and the recovery structure determine the residue concentration that the builder carries into her evaluations. Reduce the switching frequency. Protect the recovery periods. Sequence the workflow to minimize transitions. Batch the monitoring into defined periods. These interventions, modest in isolation, produce a compound effect: less residue per evaluation, higher judgment quality per output, fewer flawed outputs propagating through the system, fewer remediation demands generating additional switching, and a steady-state residue concentration that the cognitive system can manage without progressive degradation.
The interventions are the dams that *The Orange Pill* argues every technological transition requires — structures that redirect the flow of capability toward outcomes that serve the people inside the system. The specific dams that attention residue research prescribes are not grand policy initiatives or sweeping cultural reforms. They are workflow design principles, implemented at the team level, maintained through organizational commitment, and evaluated by metrics that track the cognitive state of builders alongside the outputs those builders produce.
The broader connection between Leroy's empirical findings and the cultural diagnosis that Segal and others have offered is this: the AI-augmented workplace is not merely a faster version of the pre-AI workplace. It is a qualitatively different cognitive environment — one that generates more opportunities for deep engagement, more demands for evaluative judgment, more transitions between cognitive modes, and more residue at each transition than anything the pre-AI knowledge worker experienced. The environment has changed. The organism has not. The cognitive architecture that evolved over hundreds of thousands of years to manage a particular range of attentional demands is now operating in an environment that exceeds that range, and the excess manifests as the specific, measurable, consequential degradation that Leroy's research documents.
This is not an argument against AI. The capability expansion is real. The democratization of creative capacity is real. The potential for deeper, more ambitious work — enabled by tools that remove the mechanical friction of implementation and leave the human free to operate at the level of vision, judgment, and direction — is genuine and, in the best cases, transformative. The argument is that realizing this potential requires designing the work environment to protect the cognitive resources on which the potential depends.
The amplifier metaphor that runs through *The Orange Pill* reaches its most precise operational meaning in the context of attention residue. An amplifier works with whatever signal it receives. It does not filter. It does not select. It does not improve a degraded signal or compensate for noise in the input. It amplifies, with equal fidelity, whatever is fed to it. What the builder feeds the amplifier is not her intent, her expertise, or her potential in the abstract. It is her current cognitive state — the working memory she has available at this moment, the executive control resources she can deploy right now, the emotional regulation capacity she retains after this many switches on this particular day.
If that state is flow — cleared working memory, configured executive control, sustained emotional investment, the full constellation of cognitive resources assembled and deployed — then the amplifier receives and amplifies the best work the builder is capable of producing. The output is directed by judgment operating at its ceiling. The evaluations are sharp. The directions are precise. The results reflect the full value of the human expertise that the tool was designed to leverage.
If that state is residue — fragmented working memory, partially reconfigured executive control, emotional traces of three previous projects competing with the current one for processing resources, the accumulated depletion of a day's worth of context switches — then the amplifier receives and amplifies something less. Not incompetent. Not obviously flawed. But subtly, measurably, consequentially diminished. The output is directed by judgment operating below its ceiling. The evaluations are adequate but not acute. The directions are reasonable but not inspired. The results reflect not the builder's expertise but the portion of that expertise that survived the day's cognitive taxation.
The difference between those two inputs — flow-state signal and residue-state signal — determines, at scale and over time, whether the AI-augmented organization is building something that reflects the full capability of the human minds directing it or something that reflects a systematically diminished version of that capability. The tools do not make this determination. The environment does. And the environment is designed by people — leaders, managers, workflow architects — who have the power to protect or degrade the cognitive resources on which everything depends.
Leroy's research does not prescribe what any organization should build or what any individual should value. It describes a constraint. The constraint is biological, non-negotiable, and indifferent to ambition. The organizations and individuals who design their relationship with AI tools around this constraint — who protect the conditions for sustained engagement, who respect the recovery requirements the cognitive system demands, who measure the quality of attention alongside the quantity of output — will discover that the amplifier produces something worthy of the minds directing it.
The organizations and individuals who ignore the constraint — who treat attention as infinitely deployable, recovery as optional, and context-switching as costless — will discover that the amplifier produces something else: more of something slightly worse, at a scale that makes the slight worsening consequential. The discovery will come slowly, manifesting as drift, as quality debt, as the particular exhaustion that the most ambitious builders report. By the time the discovery is made, the debt will have accumulated, and the cost of repayment will be substantial.
The science is clear. The mechanism is identified. The interventions are specific. What remains is the choice — organizational, cultural, individual — to build the structures that the science demands. Not because the structures are easy. They are not. They require trade-offs between visible productivity and invisible quality, between the seduction of parallelism and the discipline of depth, between the organizational reward for busyness and the cognitive reward for flow. The trade-offs are real, and the short-term incentives favor the wrong side of every one.
But the long-term trajectory favors the right side. The organizations that protect their builders' attention will produce better work, sustain their people longer, and compound quality advantages that residue-laden competitors cannot match. The individuals who protect their own attention — who design their AI interactions for depth rather than breadth, who complete before switching, who guard their recovery against the seepage of one more prompt — will produce work that reflects their actual capability rather than the diminished version that residue permits.
Attention is the resource. Residue is the tax. The tax is non-negotiable. What is negotiable is how much of it you pay — and that negotiation, conducted through the design of workflows, the structure of organizations, and the discipline of individual practice, is the most consequential negotiation of the AI-augmented era. It will be won not by the fastest or the busiest but by those who understand that the quality of what the amplifier produces depends, irreducibly and permanently, on the quality of the mind that feeds it.
---
The number I keep recalculating is five.
Not five engineers, or five projects, or five million dollars. Five context switches. That is how many times, on an average morning of the kind I described in *The Orange Pill* — the mornings when the work is flowing and the machines are responding and the gap between imagination and artifact has shrunk to the width of a conversation — I pull myself out of one thread to check on another. Five times I shatter whatever constellation I had built in working memory and attempt to rebuild it from fragments.
Before Leroy, I would have called those mornings my most productive. After reading her research, I can no longer use that word with the same confidence. Not because the work was bad. The work was real. Products shipped. Problems got solved. But the question Leroy's framework forces is not whether the work was good. It is whether it was as good as the mind doing it was capable of producing — or whether, by the fifth switch, I was evaluating Claude's output with a cognitive system carrying the sediment of four previous interruptions, each one depositing its layer of unfinished concern onto the working memory I needed for the judgment in front of me.
I cannot answer that question. That is what makes it haunt me.
In Chapter 1 of *The Orange Pill*, I described standing in a room in Trivandrum watching my engineers discover what AI could do — the twenty-fold multiplier, the boundaries dissolving between disciplines, the vertigo of capability expanding faster than identity could accommodate. What I did not describe, because I did not yet have the vocabulary for it, was what I observed in the weeks after. The engineers who multiplied their output also multiplied their switching. They were building across five domains where they had previously occupied one. They were monitoring AI outputs on frontend, backend, audio processing, computer vision, and conversational AI simultaneously. They were extraordinarily productive. They were also, by Leroy's measure, carrying a residue load that no prior generation of engineers had experienced.
The productivity was real. The question is whether the quality of judgment they brought to their fifteenth evaluation of the day was the same quality they brought to their first. Leroy's data says it was not. And the difference — invisible in any metric we tracked, invisible to the engineers themselves — may have been depositing quality debt into our systems that we are still discovering.
When I wrote about Byung-Chul Han's garden in Berlin — his refusal of the smartphone, his insistence on analog music, his commitment to the resistance of pen on paper — I treated his choices with respect but also with distance. I am not pure enough for Han's world, I wrote. I am too entangled in the systems I critique. That remains true. But Leroy gives me something Han could not: a mechanism. Han diagnosed the pathology of smoothness. Leroy measured the specific cognitive cost that smoothness extracts at each frictionless transition. The diagnosis is philosophical. The measurement is empirical. And the measurement is what lets me act.
I act imperfectly. I still check my agents at lunch. I still carry my phone into conversations where it has no business being. I still feel the pull of one more prompt at hours when the recovery my mind requires is the recovery I am denying it. But I have begun to build different dams. Not Han's dams — I will not abandon the tools. Leroy's dams. Smaller. More specific. Grounded in what the experiments actually show.
I sequence more than I used to. When I sit down with Claude to work on something that matters, I close the other threads. Not all of them — I am not yet that disciplined — but more than before. I notice now, with a precision I did not previously possess, the moment when residue arrives: the slight fuzz at the edge of attention when I return to a problem after checking something else, the half-second of reconstruction before the argument is back in focus, the feeling that the flow I had built is not quite the flow I return to. These are tiny observations. They were invisible to me six months ago. Leroy made them visible, and visibility is the precondition for change.
What I want to say to every parent, every builder, every leader who read *The Orange Pill* and felt the vertigo of recognition is this: the amplifier is real. Everything I said about it remains true. AI amplifies whatever signal it receives. But the signal is not your ambition or your expertise or your vision for the future. The signal is your cognitive state right now — the working memory you have available at this moment, carrying whatever residue the last hour deposited. That is what the amplifier receives. That is what gets amplified.
Guard that signal. Not because attention is sacred in some abstract philosophical sense, though perhaps it is. Guard it because the quality of what you build — for your company, your students, your children, yourself — depends on the quality of mind you bring to the moment of building. And that quality is not fixed. It is not a trait you possess. It is a state you protect or fail to protect, through the design of your workflow, the discipline of your transitions, and the willingness to let the other threads wait while you give this one the attention it deserves.
Leroy showed us the tax. We cannot abolish it. We can choose how much of it we pay.
— Edo Segal
You feel productive. The science says your judgment is degraded. The gap between those two facts is where the future of AI-augmented work will be won or lost.
Sophie Leroy discovered something the technology industry has yet to reckon with: every time you switch between tasks, your mind carries residue from the last one — invisible cognitive debris that contaminates your next evaluation, your next decision, your next act of judgment. In an era when AI tools let builders direct five projects simultaneously, Leroy's research reveals the hidden cost of that multiplication. The amplifier is only as good as the mind feeding it, and that mind is carrying more residue than any previous generation of workers has experienced.
This book applies Leroy's framework to the AI revolution with unflinching specificity. It is required reading for anyone who directs AI tools and believes the outputs are as sharp at the end of the day as they were at the start.

A reading-companion catalog of the 27 Orange Pill Wiki entries linked from this book — the people, ideas, works, and events that *Sophie Leroy — On AI* uses as stepping stones for thinking through the AI revolution.
Open the Wiki Companion →