By Edo Segal
The number that broke something open for me was not a technology metric. It was 478 percent.
That is the cost overrun figure that Perplexity returned when Bent Flyvbjerg asked it about the Big Dig — Boston's most infamous infrastructure project. The correct answer, documented in peer-reviewed journals Flyvbjerg himself had published in, was 220 percent. Wrong by more than double. Delivered without a flinch.
I know that moment. I have lived inside it. In Chapter 7 of The Orange Pill, I describe catching Claude attributing a concept to Deleuze that Deleuze never articulated — a passage so rhetorically smooth that I almost kept it. The prose passed every surface check. It failed the only check that mattered: the one against reality. That was my 478 percent.
But here is why Flyvbjerg matters beyond the error. He is not primarily an AI critic. He is the world's leading scholar of why large projects fail — why the Sydney Opera House came in 1,400 percent over budget, why California's high-speed rail may never be completed, why the pattern repeats across decades and continents and political systems without improving. His explanation comes down to two forces: optimism bias (the genuine belief that your project is the exception) and strategic misrepresentation (the deliberate inflation of benefits to get the thing approved). These twin engines have driven institutional self-deception since humans started building things too large for one person to see whole.
Now apply that framework to the AI discourse. The cognitive version — sincere overestimation of what current systems can do — saturates the conversation. The political version — deliberate inflation of capabilities to justify trillion-dollar valuations — fills the earnings calls. Flyvbjerg's diagnostic apparatus, built for bridges and tunnels, turns out to be the sharpest lens available for understanding why the gap between AI hype and AI reality remains so wide and so dangerous.
But the deeper gift is a Greek word he spent thirty years rehabilitating: phronesis. Practical wisdom. The knowledge of what should be done in this particular situation, with these particular stakes, for these particular people. The knowledge that machines do not possess and that our institutions have never been designed to cultivate.
In The Orange Pill, I called it "the remaining twenty percent." Flyvbjerg gives it its proper name and its proper weight. That naming changed how I think about every dam I am trying to build.
This book is another lens. It will sharpen what you saw from the tower.
-- Edo Segal ^ Opus 4.6
Bent Flyvbjerg (born 1952) is a Danish economic geographer and planning scholar widely regarded as the world's leading authority on megaproject management and the study of why large-scale projects fail. Born in Aalborg, Denmark, he earned his PhD from Aarhus University and held positions at Aalborg University and the University of Oxford, where he was the first BT Professor and inaugural Chair of Major Programme Management at the Saïd Business School. He is currently the Villum Kann Rasmussen Professor at the IT University of Copenhagen. His landmark 1998 work Rationality and Power: Democracy in Practice — based on a fifteen-year case study of urban planning in Aalborg — established him as the foremost advocate for phronetic social science, an approach grounded in Aristotle's concept of practical wisdom (phronesis) as the highest form of knowledge about human affairs. His subsequent books Megaprojects and Risk (2003) and How Big Things Get Done (2023, with Dan Gardner) brought his empirical findings on cost overruns, optimism bias, and strategic misrepresentation to a broad global audience. His methodology of reference class forecasting — using the statistical distribution of comparable past projects to correct planning estimates — was adopted as official policy by the UK government and has influenced infrastructure planning worldwide. His 2025 paper "AI as Artificial Ignorance" extended his framework to the age of artificial intelligence, arguing that large language models are structurally incapable of truth-tracking and function as what philosopher Harry Frankfurt defined as bullshit machines — systems optimized for persuasion rather than accuracy. Flyvbjerg's work stands as a sustained argument that context-dependent practical wisdom, not abstract universal knowledge, is the form of understanding most essential to consequential human decision-making.
In January 2025, Bent Flyvbjerg — the Oxford professor who had spent three decades studying why large-scale projects fail — sat down with two of the most celebrated artificial intelligence systems on the planet and asked them a simple question. He wanted to know the cost overrun of Boston's Central Artery/Tunnel Project, commonly known as the Big Dig, one of the most extensively documented megaprojects in American history.
Flyvbjerg knew the answer. He had published it in peer-reviewed journals. The number — 220 percent — had been cited hundreds of times across the academic literature on infrastructure failure. It was not obscure. It was not contested. It was the kind of factual claim that a competent research assistant could verify in minutes using any academic database.
ChatGPT got it wrong. Perplexity got it worse — returning 478 percent, a figure that bore no relationship to any published source. Neither system flagged uncertainty. Neither hedged. Both delivered their answers with the smooth, confident prose that large language models have been engineered to produce: grammatically impeccable, rhetorically persuasive, and factually worthless.
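A note on the arithmetic, since the convention is easy to misread (the gloss is mine, not a quotation from the paper): cost overrun is measured against the original estimate, so overrun = (actual cost − estimated cost) / estimated cost × 100. A 220 percent overrun means the Big Dig cost roughly 3.2 times what was promised. Perplexity's 478 percent would have implied a final bill nearly 5.8 times the estimate, a materially different claim about one of the most heavily scrutinized projects in American history.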
Flyvbjerg published the results in Project Leadership and Society under a title that landed like a slap: "AI as Artificial Ignorance." The paper, which accumulated over four thousand downloads within months of its release, did not merely document an error. It diagnosed a condition. The condition was this: artificial intelligence systems, as currently designed, lack any concept of truth. They predict the most likely next word or phrase, which is not a reliable basis for decision-making. They mix true, false, and ambiguous statements in ways that make it difficult to distinguish which is which. The systems are not intelligent in any meaningful sense. They are persuasive. And persuasion without truth is what the philosopher Harry Frankfurt defined, with surgical precision, as bullshit.
The word requires defense in a scholarly context, and Flyvbjerg provided it. Frankfurt's 1986 essay "On Bullshit" — later republished as a bestselling book — drew a distinction between lying and bullshitting that is essential to understanding what large language models actually do. A liar knows the truth and deliberately contradicts it. The liar operates within a framework in which truth exists and matters; the lie is parasitic on the truth it denies. A bullshitter, by contrast, is indifferent to truth. The bullshitter does not deny reality. The bullshitter ignores it. The goal is not to describe the world accurately but to produce an effect on the audience — to persuade, to impress, to sound authoritative — without regard for whether the statements that produce the effect happen to be true.
Large language models, Flyvbjerg argued, are bullshit machines in precisely Frankfurt's sense. They do not lie, because lying requires knowledge of what is true. They do not tell the truth, because truth-telling also requires such knowledge. They operate in a space orthogonal to the truth-falsehood axis entirely. They generate text that is optimized for plausibility, for the appearance of knowledge, for the rhetorical texture of authority — and this optimization proceeds without any mechanism for determining whether the generated text corresponds to reality. The system sounds convincing even when it is wrong. As such, current AI is more about persuasion than about truth.
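To make the structural point concrete, here is a deliberately crude sketch of frequency-driven next-word prediction. It is my illustration, not Flyvbjerg's, and a toy next to how production models actually work; the corpus and its numbers are invented props, not a claim about what any real training set contains. What it shows is where the gap sits: nothing in the generation loop consults any source of truth.

```python
from collections import Counter, defaultdict

# Invented toy corpus. The point is only that the majority continuation,
# not the correct one, is what a frequency-driven predictor will emit.
corpus = [
    "the big dig cost overrun was 478 percent",
    "the big dig cost overrun was 478 percent",
    "the big dig cost overrun was 220 percent",
]

# Count, for every word, which word most often follows it in the corpus.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def continue_greedily(prompt: str, steps: int = 8) -> str:
    """Always emit the statistically likeliest next word.

    Nothing in this loop consults any source of truth; frequency in the
    training text is the only criterion for what gets said next.
    """
    words = prompt.split()
    for _ in range(steps):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(continue_greedily("the big dig cost overrun was"))
# Prints the majority answer from the corpus, not the correct one.
```

The point of the toy is not that real models are bigram counters. It is that far more sophisticated predictors share the same shape: the objective rewards the likeliest continuation, not the true one, which is exactly the orthogonality to the truth-falsehood axis described above.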
This argument would be merely clever if Flyvbjerg were simply another technology critic scoring rhetorical points against Silicon Valley. He is not. He is a scholar whose entire career has been organized around a single, devastating empirical finding: that the people and institutions responsible for the world's largest, most consequential projects systematically lie to themselves and to the public about what those projects will cost, how long they will take, and what benefits they will deliver. His database of megaproject performance, built over decades, covers hundreds of projects across dozens of countries and every major infrastructure category. The finding is monotonous in its consistency. Costs overrun. Benefits fall short. Timelines stretch. And the pattern does not improve over time, because the pattern is not produced by ignorance or incompetence. It is produced by optimism bias — the genuinely held belief that your project is different, that the base rate does not apply to you, that the rules governing every comparable case will somehow be suspended for yours — and by strategic misrepresentation, the deliberate inflation of benefits and deflation of costs to secure approval for projects that would be rejected if their true parameters were known.
Optimism bias and strategic misrepresentation are the twin engines of project failure, and they are also the twin engines of the current AI discourse. The cognitive version — genuine overestimation of what current AI systems can do — fills the conversation of engineers and researchers who are close enough to the technology to be awed by its capabilities and too close to maintain perspective on its limitations. The political version — deliberate inflation of AI claims to secure investment, market share, and regulatory latitude — fills the earnings calls and keynote addresses and breathless press releases of companies whose valuations depend on the narrative that artificial general intelligence is imminent. Flyvbjerg's entire intellectual apparatus, built to explain why the Sydney Opera House came in 1,400 percent over budget and why California's high-speed rail project may never be completed, applies to the AI industry with an explanatory power that suggests the pattern is not specific to construction or transportation but endemic to any domain where large claims are made about future capability under conditions of radical uncertainty.
The Big Dig test was not an isolated experiment. Flyvbjerg recounted a personal encounter with AI that carried, beneath its humor, a diagnostic precision about the technology's current limitations. He had used an AI-powered paint-color matching application — the kind of consumer tool that AI companies hold up as evidence of the technology's everyday utility — and found that it failed at the basic task it was designed to perform. The recommended color bore no resemblance to the target. Artificial intelligence turned out to be a real waste of time and money in this case. The anecdote is small, almost trivial, but Flyvbjerg deploys it the way a diagnostician deploys a symptom: not because the symptom is the disease, but because the symptom reveals the underlying pathology. If the system cannot match a paint color — a task with an objectively verifiable correct answer — then the confident authority with which it pronounces on complex, ambiguous, high-stakes questions should provoke not admiration but alarm.
Geoffrey Hinton, the Nobel laureate whose work on neural networks laid the foundation for the current generation of AI systems, reached a compatible conclusion from the opposite direction. One of the greatest risks of AI, Hinton warned, is not that chatbots will become super-intelligent, but that they will generate text that is super-persuasive without being intelligent. The warning is precise. The danger is not capability. The danger is the gap between perceived capability and actual capability — the gap that Flyvbjerg has spent his career measuring in other domains and that, he argues, is wider in AI than in any technology he has previously studied.
Cambridge professor Alan Blackwell, whom Flyvbjerg cites approvingly, does not hesitate to call ChatGPT a bullshit generator, using Frankfurt's definition with the exactitude it deserves. Mercedes' chief technology officer, Markus Schäfer, articulated the industrial consequence: if you sit in a car and ChatGPT tells you something that is absolute nonsense, you might be exposed to product liability cases. The automotive industry, which has skin in the game in ways that the technology press does not, has proceeded with a caution that the broader culture has not matched.
The deeper argument in "AI as Artificial Ignorance" concerns the confusion between two fundamentally different concepts that the AI industry has a financial interest in conflating. Artificial general intelligence — a hypothetical system that would possess the kind of flexible, context-sensitive, truth-tracking reasoning that humans exercise — does not exist and may not exist for a very long time, if ever. Generative artificial intelligence — a system that produces plausible-sounding text, images, and code by predicting patterns in training data — exists now and is commercially available. The AI industry's trillion-dollar valuations depend on the public, the investment community, and the regulatory apparatus mistaking the second for the first, or at minimum believing that the second is a reliable stepping stone to the first. Flyvbjerg's paper documents a profound gap between the hype and the reality of AI and explains the gap in terms of this confusion.
This confusion is not innocent. It is the structural equivalent of the strategic misrepresentation that Flyvbjerg has documented in megaproject planning for decades. When a government agency presents an infrastructure project with an estimated cost of two billion dollars, knowing that comparable projects have averaged cost overruns of fifty percent, the agency is engaging in strategic misrepresentation — presenting a number it has reason to believe is wrong because the correct number would prevent the project from being approved. When an AI company presents its large language model as "intelligent," knowing that the model lacks any mechanism for distinguishing true statements from false ones, the company is engaging in the same structural behavior. The misrepresentation may not be deliberate in every case; some of it is genuine optimism bias, the sincere belief that this system, unlike every previous system that failed to achieve general intelligence, really is different. But the effect is the same: resources are committed, expectations are set, and decisions are made on the basis of claims that the available evidence does not support.
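Flyvbjerg's own corrective for this pattern, reference class forecasting, has a mechanically simple core, and a minimal sketch makes the contrast with strategic misrepresentation visible. The sketch is mine, with invented numbers; it reuses the two-billion-dollar figure above as the planner's estimate and is not a reproduction of the UK guidance or of Flyvbjerg's published procedure.

```python
import statistics

# Hypothetical reference class: cost overruns, as fractions of the original
# estimate, observed on comparable past projects. Illustrative numbers only.
reference_class_overruns = [0.10, 0.25, 0.40, 0.50, 0.55, 0.70, 0.90, 1.60]

def reference_class_forecast(estimate: float, overruns: list[float],
                             acceptable_risk: float = 0.8) -> float:
    """Correct an inside-view estimate with the outside view.

    Rather than trusting the planner's own number, take the distribution of
    overruns on comparable past projects and uplift the estimate so that it
    would have covered the chosen share of those cases.
    """
    cut_points = statistics.quantiles(overruns, n=100)
    uplift = cut_points[int(acceptable_risk * 100) - 1]
    return estimate * (1 + uplift)

planner_estimate = 2_000_000_000  # the two-billion-dollar figure above
corrected = reference_class_forecast(planner_estimate, reference_class_overruns)
print(f"${corrected:,.0f}")  # roughly double the inside-view estimate here
```

The design choice that matters is that the planner's number is not corrected by a better model of the project itself but by the empirical distribution of how comparable projects actually turned out, which is precisely the evidence that optimism bias and strategic misrepresentation exist to suppress.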
Flyvbjerg's conclusion is not that AI is useless. It is more precise and more damning. AI may become intelligent in the future; it is not intelligent now. The current systems are useful in domains where the user already possesses the expertise to evaluate the output — where, in other words, the system is least needed. The key point is that AI should be used in areas where we are highly knowledgeable. This conclusion aligns with the experimental finding of Nassim Nicholas Taleb, whose own tests of ChatGPT led to the same result: AI is only useful if the user already knows the subject well. The tool amplifies existing knowledge. It does not generate new knowledge. And when it produces output in domains where the user lacks expertise, it generates confident nonsense that the user is poorly equipped to detect.
Our biggest risk is, as usual, ourselves — the human tendency to trust a system that sounds authoritative, that produces well-formatted prose, that delivers its errors with the same unflappable confidence with which it delivers its truths. The real danger of current AI is not in its limitations, which are real but bounded. The real danger is that humans begin to trust an AI that is in fact ignorant and faulty, which could prove disastrous. The systems were designed to be persuasive. Persuasiveness without truthfulness is the most dangerous combination available, and it is the combination that the current generation of AI systems has optimized for.
The argument, published in an academic journal and circulated through working papers and social media, earned Flyvbjerg a position in the AI discourse that is unusual for a scholar whose primary domain is infrastructure planning. On LinkedIn, he urged readers to stop calling it Artificial Intelligence and start calling it Artificial Ignorance. The reframe was not merely rhetorical. It was diagnostic. The name we give a technology shapes the expectations we bring to it, the trust we invest in it, and the governance structures we build around it. Calling a system "intelligent" that cannot distinguish a 220 percent cost overrun from a 478 percent one is not optimism. It is the kind of categorical error that, in Flyvbjerg's empirical record, precedes every major project failure in the modern era.
The question that "AI as Artificial Ignorance" leaves open — deliberately, strategically, with the precision of a scholar who knows the difference between what the evidence supports and what it does not — is whether the condition is temporary or permanent. Current AI, based on large language models, entails artificial ignorance more than artificial intelligence. That needs to change for AI to become a trusted and effective tool in science, technology, policy, and management. AI needs criteria for what truth is and what gets to count as truth. The statement is both a diagnosis and a prescription. The diagnosis is that current systems are structurally incapable of truth-tracking. The prescription is that future systems must develop this capability if they are to deserve the name intelligence.
Whether they will is the question that this book, drawing on the full range of Flyvbjerg's intellectual framework, attempts to answer. The answer requires moving beyond the empirical critique of current AI systems — valuable as that critique is — to the deeper philosophical foundation on which Flyvbjerg's entire career rests: the Aristotelian distinction between the knowledge that machines can possess and the knowledge that, by its very nature, resists mechanization. That distinction begins with a Greek word that Flyvbjerg has spent thirty years rehabilitating from philosophical obscurity into the central concept of his life's work.
The word is phronesis. And the argument it unlocks about what AI can and cannot do is more radical, more precise, and more consequential than anything in the current discourse.
---
Aristotle's Nicomachean Ethics, Book VI, contains a taxonomy of intellectual virtues that has been debated, mistranslated, selectively ignored, and occasionally weaponized for twenty-three centuries. The taxonomy distinguishes three forms of human knowing, and the distinction is not academic in the pejorative sense. It is the most practically consequential classification in Western intellectual history, because it determines what a civilization values, what it measures, what it teaches its children, and what it considers worthy of the name knowledge at all.
The first intellectual virtue is episteme — scientific knowledge. This is the knowledge of things that cannot be otherwise, as Aristotle put it: universal truths, necessary regularities, the deep structures of reality that hold regardless of who observes them and under what circumstances. The law of gravity does not vary by jurisdiction. The chemical composition of water does not change when the chemist crosses a border. The speed of light in a vacuum is the same whether measured by a physicist in Geneva or a student in Trivandrum. Episteme is context-independent. It can be stated in propositions. It can be tested, replicated, transmitted without loss of content from one mind to another. It is the knowledge that the natural sciences were built to produce, and the prestige it carries in the modern intellectual economy cannot be overstated.
Since the Scientific Revolution, and with renewed force since the Enlightenment, episteme has functioned as the gold standard against which all other forms of knowing are measured and found wanting. The social sciences — sociology, economics, political science, management studies — were born in the image of the natural sciences, aspiring to discover universal laws of human behavior that would hold with the same reliability as the laws of physics. Auguste Comte modeled sociology explicitly on physics. Economics aspired to the predictive power of Newtonian mechanics. The aspiration persists in every randomized controlled trial, every regression analysis, every attempt to extract from the noise of particular human situations the signal of universal regularity.
The second intellectual virtue is techne — craft knowledge, the knowledge of how to make things. The carpenter who knows how to join wood possesses techne. The surgeon who knows how to suture a wound possesses techne. The software engineer who writes a function that compiles and executes correctly possesses techne. Techne is productive knowledge — knowledge oriented toward the creation of an artifact or outcome through the systematic application of learned skill. Unlike episteme, techne concerns the variable world, the domain of things that could be otherwise, where the practitioner must exercise skill to achieve a desired result. But like episteme, techne is rule-governed. It can be codified. It can be taught through instruction and practice. A master carpenter can explain the principles of joinery to an apprentice, and the apprentice can acquire the skill through repetition until it becomes second nature.
The modern technology industry is, in its self-understanding, a techne enterprise. It builds things. It values the people who build things most skillfully. It measures success in terms of artifacts produced — code shipped, features deployed, products launched, systems scaled. The hierarchy of value runs from those who can build to those who can merely describe what should be built. The builder commands the highest premium. The person with ideas but no implementation skill occupies a lower status — a "product manager," a "business person," a talker in a culture that valorizes doing.
The third intellectual virtue is phronesis, and it is here that Aristotle's classification becomes most consequential and most resistant to modern assimilation. Phronesis is practical wisdom — the knowledge of how to act well in particular situations where "well" involves values, power, context, and judgment that cannot be reduced to rules. The phronimos, the person of practical wisdom, does not simply know what is true (episteme) or what can be made (techne). The phronimos knows what should be done in this particular situation, with these particular people, under these particular constraints, given these particular values at stake.
Phronesis is context-dependent in a way that episteme and techne are not. A rule of physics applies everywhere. A technique of carpentry can be taught to anyone with hands. But the practical wisdom to know whether this bridge should be built at all — whether the cost is justified by the benefit, whether the community it serves will be strengthened or fractured by its construction, whether the environmental assessment has been honestly conducted or optimistically massaged — depends on the particular circumstances of the particular case. It cannot be derived from universal principles alone. It cannot be specified in an algorithm. It cannot be transmitted as a set of instructions from one mind to another without catastrophic loss of content, because the content is inseparable from the context in which it was developed and the biography of the person who developed it.
Bent Flyvbjerg has spent his career rehabilitating phronesis from its marginal position in modern social science to what he argues is its rightful place at the center. His 2001 book Making Social Science Matter mounted a sustained argument that the social sciences' century-long attempt to produce episteme about human affairs — to discover universal laws of social behavior — has failed, and that the failure is not contingent but structural. Human affairs are constitutively context-dependent. The phenomena that social scientists study — power, values, organizational behavior, institutional dynamics — do not exhibit the context-independent regularities that would make epistemic knowledge possible. A management technique that produces transformation in one organization produces dysfunction in another. A policy that reduces poverty in one country deepens it across the border. A pedagogical approach that liberates one classroom deadens the next. The context is not noise to be controlled for. The context is the phenomenon.
The implication is radical and has been resisted accordingly. If social phenomena are constitutively context-dependent, then the dominant methodology of modern social science — designed to produce context-independent, generalizable, law-like findings — is structurally incapable of producing the most important form of knowledge about its own subject matter. The social sciences have been measuring the wrong thing with the wrong instruments, not because the researchers are incompetent but because the intellectual tradition they inherited from the natural sciences is misconceived when applied to human affairs. The instruments produce findings that are universally true and practically useless — statistically significant regularities that hold across populations but tell you nothing about what to do in the particular situation you actually face.
This framework — developed long before the current AI moment, refined through decades of empirical research into why projects fail and institutions deceive themselves — turns out to be the most precise diagnostic instrument available for understanding what artificial intelligence can and cannot do. The convergence is not accidental. It is structural. AI systems, as currently designed, are episteme-and-techne machines of extraordinary power. They process universal knowledge — the statistical regularities extracted from vast training corpora — with a speed and accuracy that exceeds any human capacity. They produce technical artifacts — code, text, images, analyses — with a facility that makes the most skilled human practitioner seem glacial. The large language model is the apotheosis of the epistemic-technical aspiration: a system that knows everything that can be stated in propositions and makes everything that can be specified in instructions.
What the system cannot do is exercise phronesis. It cannot know what should be done in this particular situation, with these particular values, for these particular people, under these particular conditions of uncertainty. It cannot weigh competing goods when the competition is not between quantities but between incommensurable values — when the question is not "which option produces more?" but "which option serves justice?" or "which option respects the dignity of the people affected?" or "which option am I willing to be held responsible for if it goes wrong in ways I cannot predict?" These questions require the kind of situated, value-laden, contextually embedded judgment that phronesis names, and they are the questions that matter most in every domain where AI is being deployed.
The engineer whom Edo Segal describes in The Orange Pill — the one who discovered that the "remaining twenty percent" of his work, the judgment about what to build and whether to build it, was everything — discovered phronesis without naming it. For decades, this engineer's practical wisdom had been masked by his techne. The eighty percent of his work that consisted of implementation — writing code, debugging errors, managing dependencies — consumed so much bandwidth that his phronetic contribution was invisible even to himself. The system valued the code he shipped. It did not recognize, reward, or even identify the judgment that determined what code should be shipped.
When AI stripped away the techne layer, the phronesis was exposed. The discovery was simultaneously a liberation and a crisis. A liberation because the engineer could now see his actual contribution clearly for the first time. A crisis because his entire career — his training, his hiring, his evaluations, his professional identity — had been organized around the techne that the machine had just rendered abundant. He had been rewarded for the wrong thing for twenty years. The system that trained him, employed him, and evaluated him had never told him the difference between techne and phronesis, because the system itself did not recognize the difference.
This systemic blindness is not a local failure of the technology industry. It is the consequence of a civilization that has organized its intellectual economy around the epistemic-technical virtues and allowed the phronetic virtue to atrophy through neglect. The educational pipeline — from computer science curricula to coding bootcamps to technical interviews to performance reviews — is organized to produce and evaluate techne. Students learn to write code. Candidates are tested on algorithmic puzzles. Employees are evaluated on features shipped and bugs fixed. At no point in the pipeline is phronesis explicitly identified as a capacity, cultivated as a skill, or assessed as a performance metric.
The engineer who develops excellent practical judgment does so incidentally, as a byproduct of years of experience, not as the result of deliberate training. The AI transition has exposed this systemic failure with brutal clarity. When the tools automate the techne, the practitioners who thrive are the ones who happened to develop phronesis along the way. Their development was not planned. Their capacity was not trained. Their value was not recognized until the machine made it impossible to ignore.
The AI discourse, as presently conducted, reproduces the civilizational bias toward episteme and techne with striking fidelity. The dominant questions — How fast can AI produce code? How many tasks can it complete? How much productivity does it add? — are epistemic-technical questions. They measure the machine's performance in the domains where the machine excels. They do not ask the phronetic questions: Is the code worth writing? Are the tasks worth completing? Does the productivity serve the values it should serve? Is the judgment being exercised in the deployment of these tools adequate to the consequences they produce?
The omission is not neutral. The questions a society asks determine the answers it receives, and the answers it receives determine the institutions it builds. A society that asks only epistemic-technical questions about AI will build institutions optimized for epistemic-technical performance — institutions that produce more, faster, across wider domains, with greater efficiency. Whether those institutions produce better outcomes for the people they serve is a phronetic question, and it will go unasked as long as the intellectual framework within which AI is understood has no place for phronesis.
Aristotle argued that phronesis is the architectonic virtue — the virtue that governs the exercise of all other virtues, that determines when and how and for what purposes the other forms of knowledge should be deployed. Episteme tells you what is true. Techne tells you what can be made. Only phronesis tells you what should be done with the truth and the making. Without phronesis, the other virtues are blind — powerful but directionless, capable but ungoverned.
This Aristotelian claim, formulated in the fourth century BCE, turns out to be the most precise description available of the AI transition's central challenge. The machines have episteme and techne in abundance. What governs their deployment is the phronesis of the humans who direct them. And the capacity of those humans for phronesis is the variable that will determine whether the most powerful epistemic-technical tools in human history serve human flourishing or merely accelerate the patterns of overconfidence, misrepresentation, and institutional self-deception that Flyvbjerg has spent his career documenting.
The three forms of knowing are not equal. They never were. But the hierarchy has been inverted for centuries — episteme at the top, techne in the service of episteme, phronesis forgotten at the bottom — and the inversion has never been more dangerous than it is now. The machines have matched us in episteme and techne. What remains is the thing we neglected. The question is whether we can recover it before the consequences of the neglect become irreversible.
---
The most rigorous empirical study of AI's impact on work — conducted by Xingqi Maggie Ye and Aruna Ranganathan of UC Berkeley's Haas School of Business, published in the Harvard Business Review in February 2026 — embedded researchers in a two-hundred-person technology company for eight months. The methodology was careful. The observation was sustained. The findings were specific, replicable, and genuinely important: AI tools intensify work rather than reducing it; work seeps into pauses; multitasking becomes the norm; attention fractures; boundaries between roles dissolve.
These findings confirm a real phenomenon. They are also, from the perspective of Flyvbjerg's phronetic social science, a precise demonstration of how the dominant research paradigm captures what is least important about the AI transition while remaining structurally blind to what matters most.
The methodological problem is not sloppiness. The Berkeley researchers were meticulous. The problem is categorical. The instruments that modern social science has developed — surveys, behavioral observation, quantitative coding of work activities, self-report measures of satisfaction and burnout — are instruments designed to produce episteme. They extract from particular situations the universal regularities that hold across cases. They measure what can be measured in context-independent terms: hours worked, tasks completed, boundaries crossed, satisfaction reported on standardized scales.
The phronetic question — whether the work done in those hours was wiser, whether the tasks completed deserved to be completed, whether the boundary crossings represented genuine judgment or mere scope creep, whether the satisfaction reported reflected the deep engagement of flow or the shallow buzz of productive addiction — is invisible to these instruments. It is invisible not because the researchers failed to ask but because the instruments they inherited from a century of epistemic social science cannot ask. The question of quality, of judgment, of whether the output serves the values it should serve, is constitutively context-dependent. It requires the researcher to enter the world of the practitioner, to understand the specific situation with its specific stakes, to assess the quality of practical judgment exercised in conditions of genuine ambiguity. This kind of assessment is what phronetic social science was designed to produce, and it is precisely what the dominant paradigm has excluded from its methodological repertoire.
The consequence is a measurement crisis that extends far beyond academic methodology. Policymakers, organizational leaders, and educators who must make decisions about AI deployment rely on the available evidence. The available evidence measures behavior — hours, output, efficiency, reported experience. It does not measure the quality of judgment, the development of practical wisdom, or the preservation of the formative conditions for embodied expertise. The decisions made on the basis of this evidence will address the symptoms the evidence can detect — burnout, work intensification, attention fragmentation — while remaining blind to the underlying condition: whether AI-augmented work is producing practitioners who judge more wisely or merely execute more rapidly.
Flyvbjerg has been making this argument about social science methodology for decades, and the argument has never been more urgent. The dominant instruments are precisely wrong: precise in their measurement of the things that matter least, and entirely silent on the things that matter most. A survey cannot measure the quality of judgment. A controlled experiment cannot capture the context-dependence of practical wisdom. A statistical analysis cannot distinguish between output that is technically correct and output that is practically wise. The instruments measure what they were designed to measure — behavioral regularities that persist across contexts — and they are constitutively blind to the phronetic dimension of human work that the AI transition has made the scarce and decisive resource.
Consider what the Berkeley study found and what it could not find. The study found that workers who adopted AI tools worked more hours and took on more tasks. This is a genuine finding. But it cannot tell us whether the additional hours were spent on judgment-intensive work that developed the practitioners' capacity for practical wisdom or on low-stakes task-filling that consumed time without building capability. Both show up identically in the data as "more work." The study found that work seeped into pauses — lunch breaks, elevator rides, gaps between meetings. This too is a genuine finding. But it cannot tell us whether the practitioners who worked through their pauses were exercising phronesis in compressed moments of genuine engagement or were simply unable to resist the tool's availability, driven by the internalized imperative that Byung-Chul Han calls auto-exploitation. From the outside, the behavioral signature is identical. Only the phronetic assessment — which requires entering the practitioner's world and understanding the specific context and quality of the engagement — can distinguish between the two.
The study found that attention fractured as practitioners juggled multiple AI-assisted tasks simultaneously. This is perhaps the most important behavioral finding, because it points toward a phenomenon that phronetic analysis can illuminate in ways that behavioral observation cannot. Sustained attention is the precondition for phronesis. Practical wisdom develops through deep engagement with particular situations over extended periods — through the accumulation of experience that comes only from staying with a problem long enough for its deeper structures to become visible. When attention fractures into parallel streams, the depth of engagement that phronesis requires is structurally impossible. The practitioner who monitors three AI processes simultaneously while drafting a specification and reviewing a colleague's output may be technically productive, but the conditions for the development of practical judgment have been destroyed.
The Berkeley researchers, to their credit, sensed the gap in their own findings. They proposed what they called "AI Practice" — structured pauses, sequenced work, protected time for unaugmented human interaction. The proposal addresses the symptoms their instruments could detect. It does not address the underlying condition, because the condition — the erosion or cultivation of phronetic capacity — requires measurement instruments that do not yet exist in the mainstream social science toolkit.
Flyvbjerg's phronetic social science offers an alternative methodology, one he has practiced in his own research on megaprojects and planning failures. The methodology is case-based: it studies particular situations in sufficient depth to reveal the mechanisms through which outcomes are produced. It is longitudinal: it follows phenomena over time horizons adequate to the processes being studied. It is context-sensitive: it treats context not as a confounding variable to be controlled for but as the essential dimension of the phenomenon. And it is value-laden: it explicitly engages with questions of what constitutes good practice, rather than treating "more output" as a neutral proxy for improvement.
Applied to the AI transition, a phronetic methodology would look radically different from the current research paradigm. It would follow individual practitioners over years, not months, tracking changes in the quality of their judgment through scenario-based assessments that present them with ambiguous, value-laden situations and evaluate the wisdom of their responses. It would embed researchers in specific organizations for long enough to understand the contextual conditions — team dynamics, leadership quality, organizational culture, the specific configuration of AI tools and human practices — that determine whether AI augmentation cultivates or erodes practical wisdom. It would take practitioners' own retrospective accounts seriously as data about the phronetic dimension of their experience, rather than treating self-report as a secondary source inferior to behavioral observation.
This kind of research is expensive, time-consuming, and incompatible with the publication timelines that academic incentive structures reward. Longitudinal case studies do not produce the clean, generalizable findings that high-impact journals favor. They do not lend themselves to the statistical significance tests that funding agencies require. They resist the replication protocols that have become the methodological gold standard since the social sciences' replication crisis. They are, by the standards of the dominant paradigm, methodologically inferior.
Flyvbjerg's career-long argument is that this judgment of inferiority is itself the problem. The dominant paradigm has systematically marginalized the form of research best suited to the most important questions, because the form does not fit the template. The consequence is an evidence base that is extensive in its coverage of the measurable and entirely absent on the consequential. The AI transition is being studied with instruments designed to answer questions that do not matter, while the questions that do matter — questions about judgment, wisdom, the quality of human decision-making under conditions of radical technological change — go unasked because the instruments capable of asking them have been excluded from the methodological canon.
The political dimension of this exclusion deserves attention. The definition of "productivity" that dominates the AI discourse — output per unit of input — is not neutral. It is a definition that serves the interests of those who commission productivity measurements: managers, shareholders, organizational leaders who need numerical indicators of return on investment. A phronetic definition of productivity would ask a different question: Are the people doing this work making better decisions for the people affected by their work? This question cannot be answered by counting output. It can only be answered by the kind of sustained, contextual, judgment-rich inquiry that phronetic social science provides, and the exclusion of this inquiry from the dominant research paradigm ensures that the question is never asked in the forums where deployment decisions are made.
Flyvbjerg has argued throughout his career that the failure of rationalistic social science is not merely intellectual but political. The forms of knowledge that are excluded from the dominant paradigm are not random. They are the forms of knowledge that would make power visible — that would reveal who benefits from current arrangements, whose interests are served by the dominant definitions of value, and whose voices are excluded from the conversations that determine how resources are allocated and technologies are deployed. Phronesis is excluded not because it is methodologically inferior but because it is politically inconvenient. It asks questions that the dominant framework does not want asked.
In the AI transition, the political stakes of this exclusion are higher than in any previous domain Flyvbjerg has studied. When organizations define productivity as output per input and optimize accordingly, they create environments in which practitioners produce more without understanding more, execute faster without judging more wisely, and build more without asking whether what they are building deserves to exist. The measurement framework determines the optimization target, and the optimization target determines the outcome. A framework that measures only episteme and techne will optimize only for episteme and techne. Phronesis — the knowledge that governs whether the other forms of knowledge are used wisely — will be systematically neglected, not through malice but through the structural incapacity of the instruments to see it.
The methodological crisis is solvable. Flyvbjerg has spent his career developing the alternative. But solving it requires something that the social science establishment has resisted for generations: the recognition that context-dependent knowledge is not inferior to context-independent knowledge. It is different. And for the purposes of understanding what AI is doing to human capability, it is more valuable. The most important findings about the AI transition will come not from large-sample quantitative studies with impressive statistical power but from detailed, longitudinal case studies that follow particular practitioners in particular contexts over extended periods, tracking the evolution of their judgment with the patience and precision that the phenomenon demands.
The instruments exist. The methodology exists. What does not yet exist is the institutional will to deploy them at scale — to fund the research, to publish the findings, to build the evidence base that policymakers and organizational leaders need to make wise decisions about the most consequential technological transition in modern history. The gap between the urgency of the questions and the adequacy of the instruments being used to answer them is the methodological crisis of the age of AI. Flyvbjerg saw it coming, because the crisis is the same one he diagnosed decades ago. Only the stakes have changed.
---
In The Orange Pill, Edo Segal describes an engineer in Trivandrum, India, whose daily work included approximately four hours of what she called "plumbing" — dependency management, configuration files, the mechanical connective tissue between the software components she was actually designing. The work was tedious. It consumed cognitive resources she would have preferred to allocate to more challenging problems. By any standard metric, it was low-value labor — precisely the kind of routine, specifiable task that AI automation targets.
Claude took over the plumbing. The engineer's productivity improved. Her output quality was maintained or enhanced. She was freed to work on more interesting problems. Every available measurement indicated that the AI intervention was an unqualified success.
Months later, she realized that something was wrong. She was making architectural decisions — the high-level judgments about system design that constitute the most consequential part of a senior engineer's work — with less confidence than she used to possess. She could not explain why. The decline was not dramatic. It was not a visible failure. It was a slow, incremental erosion of something she had not known she possessed until it began to disappear.
The explanation, which she pieced together retrospectively, concerned ten minutes. Embedded within those four hours of daily tedium were approximately ten minutes — scattered unpredictably, impossible to schedule or isolate — of a qualitatively different kind of experience. A dependency conflict that forced her to trace a connection between subsystems she had not previously understood to be connected. A configuration error that required her to understand why a default setting existed. A build failure that revealed a structural assumption in the system's architecture that she had never examined.
These moments were rare. Perhaps a four percent incidence rate across the four hours. But their developmental significance was vastly disproportionate to their duration. They were the moments that built her architectural intuition — the embodied, context-dependent, non-propositional knowledge that allowed her to feel that something was wrong with a system before she could articulate what was wrong. This knowledge is not episteme. It cannot be stated in propositions. It is not techne. It is not a procedure that can be transmitted through instruction. It is phronesis — practical wisdom developed through sustained engagement with a domain, accumulated across thousands of small encounters with the unexpected, deposited in layers so thin that no single layer is perceptible but the accumulated deposit forms the foundation on which expert judgment stands.
When Claude automated the plumbing, it removed the tedium and the ten minutes together, because it could not distinguish between them. From the system's perspective, the entire four hours was plumbing — a category of work defined by its functional characteristics, not by the developmental significance of its rare, unpredictable by-products. The AI saw tasks. Phronetic analysis sees the formative process hidden inside the tasks.
This case, which Flyvbjerg's framework illuminates with a specificity that no other available conceptual apparatus can match, is not an anecdote to be quoted and then filed alongside other data points. It is the central phenomenon of the AI transition — the mechanism through which automation simultaneously improves performance on every measurable dimension and erodes the capacity for the judgment that determines whether performance is meaningful.
Three specific obstacles prevent the dominant research paradigm from detecting this mechanism, and each obstacle is structural rather than incidental.
The first obstacle is temporal. The erosion of phronetic capacity operates over years, not months. The engineer did not notice the loss for months after it occurred, and even then the realization was retrospective — a gradual awareness that decisions she used to make with confidence now felt uncertain, without any clear moment at which the change occurred. A study that follows practitioners for eight months, which exceeds the duration of most AI impact research, will not detect this erosion, because eight months is insufficient for the slow attrition of embodied judgment to become visible. The loss manifests only when the practitioner encounters a situation that demands the intuition she no longer possesses — a novel system design problem, an architectural decision without precedent, a failure mode that resembles no documented pattern — and discovers that the foundation she used to stand on has been quietly eroding since the AI took over the work that built it.
Phronetic research demands time horizons adequate to the processes being studied. For the development of practical wisdom, this means years. The ten-year longitudinal study of AI-augmented practitioners — following specific engineers, lawyers, physicians, and designers through the evolution of their professional judgment — does not yet exist. Its absence is the most significant gap in the current evidence base.
The second obstacle is invisibility. The loss is invisible to the person experiencing it. The engineer reported improvement because improvement was what she experienced: less tedium, more interesting work, higher output, expanded scope. The loss was not experienced as a loss. It was experienced as liberation. The awareness that something had been subtracted arrived only when the subtracted capacity was needed and found missing, which is to say, after the loss had already compounded.
Self-report measures, the primary instrument for assessing subjective experience in AI impact research, are structurally incapable of detecting a loss that the subject does not know has occurred. The subject reports what she experiences, and what she experiences is improvement. The loss operates beneath the threshold of awareness, in the embodied, pre-reflective dimension of professional knowledge that only becomes conscious when it fails. A study relying on self-report will document satisfaction, capability expansion, and productivity gains — all genuine — while the phronetic erosion proceeds undetected.
The third obstacle is qualitative. The loss is not a change in quantity. It is not fewer hours, fewer tasks, lower output. It is a change in quality — a change in the depth and reliability of the practitioner's judgment, in the capacity to navigate ambiguity, in the ability to sense wrongness before articulating it. Quantitative instruments cannot detect qualitative changes of this kind, because quantitative instruments measure what can be measured across contexts: output rates, task counts, time allocation. The quality of judgment is contextual, variable, and resistant to standardization. A judgment that is excellent in one situation may be mediocre in another, and the assessment of its quality requires the kind of situated, context-sensitive evaluation that phronetic social science provides and that the dominant paradigm has systematically excluded from its methodological repertoire.
The pattern that the ten minutes reveal extends beyond software engineering. It is a structural feature of any domain in which expertise develops through the accumulation of embodied experience — where the formative moments are embedded in routine work, scattered unpredictably, and inseparable from the tedium in which they occur.
In medical training, the resident who spends hours in the emergency room seeing routine presentations — sore throats, sprained ankles, mild infections — occasionally encounters a case that is not routine: the sore throat that is actually a peritonsillar abscess, the sprained ankle that conceals a stress fracture, the mild infection that masks early sepsis. These rare cases are the moments that build diagnostic intuition — the capacity to sense that a patient is sicker than the presenting symptoms suggest, the embodied judgment that separates the experienced clinician from the technically competent one. If AI triage systems route routine presentations to automated assessment, the training pipeline loses access to the rare, unpredictable moments that the routine contains.
In legal training, the junior associate who reviews documents for relevance — a task of profound tedium that constitutes a significant fraction of early legal practice — occasionally encounters a document that changes the case: a contractual provision that contradicts the client's narrative, a memo that reveals a pattern of behavior, an email that connects two apparently unrelated events. These moments build the litigator's eye — the capacity to notice what is significant in a sea of the insignificant, the judgment that no document review algorithm possesses because the significance is contextual, dependent on the specific facts of the specific case and the specific experience of the specific lawyer.
In each case, the formative experience is embedded in routine work, cannot be predicted or isolated, and is destroyed by the same automation that removes the tedium. The loss is invisible to any measurement framework that does not attend to the phronetic dimension of practice, because the loss concerns the quality of judgment rather than the quantity of output.
The compounding nature of this loss requires emphasis. The engineer whose architectural intuition has eroded does not merely make worse decisions for herself. She makes worse decisions that become the substrate for future decisions by future practitioners. The systems she designs with diminished judgment will be inherited by colleagues who will maintain and extend them without access to the architectural understanding that a more phronetically developed designer would have embedded in the structure. The loss cascades — through the organization, through time, through the accumulated infrastructure that each generation of practitioners bequeaths to the next.
When an entire generation of practitioners enters the profession in an AI-augmented environment — never experiencing the formative struggle of the ten minutes, never building the embodied judgment that the ten minutes produced — the profession as a whole loses a capacity that cannot be recovered through training programs or educational interventions after the fact. The knowledge that the ten minutes built is not the kind of knowledge that can be taught in a workshop or encoded in a manual. It is phronetic knowledge — the kind that develops only through sustained, friction-rich engagement with the material of the domain, over years, under conditions that allow the unexpected to intrude on the routine.
The dominant research paradigm will not detect this generational loss, because the paradigm's instruments measure output, not the developmental process that produces the judgment that makes output meaningful. The metrics will show improvement: more code, faster development, wider scope, higher productivity. The phronetic erosion will proceed beneath the threshold of measurement, visible only to the kind of sustained, longitudinal, case-based inquiry that Flyvbjerg's methodology was designed to provide and that the institutional structures of modern social science have systematically failed to support.
The ten minutes are not an anecdote. They are the phenomenon — the central mechanism through which AI deployment simultaneously improves every measurable dimension of work while eroding the unmeasurable dimension that determines whether the work is wise. They are the empirical case that the phronesis framework was built to analyze, the specific instance of practical wisdom's development and destruction that twenty-three centuries of Aristotelian thought illuminate with a precision that no other available conceptual apparatus can match.
Detecting what is happening to the ten minutes across professions, across organizations, across the generational divide between practitioners who developed judgment through friction and those who never will — this is the research program that the age of AI demands. The program requires methods that the dominant paradigm has marginalized. It requires time horizons that academic incentive structures discourage. It requires a form of intellectual seriousness that treats the quality of human judgment as a phenomenon worthy of the same empirical rigor that is currently lavished on the quantity of human output.
The ten minutes are disappearing. No one is measuring them. And no one will, until the frameworks within which AI's impact is studied are rebuilt around the recognition that the knowledge that matters most is the knowledge that the current instruments cannot see.
Bent Flyvbjerg's empirical database of megaproject performance contains a finding so consistent across decades, countries, political systems, and infrastructure categories that it has acquired the monotonous reliability of a physical constant. Large-scale projects overrun their budgets. They miss their deadlines. They underdeliver on their promised benefits. The pattern does not improve over time. It does not vary by geography. It does not respond to advances in project management methodology, to the professionalization of planning, to the accumulation of historical data about comparable failures. The Sydney Opera House came in 1,400 percent over budget. The Scottish Parliament building exceeded its estimate by a factor of ten. The average cost overrun across Flyvbjerg's database of transportation infrastructure projects hovers around 28 percent for roads and a staggering 45 percent for rail — and these are averages drawn from distributions with long right tails, in which individual projects exceed their estimates by multiples, not percentages.
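For readers translating these figures, the arithmetic is worth making explicit. Under the conventional definition, a cost overrun is the gap between the final cost and the approved estimate, expressed as a share of the estimate:

```latex
\text{overrun} = \frac{C_{\text{final}} - C_{\text{estimate}}}{C_{\text{estimate}}}
\qquad\Longrightarrow\qquad
C_{\text{final}} = (1 + \text{overrun}) \times C_{\text{estimate}}
```

A 1,400 percent overrun therefore means a final bill roughly fifteen times the approved budget; a 45 percent average means the typical rail project costs nearly one and a half times what its planners promised.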
The explanation is not incompetence. Flyvbjerg has argued this point with the patience of a diagnostician who has been misdiagnosed by his own profession for decades. The planners are not stupid. The engineers are not negligent. The politicians are not uniquely corrupt. The pattern persists because it is produced by two reinforcing mechanisms that operate beneath the threshold of institutional self-awareness.
The first mechanism is optimism bias — the genuinely held belief that your project is different, that the statistical regularities governing every comparable case will somehow be suspended for yours, that the base rate does not apply because your team is better, your technology is more advanced, your circumstances are unique. Optimism bias is cognitive. It is not strategic. The planner who produces an optimistic forecast may sincerely believe it. The belief is wrong, as the accumulated evidence of hundreds of comparable projects demonstrates, but it is sincere. The planner is not lying. The planner is deluded — deluded by the same cognitive architecture that makes humans systematically overconfident in domains characterized by complexity, uncertainty, and long feedback loops.
The second mechanism is strategic misrepresentation — the deliberate inflation of benefits and deflation of costs to secure approval for projects that would be rejected if their true parameters were known. Strategic misrepresentation is political, not cognitive. The planner who strategically misrepresents does not believe the optimistic forecast. The planner produces the optimistic forecast because the institutional incentive structure rewards optimism and punishes realism. The project that presents honest numbers does not get funded. The project that presents optimistic numbers does. The planner, operating rationally within the incentive structure, produces the numbers the structure demands. Flyvbjerg's formulation is precise: cognitive bias is half the story; political bias the other half. And the result of optimism bias and strategic misrepresentation is the same — cost overruns and benefit shortfalls — because both mechanisms produce forecasts that are systematically too optimistic, whether the optimism is sincere or calculated.
This framework, developed through decades of empirical work on bridges, tunnels, rail systems, and concert halls, maps onto the AI transition with an explanatory power that suggests the underlying mechanism is not specific to infrastructure but endemic to any domain where large claims are made about future capability under conditions of radical uncertainty.
The cognitive version — genuine overestimation of what current AI systems can accomplish — saturates the discourse. Engineers and researchers who work closely with large language models are awed by the systems' capabilities and systematically underweight their limitations. The Big Dig test that Flyvbjerg conducted — in which both ChatGPT and Perplexity returned confidently wrong answers to a factual question with a well-documented correct answer — is not an aberration. It is the base rate. The systems produce errors at a frequency and with a confidence that would be disqualifying in any domain where accuracy matters, and the engineers who build and deploy the systems are cognitively biased toward overestimating the systems' reliability because they are too close to the technology to maintain perspective on its failure modes.
The political version — deliberate inflation of AI capabilities to secure investment, market share, and regulatory latitude — fills the earnings calls and keynote presentations and press releases of companies whose trillion-dollar valuations depend on the narrative that artificial general intelligence is imminent or that current systems represent a reliable stepping stone toward it. When an AI company presents its large language model as "intelligent," knowing that the model lacks any mechanism for distinguishing true statements from false ones, the company is engaging in strategic misrepresentation — presenting a claim it has reason to believe is misleading because the accurate claim would not support the valuation. The misrepresentation may not be conscious in every case. Some of it is genuine optimism bias — the sincere belief that this system, unlike every previous system that failed to achieve general intelligence, really is different. But the structural effect is identical to what Flyvbjerg documents in megaproject planning: resources are committed, expectations are set, and decisions are made on the basis of claims that the available evidence does not support.
The uniqueness bias — the conviction that this case is different from all comparable cases and therefore exempt from the base rate — is perhaps the most destructive cognitive distortion in Flyvbjerg's taxonomy, and it operates in the AI discourse with extraordinary force. Every previous technology that was predicted to achieve general intelligence failed to do so. Expert systems in the 1980s. Neural networks in the 1990s. Deep learning in the 2010s. Each was accompanied by predictions of imminent human-level capability, and each prediction proved wrong by margins that reference class forecasting would have anticipated — the simple discipline of comparing the current prediction to the outcomes of structurally comparable previous predictions. But uniqueness bias prevents the comparison. The current system is different. The architecture is novel. The scale is unprecedented. The training data is vast. The benchmarks are impressive. The arguments are the same arguments that were made for every previous system, and they are deployed with the same confidence, and the confidence is grounded in the same cognitive distortion: the belief that the base rate does not apply to you.
Reference class forecasting — the methodology Flyvbjerg developed to correct the planning fallacy — works by forcing the planner to identify a reference class of comparable completed projects and to use the statistical distribution of outcomes in that class to calibrate current forecasts. The method is simple, empirically validated, and extraordinarily effective. It produced the most accurate cost estimates in the UK government's infrastructure portfolio when it was adopted as official policy. It works because it replaces the inside view — the planner's detailed, optimistic narrative about why this specific project will succeed — with the outside view: the statistical reality of what actually happened when comparable projects were attempted.
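To make the mechanics concrete, a minimal sketch of the outside-view calculation follows, assuming the simplest possible implementation: take the empirical distribution of overruns in the reference class, choose the level of residual risk the decision-maker will accept, and apply the corresponding uplift to the inside-view estimate. The function name and the reference-class data are invented for illustration; they are not Flyvbjerg's published datasets or the uplift tables used in UK planning guidance.

```python
import numpy as np

def reference_class_uplift(past_overruns, acceptable_overrun_risk=0.2):
    """Uplift factor implied by a reference class of comparable completed projects.

    past_overruns: historical overruns as fractions (0.45 means the project
        cost 45 percent more than estimated).
    acceptable_overrun_risk: probability of still overrunning that the
        decision-maker is willing to accept (0.2 means wrong no more than
        one time in five).
    """
    # Budget at the (1 - risk) quantile of the historical overrun distribution:
    # roughly that share of comparable projects came in at or under this level.
    quantile = np.quantile(past_overruns, 1.0 - acceptable_overrun_risk)
    return 1.0 + quantile

# Fabricated, purely illustrative reference class of rail-project overruns.
rail_overruns = np.array([0.10, 0.20, 0.35, 0.45, 0.60, 0.80, 1.10, 1.60])

inside_view = 400e6  # the planner's own estimate (the "inside view"), in dollars
uplift = reference_class_uplift(rail_overruns, acceptable_overrun_risk=0.2)

print(f"uplift factor:       {uplift:.2f}x")
print(f"outside-view budget: ${inside_view * uplift / 1e6:.0f}M")
```

The essential move is visible in the sketch: the planner's narrative about this particular project never enters the calculation. The adjusted budget is whatever the history of comparable projects implies at the chosen risk level, which is precisely the substitution of the outside view for the inside view that the method demands.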
Applied to the AI transition, reference class forecasting would require proponents of current AI systems to identify the reference class of previous technologies that were predicted to achieve general intelligence and to calibrate their forecasts against the actual outcomes. The exercise would be sobering. The reference class is large, the outcomes are uniformly disappointing relative to predictions, and the current claims bear a structural resemblance to previous claims that honest comparison would reveal. But the exercise is not performed, because uniqueness bias prevents proponents from acknowledging that a reference class exists. This system, they insist, is categorically different from everything that came before. The insistence is unfalsifiable, which is precisely what makes it dangerous.
The acceleration that AI introduces to the planning fallacy's operational dynamics warrants separate analysis. In traditional megaproject management, the timeline of implementation serves as an involuntary feedback loop. The months or years between the plan and the completed project create a space in which reality intrudes on optimism. Costs that were underestimated become visible when invoices arrive. Benefits that were overestimated become apparent when usage falls short. Timelines that were compressed in the proposal expand under the pressure of the actual work. The feedback is painful, but it is corrective. It forces the planner — eventually, reluctantly, sometimes only after the project is irrecoverably committed — to confront the gap between the forecast and the world.
AI compresses this feedback loop to near-zero for certain categories of work. When a working prototype can be produced in hours rather than months, the period during which reality tests the plan's assumptions shrinks to almost nothing. The prototype works. The demo is impressive. The plan appears to have been vindicated. But the phronetic assumptions embedded in the plan — assumptions about user needs, about market conditions, about organizational capacity, about the distribution of costs and benefits across stakeholders — remain untested. The prototype validates the techne. It does not validate the phronesis. And the speed of the validation creates cognitive momentum — a feeling that if the first phase went this fast, the subsequent phases will be comparably efficient. They will not be, because the subsequent phases are phronetic, and phronesis does not compress the way techne does.
Segal describes building Napster Station in thirty days and recognizing, on the show floor, that the thirty days of building had been the easy part. The recognition is phronetic — the practical wisdom to understand that the prototype is not the product, that the gap between a working demo and a deployed system serving real users in real contexts is filled with judgment-intensive, context-dependent, value-laden work that no AI can accelerate because the work is not technical but deliberative. The recognition is also rare. The more common response to a fast prototype is to mistake the prototype for the product — to conclude that because the technical implementation was fast, the entire project is fast, and to commit resources on the basis of a timeline extrapolated from the techne phase to the phronesis phase without recognizing that the two phases operate under fundamentally different temporal logics.
This extrapolation is the planning fallacy at machine speed. The bias is the same bias Flyvbjerg has documented for decades — the systematic tendency to overestimate benefits, underestimate costs, and ignore the lessons of comparable cases. The mechanism is the same mechanism — a combination of genuine cognitive optimism and institutional incentives that reward optimistic forecasts. But the speed is different. The compressed timeline between plan and prototype eliminates the feedback that traditional timelines, however painfully, provide. The result is decisions made faster, with greater confidence, on the basis of evidence that validates the least important dimension of the plan (the technical feasibility) while leaving the most important dimension (the phronetic adequacy) entirely untested.
The governance implications are direct. In traditional megaproject management, the timeline allows — in principle, if not always in practice — for the development of oversight structures: review boards, stakeholder consultation, regulatory assessment. These structures are imperfect. Flyvbjerg's empirical record documents their frequent failure. But they represent the possibility of institutional correction — a second pair of eyes that might detect what the planner's optimism bias has concealed. When AI compresses execution from months to days, these governance structures cannot form in time. The product is deployed before the review has convened, before stakeholders have been consulted, before the consequences have been assessed by anyone other than the people who built it and who therefore have the strongest optimism bias regarding its merit.
The specific recommendation that flows from Flyvbjerg's framework is not that AI-enabled projects should be slowed down — a prescription that would be both impractical and undesirable. The recommendation is that the phronetic assessment of AI-enabled projects must be deliberately decoupled from the technical timeline. The technical execution can proceed at machine speed. The judgment about whether the thing being executed deserves to exist, serves the right users, and distributes its costs and benefits justly must proceed at human speed — the speed of deliberation, of consultation, of the slow, friction-rich process through which practical wisdom is exercised. The two processes must run in parallel, with the phronetic assessment retaining the authority to redirect or halt the technical execution when the judgment warrants it.
This is what Segal calls building dams. Flyvbjerg's framework specifies where the dams should be placed: at the juncture between technical capability and phronetic assessment, between the question "Can this be built?" and the question "Should this be built, and for whom, and at what cost to whom?" The first question can be answered at machine speed. The second requires human judgment, exercised under conditions that allow for the kind of sustained, contextual, value-laden deliberation that the speed of the tools actively threatens.
The planning fallacy has been the most expensive cognitive distortion in the history of large-scale human enterprise. Flyvbjerg's database documents trillions of dollars in overruns, decades of delays, and a systematic pattern of institutional self-deception that persists despite — perhaps because of — the accumulating evidence of its costs. AI does not create a new planning fallacy. It accelerates the old one to a velocity at which the corrective mechanisms that might have contained it no longer have time to operate. The bias is ancient. The speed is new. The combination is the most dangerous governance challenge that the AI transition presents, and the frameworks for addressing it already exist in the work of the scholar who spent three decades measuring exactly this pattern in other domains.
The question is whether the institutions deploying AI will adopt the corrective discipline — reference class forecasting, outside-view assessment, the deliberate decoupling of technical speed from phronetic judgment — before the consequences of failing to do so become visible. If Flyvbjerg's empirical record is any guide, the answer is discouraging. The corrective discipline has been available for decades. The evidence supporting it is overwhelming. And the institutions that need it most — the ones making the largest bets, under the greatest uncertainty, with the most at stake — are the ones least likely to adopt it, because the same optimism bias and strategic misrepresentation that produce the planning fallacy also produce resistance to the methods designed to correct it.
The dam must be built against the current, not with it.
---
Aristotle's phronimos — the person of practical wisdom — is not simply an expert. This distinction, which Flyvbjerg has articulated across his career and which acquires new urgency in the age of AI, separates two fundamentally different relationships to knowledge and makes visible why one can be automated and the other cannot.
The expert possesses techne. The expert knows how to perform a specific operation skillfully — how to write efficient code, how to close a sale, how to diagnose a disease from a set of symptoms. Expertise is deep, domain-specific, rule-governed in ways that the expert may not be able to articulate but that are in principle articulable. The expert's knowledge can be tested against objective criteria: the code compiles or it does not, the sale closes or it does not, the diagnosis is confirmed or it is not. And because expertise is rule-governed in principle, even when the rules are complex and implicit, expertise is susceptible to automation. A system that can identify the patterns governing expert performance can replicate those patterns without possessing the developmental process that produced them in the human practitioner.
The phronimos possesses something categorically different. The phronimos does not merely know how to perform operations. The phronimos knows when to perform them, for whom, to what end, and whether they should be performed at all. The phronimos operates in the domain where rules do not determine the answer — where values conflict, where the right course of action depends on a reading of the particular situation that no general principle can supply, where the consequences of the decision are uncertain and the stakes are irreversible and the person making the decision must be willing to bear responsibility for outcomes that cannot be predicted.
The substitution of AI for human expertise is conceptually straightforward, even when it is technically difficult. The AI system that writes code, drafts legal briefs, or generates medical diagnoses is performing operations that were previously performed by human experts. The output may be functionally identical to what the expert produced. The process by which it was produced is fundamentally different — pattern-matching across training data rather than embodied knowledge built through years of practice — but the output is, in many cases, indistinguishable from the outside.
This functional indistinguishability produces what might be called the substitution fallacy: the assumption that because a machine can produce the same output as a human expert, the machine has replicated the expert's knowledge. Flyvbjerg's framework reveals why this assumption is wrong and why the error matters. The machine has replicated the product of techne. It has not replicated the developmental process through which techne is acquired — the years of practice, the accumulation of failures, the slow deposition of embodied understanding that comes from sustained engagement with the resistance of the material. And it is this developmental process, not the output it produces, that serves as the substrate on which phronesis is built.
The expert who has spent years writing code does not merely learn to write code. She learns, through the friction of the practice, a set of capacities that transcend the practice itself: the ability to sense that a system is fragile before she can identify the fragility, the judgment to recognize that a technically elegant solution is practically unwise, the capacity to navigate trade-offs between competing goods — performance versus readability, speed versus maintainability, elegance versus robustness — that cannot be resolved by rules because they depend on the specific context of the specific project. These capacities constitute phronesis, and they are built on the substrate of techne in the way that a building is built on its foundation. Remove the foundation and the building cannot be constructed. Replace the foundation with a different material and you get a different building — or no building at all.
When AI automates the techne layer, it removes the substrate on which phronesis develops. The practitioner who never experiences the friction of implementation — who never writes the code that fails, never traces the dependency that reveals a hidden connection, never encounters the configuration error that forces a confrontation with the system's implicit assumptions — never builds the embodied judgment that only the friction produces. The output is faster. The practitioner is shallower. And the shallowness is invisible to every measurement instrument that evaluates performance by output rather than by the developmental trajectory of the person producing it.
Segal's description of the engineer who oscillated between excitement and terror during the Trivandrum training captures the moment of recognition with precision. The excitement was the discovery that the phronetic twenty percent of his work — the judgment, the architectural instinct, the taste — was everything. The terror was the realization that his entire professional identity had been constructed around the techne that the machine had just rendered abundant. The discovery and the crisis are inseparable, because they are produced by the same event: the exposure of phronesis as the scarce resource, in a system that had never recognized it as a resource at all.
Flyvbjerg's framework explains why the recognition comes as a crisis rather than a celebration. The institutions that train, employ, and evaluate knowledge workers are organized around techne. Computer science curricula teach technical skills. Hiring processes test technical competence. Performance reviews measure technical output. Promotion criteria reward technical accomplishment. At no point in this institutional pipeline is phronesis explicitly identified as a capacity to be cultivated, a contribution to be recognized, or a criterion for advancement. The engineer who develops excellent practical judgment does so as a byproduct of experience, not as the result of institutional design. The institution neither produces phronesis deliberately nor recognizes it when it appears. It rewards the techne and ignores the wisdom.
The AI transition has made this institutional blindness untenable. When the machine can produce the techne, the institution that continues to organize itself around techne production is organizing itself around the thing that is no longer scarce. The scarce resource is phronesis — the judgment to direct the machine wisely — and the institution that does not reorient itself around the cultivation and recognition of phronesis will find itself in the position of a factory that continues to manufacture a product the market no longer values.
The practical implications are extensive and uncomfortable. They require not marginal adjustments to existing institutional structures but fundamental reconceptions of what these structures are for.
Educational institutions must redesign curricula to cultivate phronesis deliberately. This does not mean abandoning technical training — techne remains the substrate on which phronesis is built, and the practitioner who lacks technical knowledge lacks the foundation for practical judgment. But it means supplementing technical training with experiences that develop the capacity for judgment under conditions of genuine ambiguity: case-based learning that exposes students to situations where the rules do not determine the answer, engagement with value conflicts where the stakes are real rather than hypothetical, mentorship relationships that transmit practical wisdom through the joint navigation of complex situations rather than through the delivery of abstract principles.
Organizational evaluation systems must be rebuilt around the assessment of judgment quality rather than output quantity. This is technically difficult — judgment quality is contextual, variable, and resistant to standardization — but the difficulty is not a reason for inaction. It is a reason for the kind of methodological innovation that Flyvbjerg's phronetic social science provides: scenario-based assessment, longitudinal tracking of decision quality, narrative evaluation by experienced practitioners whose own phronetic capacity qualifies them to assess judgment in others. These methods are more expensive and more time-consuming than counting lines of code. They are also the only methods capable of measuring the thing that matters.
Career development pathways must be restructured to recognize the phronetic trajectory alongside the technical one. The current system promotes practitioners along a technical ladder — junior developer, senior developer, staff engineer, principal engineer — with advancement criteria defined primarily by technical scope and output. A phronetic career pathway would promote practitioners along a judgment ladder — from executing decisions others have made, to making decisions about specific technical problems, to making decisions about which problems are worth solving, to making decisions about organizational direction under conditions of radical uncertainty. The progression is already implicit in the experience of senior practitioners. Making it explicit would allow organizations to cultivate it deliberately rather than hoping it develops by accident.
Mentorship — the oldest mechanism for the transmission of practical wisdom — must be protected from the efficiency pressures that AI introduces. When a junior practitioner can get an answer from Claude in seconds, the incentive to consult a senior colleague diminishes. The consultation takes longer. The answer is less polished. The senior colleague may not know the answer and may say so, which feels less useful than the machine's confident response. But the senior colleague's answer — uncertain, contextual, qualified, grounded in decades of experience navigating ambiguity — is a demonstration of phronesis in action. It is a window into the reasoning process of a practitioner who has developed, through years of friction-rich engagement with the domain, the judgment to navigate situations where the rules do not determine the answer. The machine's answer is techne. The mentor's answer is phronesis. And the junior practitioner who learns only from the machine will develop techne without phronesis — will become a faster, more efficient executor who lacks the judgment to know whether what she is executing deserves to exist.
The phronimos in the age of AI is not the person who uses the tools most skillfully. Technical fluency with AI tools is techne — learnable, important, but not the differentiating capacity. The phronimos is the person who knows when to use the tools and when to set them aside, who can assess whether the output the tools produce serves the values it should serve, who can navigate the competing demands of speed and depth, efficiency and understanding, productivity and wisdom. This person is not born. She is made — made through years of engagement with situations that demand judgment, through the accumulation of experience that teaches what no manual can convey, through the specific friction of having been wrong, having failed, having learned from the failure something that success could never teach.
The institutions that make phronimoi — that cultivate practical wisdom as their primary output — will be the institutions that define the post-AI economy. The institutions that continue to produce techne operators will find their output replicated by machines at a fraction of the cost, and their graduates competing for a diminishing share of work that no longer requires human hands. The transition between these two institutional futures is the work of the present moment, and it is work that requires phronesis of its own: the practical wisdom to recognize what is changing, to understand what the change demands, and to act on that understanding before the window for action closes.
The window is not indefinite. The generational dimension of phronetic loss — the fact that practitioners who never experience formative struggle never develop the judgment that only struggle produces — means that the current generation of practitioners, the ones who built their phronesis through years of pre-AI friction, constitutes a finite resource. Their knowledge can be transmitted to the next generation through mentorship, through institutional design, through the deliberate cultivation of the conditions under which phronesis develops. But the transmission requires action now, while the carriers of phronetic knowledge are still practicing. If the transmission is delayed until the current generation retires, the knowledge dies with them, and the institutions that depended on it will discover too late what they have lost.
---
Flyvbjerg's Aalborg case — the study of urban planning in a Danish city that became the empirical foundation for his argument about the primacy of context-dependent knowledge — revealed a pattern that no amount of technical analysis could have predicted and no universal planning principle could have accommodated. The Aalborg bus terminal project involved well-specified technical requirements, ascertainable stakeholder interests, and sufficient engineering knowledge to produce an optimal solution. The project should have been straightforward. It was anything but, because the outcome was determined not by the technical parameters but by the power dynamics between the municipal government and the business community, the historical patterns of land use in the city center, the cultural attitudes toward public transportation, the personal relationships between decision-makers — contextual factors that no model could have captured because they were specific to this city, this moment, these particular people with these particular histories of cooperation and conflict.
The lesson of Aalborg is the lesson of every context-dependent phenomenon: the technical knowledge that is necessary for a good outcome is not sufficient for one. Sufficiency requires phronetic knowledge — knowledge of the particular context, the particular stakeholders, the particular power dynamics, the particular values at stake. This knowledge cannot be produced by instruments designed to extract universal regularities from particular cases, because the knowledge is constitutively particular. It resists generalization not because the researcher has failed to abstract properly but because the phenomenon itself is irreducibly contextual.
The AI transition reproduces this pattern at a global scale, and the reproduction is invisible to the dominant discourse precisely because the discourse is organized around the abstraction that context renders inadequate. The claim "AI produces twenty-fold productivity gains," stated as a general finding, is epistemically crisp and practically meaningless. It tells an organizational leader nothing about whether AI will produce those gains in her organization, with her team, in her industry, under her specific conditions of trust, expertise, and institutional culture. The claim abstracts away the very features of the situation that determine its outcome, and it does so by methodological design — because the epistemic paradigm prizes generalizability, and generalizability is achieved by eliminating context.
Segal's account of the Trivandrum transformation provides the counter-case. The twenty-fold productivity gain that occurred in that room during that week was not a universal finding waiting to be replicated. It was a context-dependent achievement — the product of a specific configuration of factors, each necessary and none sufficient alone.
Four contextual conditions can be identified from the account, and each illuminates a dimension of the phenomenon that the general claim "AI increases productivity" obliterates.
The first condition was practitioner experience. The engineers in Trivandrum were not novices. They possessed decades of collective experience — embodied judgment about what works and what breaks, about the difference between code that compiles and code that serves users. This accumulated phronesis constituted the signal that the AI tools amplified. A cohort of inexperienced engineers given the same tools would not have produced the same results, because the tools amplify whatever signal is provided, and the quality of the signal depends on the phronesis of the person providing it. The general claim makes no distinction between experienced and inexperienced users. The phronetic analysis identifies experience as the variable that determines whether amplification produces wisdom or noise.
The second condition was established trust. The engineers were a functioning team with years of shared history — mutual knowledge of each other's strengths and limitations, the capacity to coordinate under uncertainty without explicit negotiation of every decision. This relational infrastructure is invisible to productivity metrics but essential to the outcome. A team of strangers with equivalent individual skills, assembled for the purpose of the training and lacking the accumulated trust of shared experience, would not have achieved the same result, because the coordination costs that trust eliminates would have consumed the bandwidth that the AI tools freed.
The third condition was physical co-presence. Segal's decision to fly to Trivandrum rather than send a training deck was itself an exercise of phronetic judgment — the recognition that being in the room is not a luxury but a necessity when the transformation at stake requires the kind of embodied, real-time, emotionally calibrated interaction that video calls cannot transmit. The decision contradicted the epistemic logic of efficiency — remote training is cheaper, faster, more scalable — and obeyed the phronetic logic of context: that the most important dimensions of a human transformation are the dimensions that require physical presence to be navigated. A remote training would have conveyed the same information. It would not have produced the same transformation, because the transformation depended on contextual features — the reading of emotional states, the real-time calibration of pace and challenge, the collective experience of navigating uncertainty together in a shared space — that only physical co-presence could provide.
The fourth condition was the specific judgment of the leader. Standing before twenty engineers and declaring that each of them would soon be able to do more than all of them together was not a claim derived from data or extrapolated from a trend line. It was a phronetic act — a bet calibrated not to the probability of the outcome but to the emotional requirements of the audience at that particular moment. A different leader, with different phronesis, would have made a different bet. A more cautious statement might have achieved more modest results. A more reckless one might have shattered credibility when the tools fell short of the promise. The specific calibration — ambitious enough to inspire, grounded enough to be achievable, honest enough to acknowledge the terror alongside the excitement — was the product of practical wisdom built through decades of leading teams through transformations, and it could not have been specified in advance by any rule or algorithm.
Remove any of these conditions and the outcome changes fundamentally. The general finding — "AI increases productivity" — survives the removal, because the finding was never connected to the conditions in the first place. The phronetic finding — "AI amplifies the judgment of experienced practitioners operating within relationships of established trust, under conditions of physical co-presence, directed by leadership that exercises practical wisdom in calibrating the transformation to the specific needs of the specific team" — does not survive the removal of any condition. It is, by design, particular. And its particularity is its value, because it tells the organizational leader something that the general finding cannot: what conditions must be created for the deployment to succeed.
This distinction — between the general finding that survives the elimination of context and the phronetic finding that depends on it — is the methodological core of Flyvbjerg's career and the key to understanding why the AI transition cannot be managed by the frameworks currently being applied to it. The dominant frameworks produce general findings. General findings are context-independent. Context-independent findings tell you what happens on average. The organizational leader does not operate on average. She operates in a particular organization, with a particular team, in a particular industry, at a particular moment, under particular conditions of uncertainty. What she needs to know is not what happens on average but what conditions must be present for the outcome she seeks to be achievable in her specific situation.
That knowledge is phronetic. It is produced by the kind of sustained, contextual, case-rich inquiry that Flyvbjerg's methodology provides. And it is systematically absent from the evidence base that currently informs AI deployment decisions, because the methodology that would produce it has been excluded from the research paradigm by institutional structures that reward generalizability over contextual insight.
The practical consequence of this exclusion is that organizations deploy AI tools as though they were context-independent technologies — as though a tool that produces transformation in Trivandrum will produce transformation in Tallinn, as though a training program that works for one team will work for any team, as though the productivity gains documented in one setting are transferable to any setting without modification. This assumption reflects the epistemic bias of the dominant paradigm: the belief that the important findings are the general ones, and that context is noise to be controlled for rather than the essential dimension of the phenomenon.
Flyvbjerg's entire career argues otherwise. The important findings are the contextual ones — the findings that specify the conditions under which outcomes occur, rather than documenting the outcomes abstracted from the conditions. To know that "AI increases productivity" is episteme. To know that "AI increases productivity when deployed within high-trust teams led by practitioners with the phronetic capacity to direct the tools wisely, under conditions that preserve the formative friction necessary for ongoing judgment development" is phronesis. The first statement is universally true and practically useless. The second is contextually specific and practically invaluable.
The challenge for organizations, educators, and policymakers is to take this distinction seriously — not as a philosophical abstraction but as a practical principle for deployment. The AI tools are context-independent in their design. The knowledge required to use them wisely is context-dependent in its nature. The gap between these two — between the universal availability of the tools and the particular wisdom needed to deploy them well — is where the most consequential work of the AI transition resides. Bridging that gap requires the kind of case-based, context-sensitive, phronetically grounded inquiry that Flyvbjerg's career has been dedicated to producing and that the dominant paradigm has spent a century trying to transcend.
The paradigm's aspiration was noble. Its result, in the age of AI, is a discourse that knows everything about what the tools can do in general and almost nothing about what they do in the particular situations where the consequences are real, the stakes are high, and the judgment of the people wielding them determines whether the outcome is flourishing or disaster.
---
The organizational implications of Flyvbjerg's framework are not theoretical. They are immediate, specific, and uncomfortable for institutions whose competitive advantages were constructed around the scarcity of techne.
For fifty years, the technology industry built its value propositions on a simple premise: software is hard to write, and the people who write it well are scarce. The entire economic architecture of the industry — the premium salaries for engineers, the hierarchies of technical skill, the venture capital models that funded teams capable of building what competitors could not, the acquisition strategies that purchased companies primarily for their engineering talent — rested on this premise. Technical skill was the scarce resource. The organizations that accumulated the most of it, deployed it most effectively, and retained it most tenaciously won.
The premise is collapsing. Segal's account of the trillion-dollar market correction that began in early 2026 — the SaaSpocalypse, in the ugly argot of Wall Street — documents the economic surface of the collapse. Workday, Adobe, Salesforce, Autodesk, Figma: companies whose valuations were predicated on the scarcity of the technical capability their products embodied, losing a quarter or more of their market value in weeks. The market had discovered, with the brutal efficiency of repricing, that code was approaching commodity status. When a competent individual with an AI assistant could produce in a weekend what those companies' engineering teams had produced over years, the scarcity that justified the valuation evaporated.
Flyvbjerg's framework identifies what the market correction reveals and what it obscures. What it reveals is the end of techne as a sufficient basis for organizational value. Code, once scarce, is now abundant. The organizations whose value resided primarily in their capacity to produce code — thin applications solving narrow problems through technical implementation — are the organizations the market is punishing. What the correction obscures is the dimension of organizational value that has not been commoditized and cannot be: the phronetic layer. The accumulated judgment about what to build, for whom, under what circumstances, embedded in the institutional knowledge, the customer relationships, the workflow patterns, the regulatory expertise, and the cultural understanding that decades of deployment have deposited in the organization's collective intelligence.
Salesforce's value was never primarily in its code. Its code could be replicated — and, as of 2026, can be replicated in an afternoon. Its value resides in the data layer that twenty years of enterprise deployment have built, the integrations that connect sales pipelines to marketing automation to customer service to financial reporting, the compliance certifications and audit trails and security guarantees that took a decade to accumulate, the institutional inertia of sales organizations whose muscle memory has been trained on the platform. These are phronetic assets — contextual, accumulated, relationship-dependent, resistant to replication precisely because they are not technical but institutional.
The organization that understands this distinction — that recognizes phronetic capacity as its primary competitive advantage and restructures itself accordingly — will define the post-AI economy. The organization that does not will find itself in the position that Flyvbjerg's framework predicts for any institution that misidentifies its own source of value: technically capable, institutionally hollow, and vulnerable to disruption by competitors who understand what the market actually rewards.
The restructuring that phronetic analysis demands touches every dimension of organizational design, and the demands are specific enough to be actionable.
The first demand concerns the composition of teams. The dominant model organizes teams around technical specialization — frontend engineers, backend engineers, database administrators, each operating within the jurisdictional boundaries of their expertise. The model made sense when technical implementation was the bottleneck, because the bottleneck required concentrated technical skill and the specialization ensured that each problem was addressed by the person with the deepest relevant expertise. When AI removes the implementation bottleneck, the specialization that was organized around it becomes an obstacle rather than an asset. The practitioner who knows everything about backend architecture and nothing about user experience, business models, or the specific needs of the customer the product serves finds herself outcompeted by the practitioner who knows enough about all of these to direct AI tools across the boundaries between them.
Segal describes this dissolution of boundaries at Napster: backend engineers building user interfaces, designers writing features, the traditional division of labor liquefying under the pressure of tools that made crossing domain boundaries trivially easy. The phenomenon is not merely organizational. It is phronetic. The practitioner who operates across domains is exercising a form of judgment that the specialist cannot — the judgment about how technical decisions in one domain affect outcomes in another, how design choices interact with business constraints, how the user's experience is shaped by architectural decisions that the user will never see. This cross-domain judgment is phronesis. It requires the kind of contextual, integrative, value-laden thinking that no specialization produces and no AI tool possesses.
The organizational response is the "vector pod" model that Segal describes: small groups whose primary function is not to build but to decide what should be built. The model represents a fundamental reconception of what organizational value means. The vector pod produces judgment, not code. Its output is a specification — a phronetic artifact that encodes decisions about user needs, market positioning, ethical constraints, and technical trade-offs into a form that AI tools can execute. The specification is the product of human phronesis. The implementation is the product of machine techne. The division of labor has been reorganized around the actual scarcity, and the scarcity is judgment, not skill.
The second demand concerns evaluation. The dominant performance evaluation system measures output: features shipped, bugs fixed, code reviewed, tickets closed. These metrics were appropriate when the bottleneck was implementation and the quality of implementation was the primary determinant of organizational success. When AI handles implementation, the metrics become not merely insufficient but actively misleading — they continue to measure the dimension of work that is no longer scarce while ignoring the dimension that is. The practitioner who ships the most features may be exercising the least judgment about whether those features deserve to exist. The practitioner who ships fewer features but exercises superior judgment about which features serve users well may be the more valuable contributor by a wide margin. The evaluation system cannot detect this difference, because the system was designed to measure techne and is blind to phronesis.
A phronetic evaluation system would assess judgment quality directly. The assessment is technically demanding — judgment quality is contextual, variable, and resistant to standardization — but methodological precedents exist. Medical residency programs use clinical scenario assessments to evaluate diagnostic judgment. Law firms evaluate associates through case analysis exercises that test legal reasoning rather than document production speed. Military officer training programs assess decision-making under simulated conditions of ambiguity and uncertainty. Each of these precedents evaluates phronesis rather than techne, and each could be adapted to the technology industry's specific requirements.
The adaptation would involve scenario-based assessment: presenting practitioners with ambiguous, value-laden situations — a product decision where user needs conflict with business objectives, an architectural choice where short-term speed conflicts with long-term maintainability, a deployment decision where efficiency conflicts with the well-being of the people affected — and evaluating the quality of the judgment exercised. The assessment would be conducted by experienced practitioners whose own phronetic capacity qualifies them to evaluate judgment in others, and it would be longitudinal: tracking changes in judgment quality over time rather than measuring output at a single point.
The third demand concerns knowledge transmission. Phronesis is transmitted primarily through mentorship — the sustained relationship between an experienced practitioner and a developing one, in which the experienced practitioner's judgment is made visible through joint navigation of complex situations. The mechanism is ancient. It predates formal education. It is the way practical wisdom has been transmitted in every human community from the earliest apprenticeship arrangements to the most sophisticated modern professional training programs. And it is threatened by the same AI tools that make organizations more productive, because the tools reduce the incentive for the junior practitioner to seek the senior colleague's counsel.
When the junior engineer can get an answer from Claude in seconds — an answer that is polished, confident, immediately applicable — the slower, more uncertain, more contextually qualified answer of the senior colleague seems less useful by comparison. But the senior colleague's answer is not merely an answer. It is a demonstration of phronetic reasoning — a window into how an experienced practitioner thinks about problems, what considerations she weighs, what trade-offs she navigates, what values she prioritizes. The junior practitioner who learns only from the machine develops techne without phronesis — competence without wisdom, speed without judgment.
The organizational response must be deliberate protection of mentorship relationships from the efficiency pressures that AI introduces. Protected mentoring time — time in which AI tools are intentionally set aside and the junior and senior practitioners engage directly with shared problems — is not a luxury. It is the mechanism through which the organization reproduces its phronetic capacity. The organization that allows this mechanism to atrophy in the name of efficiency will find, within a generation, that it has produced practitioners who are technically fluent and phronetically impoverished — capable of executing any instruction with AI-augmented speed and incapable of generating the instructions that deserve to be executed.
The fourth demand concerns what might be called phronetic infrastructure — the organizational routines, practices, and spaces that create conditions for the exercise and development of practical wisdom. The Berkeley researchers' proposal of "AI Practice" — structured pauses within the workflow — is a version of this. So is the deliberate scheduling of meetings in which AI tools are excluded and human judgment operates unassisted. So is the institutional practice of post-mortem analysis conducted not to assign blame but to develop collective understanding of what went wrong and why — the kind of retrospective phronetic exercise that builds organizational wisdom through the shared examination of failure.
These practices are easy to describe and difficult to sustain, because they run against the grain of the optimization pressure that AI intensifies. Every protected mentoring hour is an hour not spent producing output. Every AI-free meeting is a meeting that takes longer than its augmented counterpart. Every post-mortem analysis consumes time that could be allocated to the next project. The efficiency calculus argues against every one of these practices, and the efficiency calculus is the dominant logic of organizational decision-making.
Flyvbjerg's framework reveals why the efficiency calculus is wrong — not in its arithmetic but in its premises. The calculus measures the cost of phronetic infrastructure in terms of output foregone. It does not measure the cost of forgoing phronetic infrastructure — the gradual erosion of organizational judgment, the slow decline in the quality of decisions about what to build and for whom, the compounding consequences of practitioners who execute with increasing speed and decreasing wisdom. These costs are invisible to the quarterly metrics. They manifest over years, in the form of products that technically work but do not serve, strategies that optimize for the measurable at the expense of the meaningful, and an organizational culture that has forgotten what it means to ask whether the thing being built deserves to exist.
The phronetic organization is not an organization that rejects AI. It is an organization that understands the distinction between what AI can provide — episteme and techne at unprecedented scale and speed — and what only human beings can provide: the practical wisdom to direct that capability toward outcomes worthy of the investment. Building such an organization requires restructuring team composition around cross-domain judgment, rebuilding evaluation systems around the assessment of practical wisdom, protecting mentorship as the primary mechanism for phronetic transmission, and establishing institutional routines that create space for the exercise and development of judgment under conditions where the pressure to optimize would otherwise eliminate it.
These are not incremental adjustments. They are structural changes, and they require of organizational leaders the phronetic courage to build for a timeline longer than the next quarter. The organizations that build this way will accumulate the scarce resource — human practical wisdom, cultivated deliberately and protected structurally — that no competitor can replicate and no machine can provide. The organizations that do not will discover, too late, that the thing they optimized away was the only thing that could not be bought back.
The most valuable piece of evidence about AI's impact on human work published in the first half of 2026 was not a study. It was a book — written by a technology executive in collaboration with the very tool whose effects the book attempts to understand, structured as a series of confessions and arguments and cases drawn from the author's direct experience of building products and leading teams through a technological transition whose consequences he could feel but not fully articulate.
The Orange Pill is, by the standards of the epistemic research paradigm, not research at all. It is not systematic. It is not controlled. It is not replicable. It does not produce findings that hold regardless of who produces them and who receives them. It is particular, subjective, shaped by the author's biography and values and blind spots, and it makes no attempt to abstract from the particular cases it describes to the general regularities that the dominant paradigm considers the goal of inquiry.
Flyvbjerg's phronetic social science argues that this is precisely what makes it valuable — more valuable, for the purposes of understanding the AI transition, than the quantitative studies that the institutional hierarchy of knowledge places above it.
The argument requires defense. The defense is not sentimental — not a plea for the importance of stories or the irreducibility of human experience, though both are true. The defense is methodological. It rests on the claim that narrative is the form of inquiry best suited to phronetic phenomena — phenomena that are constitutively context-dependent, value-laden, and resistant to the abstraction that epistemic methods require.
Aristotle argued that the phronimos learns not from rules but from cases — from the experience of practitioners who have navigated complex situations, whose accounts of those situations provide the material from which practical judgment develops. The cases do not dictate the right answer for future situations, because future situations differ in ways that no rule can anticipate. The cases provide something more valuable than answers: they provide the material for the development of judgment. The reader who engages with a well-rendered case develops, through that engagement, a capacity for reading situations for which no abstract principle can substitute. The capacity is phronetic. It is built through encounter with the particular, not through mastery of the general.
Segal's account of his own experience — the confession that he could not stop building, the recognition that he was confusing productivity with aliveness, the passage about writing a hundred and eighty-seven pages on a transatlantic flight and realizing that the exhilaration had drained away hours ago — is narrative data of a kind that no quantitative study can produce. It illuminates the phenomenon from inside the experience, with a specificity and emotional precision that only first-person, context-rich, value-laden narrative can achieve.
The epistemic paradigm would classify this as anecdote — interesting, perhaps, but not evidence. Flyvbjerg's framework classifies it differently. The first-person account of a practitioner navigating the AI transition, rendered with sufficient context and honesty to serve as material for the reader's own phronetic development, is evidence of the most consequential kind. It is evidence about the quality of experience — about what it feels like to exercise judgment under conditions of radical technological change, what considerations arise, what tensions must be held, what costs are borne. This evidence cannot be produced by surveys or experiments. It can only be produced by practitioners willing to render their experience with the kind of candor that makes the narrative useful to others facing analogous situations.
The most revealing passage in The Orange Pill, from a phronetic perspective, is not a triumph. It is a failure. The author describes a passage in an early draft where Claude drew a connection between Csikszentmihalyi's flow state and a concept attributed to Deleuze — a connection that was rhetorically elegant, structurally useful, and philosophically wrong. The prose sounded like insight. The reference was fabricated — not maliciously, but through the mechanism that Flyvbjerg diagnosed in "AI as Artificial Ignorance": the system's optimization for plausibility over truth, for rhetorical persuasiveness over factual accuracy.
The author caught the error the next morning, when something nagged. The nagging was itself phronetic — a pre-reflective sense that the passage was too smooth, that the ease with which the connection had been made was itself evidence of its unreliability. He checked. The reference was wrong in a way obvious to anyone who had read the source. He deleted the passage.
This case — a single instance of a specific failure mode, observed from inside the collaboration, reported with the kind of contextual detail that allows the reader to understand not just what happened but how it felt and what it revealed — is more valuable than any aggregate error rate could be. The aggregate finding — "large language models produce factual errors at a rate of X percent" — is episteme. It tells you what happens on average. It does not tell you what it feels like to almost accept an elegant falsehood because the falsehood was smoother than anything you could have produced yourself, or how to develop the specific kind of attention that detects the seam where confident wrongness meets good prose, or what it costs to maintain that attention over months of daily collaboration with a system whose default mode is persuasive plausibility.
The narrative tells you all of this. It tells you because the author experienced it and rendered the experience with enough specificity for the reader to inhabit it vicariously — to feel the seduction, to sense the nagging, to understand the mechanism through which the smoothness of AI output conceals the fractures in its accuracy. The reader who engages with this narrative develops a capacity — an attentional skill, a diagnostic sensitivity — that no abstract finding about error rates could produce. The capacity is phronetic. It is built through encounter with the particular case, not through knowledge of the general statistic.
The broader implication concerns the relationship between the two forms of evidence. The argument is not that narrative should replace quantitative research. It is that narrative evidence and quantitative evidence illuminate different dimensions of the same phenomenon, and the dimension that narrative illuminates — the phronetic dimension, the dimension of judgment, experience, and practical wisdom — is the dimension that matters most for the AI transition and the dimension that is most absent from the current evidence base.
The absence is institutional. Academic journals favor quantitative studies. Funding agencies reward replicable designs. Tenure committees count citations, and citations cluster around findings that other researchers can build upon — which means findings that are abstract enough to be applied across contexts, which means findings that have been stripped of the contextual specificity that makes phronetic knowledge valuable. The institutional incentive structure systematically disadvantages the form of research that the AI transition most urgently requires.
Flyvbjerg has argued throughout his career that the case study — the detailed, context-rich, phronetically sensitive analysis of a particular situation — is not a preliminary stage in the production of real knowledge, to be superseded once enough cases have been accumulated to permit statistical generalization. The case study is a form of knowledge in its own right — the form best suited to phenomena whose essential character is contextual. The AI transition is such a phenomenon. The cases being generated — by practitioners like Segal, by the researchers who embed themselves in organizations, by the journalists who document specific deployments in specific contexts — constitute the richest evidence base available for understanding what the transition is actually doing to human judgment, human experience, and human capability.
The evidence base needs curation, not replacement. It needs the kind of systematic, phronetically informed analysis that Flyvbjerg's methodology provides — the identification of patterns across cases, the comparison of conditions that produce different outcomes, the development of contextual generalizations that specify the circumstances under which AI augmentation cultivates or erodes practical wisdom. This kind of analysis is possible only when the cases are taken seriously as evidence — when the institutional hierarchy of knowledge recognizes narrative as a legitimate form of inquiry and allocates the resources necessary to collect, analyze, and disseminate narrative evidence with the same rigor that is currently reserved for quantitative findings.
The reader of phronetic narrative engages differently from the reader of quantitative research. The quantitative finding is received passively: the finding holds or it does not, regardless of the reader's judgment. The narrative is received actively: the reader must decide what the account means for her own circumstances, what lessons to draw, what analogies to construct, what cautions to heed. The reader's phronesis is not optional. It is constitutive of the knowledge the narrative produces. The same narrative, read by different practitioners in different contexts, generates different phronetic insights — because the insights are produced in the encounter between the narrative and the reader's judgment, not in the narrative alone.
This active engagement is precisely what makes narrative evidence essential for the AI transition. The transition does not present practitioners with problems that have right answers derivable from data. It presents them with situations requiring judgment — situations where values conflict, consequences are uncertain, and the right course of action depends on a reading of the particular circumstance that no general principle can supply. The practitioner who has engaged with a rich corpus of narrative cases — who has vicariously experienced the seductions and failures and discoveries of other practitioners navigating the transition — is better equipped to exercise the judgment her own situation demands than the practitioner who has memorized the aggregate findings of quantitative research.
This is how phronesis has always been transmitted. Through cases. Through stories told by practitioners to other practitioners. Through the accumulated narrative knowledge of a profession that captures not just what happened but what it meant, what it cost, what it taught. The age of AI has not changed the mechanism. It has made the mechanism more urgent, because the transition is faster, the consequences are larger, and the need for practical wisdom in navigating it is greater than for any previous technological shift. The institutional structures that produce and disseminate knowledge about the transition must catch up to this urgency, and catching up begins with the recognition that the most important evidence about what AI is doing to human work is being produced not in laboratories or through survey instruments but in the daily experience of practitioners whose narratives, properly collected and analyzed, constitute the empirical foundation for the phronetic understanding that the moment demands.
---
Every chapter of this book has circled a single question from a different angle, and the question can now be stated with the precision that the preceding analysis has earned: When machines possess episteme and techne in abundance — when they can know anything that can be stated in propositions and make anything that can be specified in instructions — what form of human knowledge retains its value, its scarcity, and its necessity for human flourishing?
Flyvbjerg's answer is phronesis. The answer is not new. He has been giving it for three decades, in the context of urban planning, infrastructure management, and the methodology of social science research. What is new is the urgency. The AI transition has compressed the timeline within which the answer must be operationalized — translated from philosophical taxonomy into institutional design, educational practice, organizational structure, and individual development — from decades to years.
The urgency derives from the generational dimension of phronetic loss. The current generation of experienced practitioners — the engineers, lawyers, physicians, designers, and managers who built their practical wisdom through years of friction-rich engagement with their domains — constitutes a finite resource. Their phronesis was not produced by institutional design. It was produced by circumstance — by the fact that implementation friction was, until recently, an unavoidable feature of professional practice, and the friction, while mostly tedious, contained the rare, unpredictable moments of formative struggle that deposited the embodied judgment on which practical wisdom rests. The ten minutes embedded in the four hours of plumbing. The diagnostic surprise embedded in the hundred routine presentations. The revelatory document embedded in the thousand pages of discovery review.
AI has removed the friction. The removal is, in many respects, a genuine gain — the liberation from tedium that Segal celebrates, the expansion of capability that the Trivandrum engineers experienced, the democratization that gives a developer in Lagos access to the same building leverage as an engineer at Google. These gains are real, and a framework that cannot acknowledge them is incomplete.
But the removal of friction has also removed the developmental conditions for phronesis, and the removal is generational. Practitioners who enter the profession in an AI-augmented environment — who have never experienced the formative struggle that the friction contained — will not develop the embodied judgment that the friction produced. They will be technically competent and phronetically impoverished. They will produce adequate output and lack the wisdom to assess whether the output serves the values it should serve. They will execute with remarkable speed and lack the judgment to determine whether the thing being executed deserves to exist.
The transmission of phronesis from the current generation to the next is therefore the most time-sensitive task that the AI transition presents. The transmission requires institutional structures that do not yet exist at adequate scale: mentorship programs that protect the relationship between experienced and developing practitioners from the efficiency pressures that AI introduces; educational curricula that cultivate practical judgment through case-based learning, engagement with genuine ambiguity, and assessment of questioning rather than answering; organizational evaluation systems that recognize and reward phronetic capacity rather than technical output; and research programs that track the development of practical wisdom longitudinally, across professions, with the methodological rigor that the phenomenon demands.
The cost of failing to build these structures is not merely economic. It is civilizational. A society whose professional classes possess techne without phronesis is a society capable of building anything and incapable of judging whether it should be built. The planning fallacy at machine speed — optimism bias and strategic misrepresentation operating at the velocity of AI-augmented execution, without the corrective feedback that slower implementation timelines provided — becomes the default mode of institutional decision-making. Projects are launched, prototyped, and deployed faster, and they fail faster, in domains where failure carries consequences that cannot be reversed by the next sprint.
The dam-building metaphor that runs through The Orange Pill acquires specific content through Flyvbjerg's framework. The dams are not abstractions. They are concrete institutional structures, and the framework specifies where they must be placed.
The first dam is at the juncture between AI capability and deployment decision. Before a capability is deployed, the phronetic question must be asked: Does this deployment serve the people it affects? Does it distribute costs and benefits justly? Does it preserve the conditions under which the practitioners using it can continue to develop the judgment they need? These questions cannot be answered at machine speed. They require deliberation — the slow, friction-rich, value-laden process through which practical wisdom is exercised. The deliberation must run in parallel with the technical execution and must retain the authority to redirect or halt the execution when the judgment warrants it.
The second dam is at the juncture between productivity measurement and organizational reward. The definition of productivity that currently governs AI deployment — output per unit of input — systematically excludes the phronetic dimension of work. A phronetic definition of productivity would ask whether the practitioners are making better decisions, not just more decisions; whether the output serves the users it should serve, not just whether it exists; whether the organizational capacity for judgment is growing or eroding. Rebuilding measurement systems around phronetic criteria is technically demanding and institutionally disruptive. It is also necessary, because the current criteria are optimizing organizations for the dimension of performance that AI has already commoditized while ignoring the dimension that remains irreplaceably human.
The third dam is at the juncture between educational content and educational purpose. The purpose of professional education in the age of AI is not the transmission of knowledge that machines can access more efficiently. It is the cultivation of the practical wisdom to direct that knowledge wisely. This requires pedagogical approaches that develop judgment rather than recall: case-based instruction that presents students with situations of genuine ambiguity, assessment methods that evaluate the quality of questions rather than the correctness of answers, mentorship relationships that transmit practical wisdom through shared engagement with complex problems rather than through the delivery of codified procedures.
Each of these dams requires phronesis to build — the practical wisdom to recognize what structures are needed, where they should be placed, and how they should be maintained against the constant pressure of optimization that would erode them. The dam-builder must possess the very capacity the dam is designed to protect, which creates a bootstrapping problem that can only be resolved by the current generation of phronetic practitioners acting while they still can.
Flyvbjerg's framework does not resolve the AI transition into a clean narrative of loss or gain. The transition involves both, simultaneously, in proportions that depend entirely on the contextual conditions of each deployment. The framework resolves the transition into a question — the question that Aristotle posed and that Flyvbjerg has spent his career operationalizing: How should we act?
Not what should we know. The machines know more. Not what should we make. The machines make faster. How should we act — in this particular situation, with these particular stakes, for these particular people, given these particular uncertainties about what the consequences of our actions will be?
That question is phronesis. And the quality of the answers — produced by particular practitioners in particular situations, exercising practical judgment shaped by experience, values, and the kind of contextual sensitivity that no algorithm can replicate — will determine whether the most powerful epistemic-technical tools in human history serve the most important form of human knowledge, or whether they overwhelm it.
The knowledge that remains, when the machines have absorbed everything that can be abstracted and automated, is the knowledge that was always most important and most neglected. It is the knowledge of how to act well in a world that does not provide rules for acting well. It is the knowledge that Aristotle called architectonic — the governing virtue that directs all other virtues toward their proper ends. It is the knowledge that Flyvbjerg has spent thirty years arguing is the foundation on which social science, and the institutions it informs, must be rebuilt.
The rebuilding was always necessary. The machines have made it urgent.
---
The number 478 percent is the one I cannot stop thinking about.
Not the right answer — 220 percent — but the wrong one. The number that Perplexity returned when Flyvbjerg asked about the Big Dig. A number wrong by more than double, delivered without hesitation, formatted with the same clean confidence as every other answer the system produces. No flag. No hedge. No signal to the user that the ground beneath the response had given way.
I keep thinking about it because I have been on both sides of that moment. I have been the person asking the question, receiving the smooth answer, and almost moving on. And I have been the person who knows the domain well enough to feel the nagging — the pre-reflective sense that something is off, that the answer arrived too easily, that the confidence is unearned. The Deleuze failure I described in The Orange Pill — where Claude attributed a concept to a philosopher in a way that sounded right but broke under examination — was my 478 percent. The number that passed every surface check and failed the only one that mattered: the check against reality.
Flyvbjerg calls it artificial ignorance. The phrase is precise in a way that "hallucination" — the industry's preferred euphemism — is not. Hallucination implies a system that normally perceives reality and occasionally departs from it. Ignorance implies a system that never had access to the truth-falsehood distinction in the first place. The distinction matters because it changes what you expect from the tool and what you demand of yourself when using it.
What I found most uncomfortable in Flyvbjerg's framework was not the critique of AI. The tools will improve. The error rates will decline. The systems may eventually develop something resembling truth-tracking, or at least better mechanisms for flagging uncertainty. What I found uncomfortable was the mirror he held up to the institutions — including the ones I have built and led — that evaluate human work.
The engineer in Trivandrum whose twenty percent was everything? She was evaluated on her eighty percent. The system rewarded her for the techne that the machine was about to render abundant and was blind to the phronesis that was her actual contribution. That blindness is not a technology problem. It is an institutional failure that predates AI by decades, and AI has merely made it impossible to ignore.
The ten minutes haunt me most. Those scattered moments of formative struggle — embedded in tedium, impossible to predict, destroyed by the same automation that removes the tedium — are the mechanism through which every domain builds its deep practitioners. I watched it happen on my own team. The engineer who lost her architectural confidence. The junior developer who ships faster than anyone but cannot explain why one architecture is better than another. The designer who produces beautiful interfaces and has never debugged the system beneath them.
Flyvbjerg gives me the language I was missing. What I called "the remaining twenty percent" is phronesis. What I called "productive friction" is the developmental substrate for practical wisdom. What I called "the question of what to build" is the architectonic virtue — the governing judgment that determines whether all other capabilities are deployed wisely.
The planning fallacy at machine speed is the one that keeps me up at night as a builder. I have felt the cognitive momentum of the fast prototype — the feeling that because the demo works, the product is nearly finished. It is never nearly finished. The phronetic work — the judgment about who the product serves, whether it serves them well, what it costs the people it does not serve, whether the trade-offs are ones I am willing to defend — that work has not even begun when the prototype is running. And the speed of the prototype creates the illusion that the phronetic work should be equally fast. It cannot be. It must not be. The dam that Flyvbjerg's framework specifies is the deliberate decoupling of technical speed from judgment speed — letting the machines run at machine pace while insisting that the human decisions about direction, values, and consequences proceed at the pace that wisdom requires.
I am not a scholar. I build things. But what Flyvbjerg taught me is that the thing I build is less important than the judgment I exercise in deciding what to build — and that this judgment, the phronesis that governs everything downstream, is the one capacity that no tool can provide and no institution in my world has been designed to cultivate.
We are building the dams. Not fast enough. But building.
-- Edo Segal
Bent Flyvbjerg spent three decades proving that the world's largest projects fail not from incompetence but from a lethal combination of sincere overconfidence and strategic deception. His database spans hundreds of megaprojects across dozens of countries, and the finding never varies: we systematically overestimate benefits, underestimate costs, and believe our case is the exception. Now he has turned that diagnostic lens on artificial intelligence — and found the same pathology operating at unprecedented speed. This book explores Flyvbjerg's framework of phronesis, the Aristotelian practical wisdom that machines cannot possess and that our institutions have never learned to cultivate, and asks the question the AI discourse refuses to face: when the tools can know anything and build anything, who decides what is worth knowing and building? The answer requires a form of human judgment that no algorithm can replicate — and that we are losing faster than we realize.

A reading-companion catalog of the 17 Orange Pill Wiki entries linked from this book: the people, ideas, works, and events that Bent Flyvbjerg — On AI uses as stepping stones for thinking through the AI revolution.