By Edo Segal
The sentence that stopped me cold was not about artificial intelligence. It was about testing.
"Program testing can be used to show the presence of bugs, but never to show their absence."
Edsger Dijkstra wrote that decades ago. I read it in the middle of a build session with Claude, at a moment when I was doing exactly what the sentence warns against — shipping code that worked for every case I could think of and calling that proof. It is not proof. It has never been proof. And in a world where AI generates code faster than any human can read it, the distance between "it works" and "it is correct" is the distance between confidence and catastrophe.
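Dijkstra's distinction can be made concrete in a few lines of Python. The function and test cases below are hypothetical illustrations, not code from the Napster Station project: a naive leap-year check passes every test its author thought to write and is still wrong, because the tests demonstrate only the absence of bugs among the cases tried.

```python
def is_leap(year):
    """Naive leap-year rule. Passes every test below -- yet it is wrong."""
    return year % 4 == 0

# "It works": every case the author thought of passes.
assert is_leap(2024) is True
assert is_leap(1996) is True
assert is_leap(2023) is False

def is_leap_correct(year):
    """The actual Gregorian rule: divisible by 4, except centuries,
    which must also be divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# The counterexample no test above probed: 1900 was not a leap year.
assert is_leap(1900) is True           # the naive version still "works"
assert is_leap_correct(1900) is False  # reality disagrees
```

The test suite was never lying; it was answering a narrower question than the one being asked. "Passes the tests" means "correct for these inputs," and nothing more.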
I built Napster Station in thirty days. Tested it. Shipped it. Watched hundreds of people interact with it on a trade show floor. The exhilaration was real. But Dijkstra's framework forces a question I did not ask at the time and cannot stop asking now: Did I understand what I built? Could I trace the logic of every component Claude generated? Could I identify the assumptions buried in code I never wrote by hand? Could I enumerate the conditions under which the system would fail?
The honest answer is no.
That "no" is not a confession of failure. It is a description of the structural condition every builder now inhabits. When the imagination-to-artifact ratio collapses to the width of a conversation, something extraordinary is gained — and something Dijkstra spent fifty years defending is lost. Not speed. Not capability. Understanding. The specific, hard-won, layer-by-layer understanding that comes from constructing something yourself, tracing its logic, proving its correctness.
Dijkstra was not a Luddite. He was a rigorist. He believed that the discipline of the mind is not a constraint on building but the foundation on which all reliable building depends. He believed simplicity is not a luxury but a survival requirement. He believed that the tools we use reshape the thoughts we can think — and that the reshaping happens below awareness, which is what makes it dangerous.
Every chapter in this book made me uncomfortable. Not because Dijkstra opposes the future I described in *The Orange Pill*. Because he sharpens it. He demands that the amplifier be worthy of what it amplifies. That the builder understand what she has built. That we stop confusing volume with signal.
The river of intelligence flows faster than ever. Dijkstra is the engineer who reminds you that speed without verification is not productivity. It is recklessness wearing a smile.
— Edo Segal · Opus 4.6

1930–2002
Edsger Wybe Dijkstra (1930–2002) was a Dutch computer scientist, mathematician, and one of the most influential figures in the history of programming. Born in Rotterdam, he studied theoretical physics and mathematics at Leiden University before turning to computing, eventually becoming the first person in the Netherlands to hold the title of "programmer" as a professional designation. His 1968 letter "Go To Statement Considered Harmful" ignited a revolution in software engineering, establishing structured programming as the foundation of reliable code. He received the Turing Award in 1972 for fundamental contributions to programming as a high, intellectual challenge. Over his career at the Eindhoven University of Technology and the University of Texas at Austin, he developed foundational concepts including the separation of concerns, stepwise refinement, and the insistence on provable correctness over empirical testing. His numbered EWD manuscripts — over 1,300 handwritten documents on topics ranging from algorithm design to the philosophy of computing — remain among the most cited informal publications in computer science. Dijkstra's lifelong argument was that programming is a branch of mathematics, not engineering, and that the discipline of formal reasoning is the only reliable foundation for building systems that work.
Programming, as Edsger Dijkstra conceived it, was never about computers. This is the first thing one must understand about the man, and it is the thing most frequently misunderstood. He said it himself, in a formulation that has been quoted so often it has lost its capacity to shock: "Computer science is no more about computers than astronomy is about telescopes." The statement is not a metaphor. It is a classification. Astronomy is a body of knowledge about celestial phenomena; the telescope is merely the instrument through which those phenomena are observed. Computer science, in Dijkstra's framework, is a body of knowledge about the construction of formal arguments; the computer is merely the instrument that executes them. Confuse the instrument with the discipline, and everything that follows will be built on a category error.
Dijkstra was born in Rotterdam in 1930, the son of a chemist father and a mathematician mother, and the inheritance of both parents is visible in everything he produced. From his father he took the habit of precision in observation — the insistence that phenomena be described in terms admitting no ambiguity. From his mother he took the conviction that mathematics is not merely a tool but a mode of thought qualitatively different from ordinary reasoning. When he enrolled at Leiden University in 1948 to study theoretical physics, he had not yet encountered programming. A summer course at Cambridge in 1951 changed his trajectory, but what he recognized in that encounter was not a new vocation. It was an intellectual crisis. The machines were new, the problems were urgent, and the people writing programs were doing so by trial and error — patching and testing and hoping, building systems whose correctness they could not demonstrate and whose failure modes they could not predict. To a mind trained in mathematical proof, this was an intolerable state of affairs.
The crisis Dijkstra identified in 1951 has a precise structure, and the structure has not changed in seventy-five years. It is this: a program is a formal argument expressed in a notation that happens to be executable by a machine. The argument either holds or it does not. It is either valid for all possible inputs or it is not. There is no "mostly valid" in logic, and there should be none in programming. But the profession that grew up around these machines treated programs not as formal arguments but as empirical artifacts — things to be tested, observed, adjusted, and shipped when they appeared to work for the cases that had been tried. The distinction between "appears to work" and "is correct" is the distinction between coincidence and proof. Dijkstra spent his entire career on that distinction, and the profession spent its entire history trying to avoid it.
What he advocated was discipline in both senses of the English word. A discipline: a field of study with its own methods, standards, and criteria for success, rooted in mathematical logic rather than in engineering pragmatism. And discipline: a regimen of constraints voluntarily adopted by the practitioner in the service of quality. The programmer who submitted to the discipline of structured, formally verified code was not limiting her creativity. She was channeling it through structures that made correctness achievable — in precisely the way that the sonnet form does not limit the poet but concentrates the poetic impulse through fourteen lines and a particular rhyme scheme, producing compression that free verse cannot match.
This framework — programming as mathematical discipline, correctness as the non-negotiable standard, the programmer as someone who constructs proofs rather than tinkers with artifacts — is the lens through which Dijkstra's perspective illuminates the world described in The Orange Pill.
Segal describes the collapse of what he calls the imagination-to-artifact ratio: the distance between a human idea and its realization reduced to the time it takes to have a conversation. A person with an idea and the ability to describe it in natural language can produce a working prototype in hours. The builder does not write code. The builder describes desired outcomes. The machine constructs the implementation. The celebration in The Orange Pill is genuine and, within its own framework, justified — more people can build more things more quickly than at any previous moment in human history. Segal calls this democratization. He calls it liberation. He describes the exhilaration of watching an idea become real in minutes, and the exhilaration is palpable on the page.
Dijkstra's framework sees something different in the same phenomenon. Not necessarily something contradictory — the exhilaration may be real — but something the exhilaration tends to obscure. What has collapsed is not merely the distance between imagination and artifact. What has collapsed is the distance between intention and deployment — and with it, the entire verification layer that once stood between them. In the old world, the programmer's struggle to implement her idea was simultaneously a struggle to understand it. Each bug discovered, each logical error traced to its source, each edge case identified and handled, deposited a layer of understanding. The programmer who shipped a finished system did not merely know that it worked. She knew why it worked. She could trace the logic from premises to conclusions, identify every assumption, enumerate the conditions under which the system would behave correctly and the conditions under which it would fail. This understanding was not a byproduct of the building process. It was the building process. The code was the proof made executable.
When Segal describes his engineer in Trivandrum building a complete user-facing feature in two days — a feature she had never built before, in a domain she had never worked in — the celebration focuses on what was gained: capability, speed, the crossing of professional boundaries that had seemed structural. Dijkstra's framework focuses on what was not acquired. The engineer produced a working feature. Can she trace its logic? Can she identify the assumptions embedded in the implementation? Can she enumerate the conditions under which it will fail? If Claude generated the code and the engineer tested it against available cases and it passed, then the engineer has produced an artifact whose internal logic she has not constructed, has not verified, and may not be able to explain. She has been given the answer without the proof.
The distinction between answer and proof is not pedantic. It is the distinction upon which the reliability of every complex system depends. A bridge that stands is not the same as a bridge that has been shown, through structural analysis, to be capable of standing under all anticipated loads. The first is an observation. The second is a guarantee. The observation is valuable — the bridge is, after all, standing — but it provides no assurance about what happens when the load exceeds what has been observed. The guarantee does provide that assurance, because it is derived from principles rather than observations, and principles hold universally where observations hold only for the cases observed.
Dijkstra's insistence on proof over testing was considered extreme even in his own era. Most of his colleagues acknowledged the theoretical superiority of formal verification while arguing that it was impractical for real-world systems. Programs were too large, too complex, too entangled with the messiness of real-world requirements to submit to the clean lines of mathematical proof. Dijkstra's response was characteristic: the programs were too large and too complex precisely because they had been built without the discipline that would have kept them small and simple. Complexity was not a given. It was a consequence of intellectual laziness — the accumulation of ad hoc decisions made by programmers who could not be bothered to think clearly about what they were building.
This argument has a new and sharper edge in the current moment. AI-generated code is not merely complex. It is complex in a way that defeats the strategies Dijkstra developed for managing complexity. When a human programmer builds a complex system, the complexity has a structure — it was introduced through a series of decisions, each of which can be examined and, if necessary, reversed. When an AI generates a complex system, the complexity has no recoverable decision history. The code exists. It works (or appears to work). But the path from intention to implementation passes through a neural network whose internal operations do not correspond to any logical structure that human reasoning can follow. The builder cannot simplify the code by reversing bad decisions, because no decisions were made — at least none that are accessible to human inspection.
Dijkstra wrote, in one of his numbered EWD manuscripts, that "the tools we use have a profound and devious influence on our thinking habits, and therefore on our thinking abilities." The observation was meant as a warning about programming languages — each language shapes the thoughts its users can think, and a poorly designed language produces poorly structured thought. But the warning applies with considerably greater force to the natural language interface that Segal celebrates. A programming language, however poorly designed, still requires the programmer to think in terms of logic — conditions, loops, data structures, algorithms. The natural language interface requires the builder to think in terms of descriptions — outcomes, behaviors, desired results. The shift from logical thinking to descriptive thinking is not a shift in style. It is a shift in kind. Describing what you want is a fundamentally different cognitive operation from constructing the logical argument that produces what you want. The first requires clarity of intention. The second requires clarity of reasoning. Both are valuable. They are not the same.
Segal is honest about this tension in ways that strengthen his argument even where the tension remains unresolved. His description of the engineer who lost architectural confidence after months of AI-assisted work — who "was making architectural decisions with less confidence than she used to and could not explain why" — is, in Dijkstra's framework, a predictable consequence of the elimination of the reasoning process. The engineer's confidence had been built through thousands of hours of constructing logical arguments in code, each one depositing a thin layer of understanding that accumulated into architectural intuition. When the construction was delegated to the machine, the deposits stopped. The existing layers remained — she still had the intuition she had built through years of prior work — but no new layers were being added. The architectural muscle was not being exercised. And muscles, whether physical or intellectual, atrophy through disuse.
The discipline of programming, as Dijkstra conceived it, was a discipline of the mind. Its purpose was not primarily to produce code, though code was its medium. Its purpose was to develop a particular quality of thought: the ability to hold complex logical structures in one's head, to reason about abstract systems with precision, to distinguish between what appears to be true and what has been demonstrated to be true. This quality of thought did not arrive spontaneously. It was cultivated through years of practice, through the specific struggle of writing programs that were not merely functional but provably correct. Eliminate the struggle and you may produce more artifacts. You will also produce fewer minds capable of knowing whether the artifacts are correct — and in a world of increasing computational complexity, that knowledge is not a luxury. It is the foundation upon which every reliable system depends.
The chapters that follow apply Dijkstra's core concepts — abstraction, structured reasoning, provable correctness, the separation of concerns, the relationship between simplicity and reliability — to the specific landscape The Orange Pill describes. The analysis will be critical. Dijkstra was not a man who softened his conclusions for the comfort of his audience, and the intellectual tradition he built does not permit softening now. But the criticism is not dismissal. It is the kind of criticism that takes its subject seriously enough to apply the most demanding standards available — which is, when one thinks about it, the only form of criticism worth offering.
Edsger Dijkstra spent the better part of four decades thinking about abstraction, and the conclusion he reached was both an endorsement and a warning. Abstraction is the most powerful intellectual tool available to the programmer. It is also the most dangerous. Its power lies in its ability to suppress irrelevant detail, allowing the mind to focus on the essential structure of a problem without drowning in the particulars of its implementation. Its danger lies in the fact that suppressed detail does not cease to exist. It merely becomes invisible. And invisible detail, in a system of sufficient complexity, is detail that will eventually produce consequences no one anticipated — because no one knew it was there.
Dijkstra used a comparison that illuminates the distinction with particular force. A well-designed abstraction, he argued, functions like a window: the programmer can see through it to the layer below when necessary and look away from it when the details are irrelevant. The window allows selective attention. It enables focus without enforcing ignorance. A poorly designed abstraction, by contrast, functions like a wall: it blocks the view entirely, and the programmer who needs to understand what lies behind it must tear it down, defeating the purpose of the abstraction altogether. The history of computing is, in a precise sense, the history of decisions about which abstractions should be windows and which should be walls — and the gradual, largely unremarked trend has been toward walls.
The first programmers worked with raw machine code, flipping switches and entering binary sequences that corresponded directly to the operations of the hardware. There was no abstraction. There was no concealment. The programmer understood the machine at the level of individual instructions, because there was nothing between her understanding and the machine's execution. The cost of this transparency was enormous: programming was slow, error-prone, and accessible only to those willing to master the machine's native language. But the benefit was equally significant: the programmer knew exactly what the machine was doing, because there was nothing between her intention and the machine's behavior except her own understanding.
Assembly language introduced the first layer. Mnemonics replaced binary codes. ADD replaced a sequence of ones and zeros. The programmer could now think in terms of operations rather than electrical states. The machine had not changed. What had changed was the programmer's relationship to the machine. A layer of translation had been inserted between intention and execution, and that layer made programming faster and less error-prone. But it began the process of concealment. The programmer who wrote ADD no longer needed to know which circuits were activated to perform the addition. That knowledge, essential to the machine-code programmer, became optional. And what is optional, over time, becomes unknown.
High-level languages — FORTRAN, COBOL, ALGOL, and later C, Pascal, Java, Python — each added new layers. Compilers translated human-readable code into machine instructions, freeing the programmer from the translation process. Operating systems managed hardware resources, freeing her from memory allocation and process scheduling. Libraries provided pre-built functions, freeing her from implementing common operations. Frameworks provided architectural templates. Cloud infrastructure abstracted away the hardware entirely. At each step, the programmer could build more with less effort. At each step, the programmer understood less about what she had built. The layers accumulated like geological strata, each burying the layer below, until the modern programmer worked at a level so far removed from the hardware that the machine might as well have been made of magic.
Segal acknowledges this trajectory in The Orange Pill with a candor that Dijkstra would have appreciated. He notes that his own knowledge of Assembler, once the foundation of his programming career, is "no longer useful." He observes that the same obsolescence is now arriving for the Python developer. Each abstraction layer, he writes, makes the previous layer's knowledge unnecessary. The confession is honest. What it does not do is reckon with the cumulative cost of that obsolescence. Each layer of lost knowledge is not merely a discarded skill. It is a lost capacity for diagnosis. The programmer who does not understand how memory is allocated cannot diagnose memory-related failures. The programmer who does not understand the compiler cannot distinguish between a bug in her logic and a bug in the translation of her logic. The programmer who does not understand the operating system cannot reason about the behavior of her code under resource contention. Each lost layer of understanding is a lost layer of accountability — a domain of the system's behavior for which no human being is responsible, because no human being can inspect it.
The natural language interface that The Orange Pill celebrates is, in Dijkstra's terms, the ultimate abstraction layer. It does not merely conceal the hardware, the operating system, or the programming language. It conceals the programming logic itself. The builder communicates intention — she describes what she wants — and receives implementation — code that purports to realize her intention. Everything between intention and implementation is hidden. The neural network that generated the code is opaque. The training data that shaped the network's behavior is unknown to the builder. The logical structure of the generated code — its assumptions, its edge cases, its failure modes — is visible only to those willing and able to read and analyze the code itself, a skill that the natural language interface was specifically designed to make unnecessary.
This produces what Dijkstra's framework identifies as a precise and novel pathology: maximum efficiency and maximum ignorance achieved simultaneously. The builder can build anything she can describe. She can verify nothing she cannot test. The abstraction has become so total that the window has been replaced not merely by a wall but by a wall without a door — a sealed surface through which nothing can be seen and behind which the entire logical structure of the system operates without human oversight.
Segal's ascending friction thesis — the argument that friction does not disappear but relocates upward — engages directly with this problem, and Dijkstra's framework both endorses and complicates it. Segal argues that when AI removes the friction of implementation, the difficulty ascends to higher cognitive work: vision, architecture, product judgment, the question of what should be built. The observation is correct as far as it goes. The laparoscopic surgery analogy in The Orange Pill is apt — the surgeon who loses tactile feedback gains the ability to perform procedures impossible with open hands. The friction relocates. The work becomes harder at a higher level.
But Dijkstra would have identified a critical asymmetry in the analogy that Segal does not address. The laparoscopic surgeon, though she has lost tactile feedback, still understands the procedure. She knows the anatomy. She can see the operative field through the camera. She can reason about what is happening inside the patient's body, even though she cannot feel it. The abstraction has changed her sensory relationship to the work, but it has not destroyed her cognitive relationship to it. She understands less through touch and more through image, but she still understands.
The AI-augmented builder's situation is categorically different. She has not merely lost one form of understanding and gained another. She has lost understanding of the implementation entirely. The ascending friction is real — the questions of what to build and for whom are genuinely harder than the questions of how to build — but the ascending friction operates at the strategic level while leaving the implementation level completely unaccounted for. The builder exercises judgment about what should exist. She exercises no judgment about how it is constructed. And between the "what" and the "how" lies the entire domain of correctness — the domain where bugs live, where edge cases hide, where the gap between intention and behavior produces consequences that no amount of strategic judgment can anticipate.
Dijkstra argued throughout his career that every abstraction layer should be accompanied by a verification mechanism appropriate to that layer. The programmer who used a compiler should be able to trust its correctness — but that trust should be grounded in formal verification of the compiler, not merely in accumulated experience. The programmer who used a framework should understand its invariants and verify that her use respected them. Each layer of concealment should be matched by a layer of assurance.
The natural language interface provides no such assurance. The builder does not know what is being concealed, because she did not design the abstraction. She did not choose what details to suppress and what to preserve. She did not define the interface between her intention and the implementation. The AI system made those choices, through a process she cannot inspect and based on criteria she cannot evaluate. The concealment is not structured, layered, or transparent. It is total. The builder sees the input — her description — and the output — the code. Everything between is hidden behind a wall that cannot be made into a window, because the operations behind it are not logical in the sense that Dijkstra meant by logical. They are statistical. They are learned. And no amount of architectural cleverness will make a neural network's internal operations traceable by human reasoning in the way that a compiler's translation rules are traceable.
The consequences extend beyond individual programs. Dijkstra understood that abstraction layers interact, and that the interactions between layers are where the most dangerous failures originate. A bug at one layer propagates through every layer above it, manifesting as a failure at a level so far removed from the original error that diagnosis becomes effectively impossible. In a system with properly designed abstractions, these propagation paths are at least theoretically traceable — one can follow the chain of dependencies from the failure back to its source. In a system built on AI-generated code, the propagation paths pass through a neural network whose internal structure is, by its nature, inscrutable. The failure occurs. The builder does not know why. She asks the AI to fix it. The AI generates new code that avoids the original failure — possibly by introducing new ones. She tests the new code. It works for the available test cases. She ships it. The underlying error may still be present, concealed by a new abstraction layer that routes around it rather than correcting it.
This is abstraction as trap rather than tool. The concealment that was supposed to enable focus has become concealment that prevents understanding. The layers that were supposed to separate concerns have become layers that separate the builder from any concern at all about the correctness of her system.
Segal's honest acknowledgment of his own Assembler obsolescence is, in Dijkstra's framework, a small instance of a much larger pattern — and the pattern does not stop at Assembler. The knowledge that is becoming unnecessary now is not the knowledge of machine code or memory management. It is the knowledge of logical structure itself — the ability to read code, trace its execution, identify its assumptions, and evaluate its correctness. This is the knowledge that every previous abstraction layer left intact, because every previous abstraction layer still required the programmer to write and understand code at some level. The natural language interface removes this final requirement. And with it goes the last human checkpoint between intention and deployment — the last point at which a human mind stands between an idea and its consequences and asks, with the rigor that Dijkstra spent his life demanding: is this actually correct?
In March 1968, the Communications of the ACM published a letter from Edsger Dijkstra under the title "Go To Statement Considered Harmful." The title was not Dijkstra's — it was chosen by the editor, Niklaus Wirth — but it became one of the most famous titles in the history of computing, launching a phrase template imitated hundreds of times in the decades that followed. The imitators often missed what made the original argument powerful. It was not an aesthetic objection. It was not a matter of style or taste. It was a logical argument about the relationship between program structure and human understanding — and its implications reach considerably further than the specific construct it addressed.

The "go to" statement, in the programming practice of the 1960s, allowed a program to jump to any arbitrary point in its execution. If the programmer wanted to skip ahead, loop back, or branch to an entirely different section of code, she inserted a "go to" that transferred control to any labeled point in the program. The construct was powerful, flexible, and universally available. It was also, in Dijkstra's analysis, a disaster for program comprehension — and not for the reasons most people assume.
The problem was not that "go to" statements produced incorrect programs. Programs using "go to" could be and often were perfectly correct. The problem was that "go to" statements made programs impossible to reason about. When execution could jump to any arbitrary point, the reader of the program — including the programmer herself, returning to her own code after an absence of weeks — could not look at a section of code and understand what it did. Execution might arrive at that section from anywhere. The local context of any code section was potentially the entire program, because any other section might transfer control to it at any time. The program might work perfectly. But the programmer could not demonstrate that it worked, because she could not trace the flow of execution in any systematic way.
Dijkstra's insight was that program structure and human understanding are not independent variables. A program organized around arbitrary jumps is a program whose logic is untraceable by human reasoning — not because the logic is too complex, but because the structural principle of arbitrary jumping defeats the linear, step-by-step reasoning that is the human mind's most reliable mode of analysis. The programmer cannot hold the entire program in her mind at once. She must reason about it in sections, tracing execution through manageable chunks and assembling a picture of the whole from accumulated understanding of the parts. Arbitrary jumps make this impossible, because any section might be entered from any other section, and reasoning about a section requires considering every possible path by which execution might arrive there.
Structured programming was Dijkstra's solution. It replaced arbitrary jumps with disciplined control structures: sequences, selections (if-then-else), and iterations (loops). These structures had a critical property that "go to" lacked: hierarchy. A sequence executed in order. A selection chose one of two paths based on a condition. An iteration repeated a block until a condition was met. Each structure had a single entry point and a single exit point, which meant the programmer could reason about each structure independently — understanding what it did without needing to consider every possible context in which it might execute. This was not merely a stylistic improvement. It was an epistemological revolution. Structured programming made programs amenable to formal reasoning. The programmer could verify each structure independently and compose the verifications to demonstrate the correctness of the whole.
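The compositional property described above can be sketched in a few lines. The function below is a hypothetical illustration: each control structure has a single entry and a single exit, and the loop carries an invariant that can be stated, and here even checked, independently of the rest of the program.

```python
def maximum(xs):
    """Find the largest element using only structured control flow:
    a sequence, an iteration, and a selection, each with one entry
    and one exit, so each can be reasoned about in isolation."""
    assert len(xs) > 0             # precondition
    best = xs[0]                   # sequence: establish the invariant
    for i in range(1, len(xs)):    # iteration: single entry, single exit
        if xs[i] > best:           # selection: one of two paths
            best = xs[i]
        # Loop invariant, checkable on every pass:
        # best is the maximum of the elements seen so far.
        assert best == max(xs[: i + 1])
    return best                    # postcondition: best == max(xs)

assert maximum([3, 1, 4, 1, 5]) == 5
assert maximum([-2]) == -2
```

A "go to" into the middle of that loop would destroy the argument, because the invariant could no longer be assumed true on entry. The verification of the whole is assembled from the verifications of the parts, and that assembly is exactly what arbitrary jumps make impossible.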
The resistance was fierce and prolonged. Programmers who had built careers around flexible use of "go to" saw Dijkstra's proposal as a straitjacket imposed by a theoretician who did not understand practical demands. Donald Knuth, no less, argued for a carefully disciplined use of "go to" rather than outright elimination. The debate lasted more than a decade. But the historical verdict is unambiguous: structured programming won. Not because every programmer accepted the argument, but because the programs produced by structured methods were measurably more reliable, more maintainable, and more comprehensible. The constraint that seemed like a limitation turned out to be an enabler. By giving up the freedom to jump anywhere, programmers gained the ability to understand everything.
Now consider what happens when this framework is applied to AI-generated code. The parallel is precise and the implications are severe.
The "go to" statement allowed arbitrary jumps in the flow of execution, making programs impossible to reason about. AI code generation introduces an arbitrary jump of a different kind — not in the flow of execution, but in the flow of creation. The builder describes an intention. The AI generates code. Between the intention and the code lies a neural network whose internal operations are as untraceable as the most convoluted "go to"-laden program Dijkstra ever criticized. The builder cannot follow the logic of the generation, because the generation occurred within a system whose internal states do not correspond to any logical structure that human reasoning can follow. She can inspect the output — the generated code — but she cannot trace the path from her description to that output. The path is the neural network, and the neural network is, in Dijkstra's precise sense of the term, an arbitrary jump from one domain to another whose trajectory cannot be followed by human reasoning.
This distinction is crucial, and it reveals something the surface-level comparison might obscure. In the 1960s, the problem was in the artifact — the program's structure made it unanalyzable. The solution was to restructure the artifact so that it became amenable to human reasoning. In the 2020s, the problem is not in the artifact but in the process that produces it. The generated code might be beautifully structured. It might follow every convention of modern software engineering. It might be elegant, efficient, and correct. But the process by which it was produced is, from the builder's perspective, the ultimate "go to" — an arbitrary jump from description to implementation whose internal logic cannot be traced by the person who requested it.
Dijkstra's original argument had a specific logical structure that bears repeating. He did not argue that "go to" was always used badly. He argued that its availability made bad use too easy and good use indistinguishable from bad use — that the mere existence of the construct threatened program quality because it allowed the programmer to take shortcuts that undermined the verifiability of the entire system. The argument was about structural temptation, not individual competence. Even a brilliant programmer using "go to" judiciously produced code that could not be verified by the same methods applicable to structured code, because the structural property that enabled verification — single entry, single exit, hierarchical composition — was absent.
The analogous argument about AI code generation is this: even a brilliant builder using AI tools judiciously produces systems whose generation process cannot be verified by any method currently available. The code may be inspectable after generation, and a sufficiently skilled reader may be able to evaluate it. But the path from intention to implementation — the neural network's transformation of natural language into executable logic — remains opaque regardless of the skill or discipline of the person requesting the generation. The opacity is not a bug in the tool. It is the tool's fundamental operating principle. And it creates a structural temptation that Dijkstra would have recognized instantly: the temptation to accept the output without verifying it, because the output looks right and the verification is hard and the deadline is real and the tool's track record has been acceptable so far.
"Acceptable so far" is exactly the epistemic standard that Dijkstra spent his career arguing against. A program that has worked for all tested inputs has an acceptable track record. But acceptable track records say nothing about untested inputs. The "go to"-laden program that worked for the test suite might fail catastrophically for the input that no one thought to test. The discipline of structured programming was designed precisely to eliminate this dependency on testing, to produce programs whose correctness could be demonstrated through reasoning rather than through the accumulation of successful observations. AI code generation reintroduces the dependency in its most extreme form: the builder has no method of reasoning about the generation process and must rely entirely on empirical observation of the output.
Segal captures something related to this in his description of Claude's "most dangerous failure mode" — what he calls "confident wrongness dressed in good prose." He describes a passage in his own book where Claude drew a connection between Csikszentmihalyi's flow state and a concept attributed to Gilles Deleuze. The passage was eloquent. The connection was wrong. The philosophical reference was incorrect in a way obvious to anyone who had read the primary source, but the prose was smooth enough to pass inspection by someone who had not. Segal caught the error because he checked. But the episode illustrates Dijkstra's structural concern with devastating clarity: the output looked correct. The process that produced it was opaque. The error was concealed by the very quality — fluency, coherence, apparent authority — that made the output seem trustworthy.
This is the "go to" in the generation process made visible. The jump from Segal's description to Claude's output passed through a neural network that produced something plausible but incorrect, and the plausibility was the mechanism of concealment. In a structured program, an error produces a visible anomaly — a test failure, a type mismatch, a logical contradiction that the structure of the code makes detectable. In AI-generated output, an error can be perfectly consistent with the output's surface properties — grammatically correct, stylistically appropriate, thematically relevant — while being substantively wrong. The structure of the generation process does not make errors detectable. It makes them invisible.
Dijkstra argued in 1968 that the "go to" should be eliminated from programming languages because its availability created a structural temptation that discipline alone could not reliably resist. The current analogy does not admit the same solution — one cannot eliminate AI code generation from the landscape any more than one could have eliminated electricity from the factory. But Dijkstra's deeper point survives the disanalogy. The discipline must match the temptation. If the structural temptation of AI-augmented building is to accept output without verification — to ship code that looks right without establishing that it is right — then the discipline required is a discipline of verification: reading generated code, tracing its logic, identifying its assumptions, testing its edge cases with the rigor that the generation process does not provide.
Whether that discipline will be adopted is a separate question. Dijkstra spent fifty years advocating the discipline of structured programming, and even that comparatively modest demand was resisted by the majority of practitioners for the better part of a generation. The discipline he would demand now — that every builder who generates code through AI must also develop the capacity to verify it — is substantially more demanding and substantially less likely to be adopted. But the fact that a demand is unlikely to be met does not make it wrong. It makes it urgent.
Dijkstra believed that programs should be proven correct the way theorems are proven in mathematics — through formal reasoning from axioms and inference rules, producing a chain of logic that demonstrates, conclusively, that the program satisfies its specification for every possible input. This belief was considered extreme in his own time. Most of his colleagues acknowledged the theoretical superiority of formal verification while arguing, with varying degrees of conviction, that it was impractical for real-world systems. Programs were too large, too complex, too entangled with the messiness of actual requirements to submit to the clean lines of mathematical proof. Dijkstra's response was characteristic: the programs were too large and too complex precisely because they had been built without the discipline that would have kept them manageable. Complexity was not a constraint imposed by the world. It was a consequence of intellectual failure — the accumulated result of decisions made by people who could not be bothered to think clearly about what they were building.
The argument for provable correctness rests on a distinction so simple that its importance is routinely underestimated. Testing can demonstrate the presence of bugs. It cannot demonstrate their absence. This is not a limitation of testing methodology that better testing might overcome. It is a logical property of the relationship between finite tests and infinite input spaces. A program that accepts user input may receive any of an effectively infinite number of possible inputs. Testing examines a finite subset of those inputs. No matter how large the subset, the untested inputs remain, and any one of them might trigger a failure that no tested input reveals. The only way to establish that a program is correct for all possible inputs is to reason about the program's logic — to prove, from the structure of the code itself, that the code's behavior satisfies its specification under all conditions.
Dijkstra understood that this standard was demanding. He did not propose it because it was easy. He proposed it because the alternative — shipping code whose correctness had been observed but not proven — was, in his assessment, fundamentally irresponsible. The programmer who ships tested-but-unverified code is making a bet: she is betting that the untested inputs will not trigger failures. The bet may pay off. It often does. But when it does not — when the untested input arrives and the unverified code fails — the failure is not a surprise in the logical sense. It is the predictable consequence of having deployed a system whose correctness was assumed rather than demonstrated.
This distinction acquires a new and sharper urgency in the context of AI-generated code.
Consider the sequence of events Segal describes with evident pride: an engineer in Trivandrum, working with Claude Code, builds a working version of a feature that had been on the backlog for four months. The previous estimate was six weeks of development time. The feature is completed, tested, and deployable by Wednesday afternoon. The celebration is understandable. The productivity gain is spectacular. But Dijkstra's framework asks a question that the celebration does not address: tested against what specification? The feature works for the test cases. Do the test cases cover the input space? Were the test cases designed by someone who understood the implementation well enough to identify its failure modes? Or were the test cases designed by someone who understood the desired behavior and tested the output against that behavior, without understanding the logic by which the output was produced?
The distinction matters because test design is itself a function of understanding. The most revealing test cases — the ones that expose subtle bugs, the ones that target the boundary conditions where correct behavior is most difficult to achieve — are designed by people who understand the implementation deeply enough to know where it is most likely to fail. A tester who does not understand the code can test the obvious cases: does the function return the right answer for typical inputs? Does the interface render correctly on standard screens? Does the system handle the expected error conditions? These tests are valuable. They are also insufficient. The bugs that survive testing are, almost by definition, the bugs that the tester did not think to test — the edge cases, the race conditions, the interactions between components that become visible only when the system is stressed in ways routine testing never reaches.
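The difference between obvious and revealing tests is visible even in a toy example (the function below is an illustration invented for this essay). The boundary cases, and the failure mode noted in the final comment, are the kind only a reader of the implementation would think to target:

```python
def chunk(xs: list, n: int) -> list:
    """Split xs into consecutive pieces of length n (last may be shorter)."""
    return [xs[i:i + n] for i in range(0, len(xs), n)]

# Obvious test: what a tester without implementation knowledge reaches for.
assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]

# Revealing tests: boundaries a reader of the implementation would target.
assert chunk([], 3) == []                           # empty input
assert chunk([1, 2, 3], 3) == [[1, 2, 3]]           # exact multiple
assert chunk([1, 2, 3, 4], 3) == [[1, 2, 3], [4]]   # remainder piece

# And a failure mode invisible from the outside: chunk(xs, 0) raises
# ValueError, because range() forbids a zero step — a fact you only
# anticipate if you know the implementation uses range(0, len(xs), n).
```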
When the builder does not understand the implementation — when the code was generated by an AI and accepted on the basis of output testing — the builder is in the worst possible position to design revealing tests. She does not know where the code is fragile. She does not know what assumptions the AI made in generating the implementation. She does not know what input combinations might violate those assumptions. She tests what she can think of, and what she can think of is limited by her understanding, and her understanding does not extend to the implementation she did not write.
This is not a hypothetical concern. In 2024, a study from Purdue University found that ChatGPT's answers to programming questions were incorrect fifty-two percent of the time, yet users preferred the AI's responses due to their fluency and apparent comprehensiveness. The AI's failures were concealed by the quality of its presentation — precisely the "confident wrongness dressed in good prose" that Segal identified in his own collaboration with Claude. When the surface is polished, the errors beneath it become harder to detect, and harder to detect means less likely to be caught before deployment.
Dijkstra proposed an alternative to the test-and-hope methodology that he called "program derivation" — the practice of constructing the program and its correctness proof simultaneously, so that the program, when complete, was correct by construction rather than by coincidence. The programmer did not write the code and then verify it. She derived the code from its specification through a series of formal steps, each of which preserved the correctness that the specification established. The result was a program whose correctness was not a property to be tested but a property that had been built in from the first line.
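A minimal sketch of the idea, using a stated loop invariant rather than Dijkstra's full formal apparatus, looks like this. The invariant is written down alongside the loop, and the postcondition follows from the invariant plus the exit condition — by reasoning, not by testing (the runtime asserts here merely make the reasoning mechanically checkable):

```python
def total(xs: list) -> int:
    """Summation derived together with its proof sketch."""
    s, i = 0, 0
    # Invariant: s == sum(xs[:i]) and 0 <= i <= len(xs)
    while i < len(xs):
        assert s == sum(xs[:i])   # invariant holds on loop entry
        s += xs[i]
        i += 1
        assert s == sum(xs[:i])   # ...and is preserved by the body
    # On exit: i == len(xs), so the invariant yields s == sum(xs).
    # Correctness is established by the derivation, for every input,
    # not by the particular inputs that happen to be tested.
    return s
```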
This methodology was always more aspirational than practical for large systems, and Dijkstra's critics were not wrong to point out its scalability challenges. But the principle behind it — that correctness should be constructed, not discovered — retains its force even where the specific methodology does not. The principle says: if you cannot demonstrate why your program is correct, you do not know that it is correct, regardless of how many tests it has passed. Passing tests is evidence. Demonstrated correctness is proof. And in domains where failure has consequences — medical systems, financial systems, infrastructure controls, anything that affects human safety — the difference between evidence and proof is the difference between acceptable risk and genuine reliability.
AI-generated code exists entirely in the domain of evidence. The builder tests. The output passes. She ships. At no point in this process does anyone demonstrate, through formal reasoning, that the code is correct. The AI cannot provide such a demonstration, because its generation process is statistical rather than logical — it produces code that is probable, not code that is proven. The builder cannot provide the demonstration either, if she has not read and analyzed the code with sufficient rigor. And the rigor required is not trivial: reading someone else's code (or an AI system's code) and identifying its failure modes requires a level of expertise comparable to writing the code from scratch, which is precisely the expertise that the natural language interface was designed to make unnecessary.
The 2026 paper from the Alignment Forum — "On the Formal Limits of Alignment Verification" — demonstrates that Dijkstra's concerns apply not merely to individual programs but to the AI systems that generate them. The paper proves that no verification procedure can simultaneously satisfy three properties: soundness (no incorrect system is certified as correct), generality (verification holds over all possible inputs), and tractability (verification completes in reasonable time). Any two of these properties are achievable. All three together are not. This is a formal impossibility result — not a practical difficulty that better technology might overcome, but a mathematical proof that the standard Dijkstra advocated cannot be fully achieved for systems of sufficient complexity.
Dijkstra would have found this result simultaneously vindicating and alarming. Vindicating because it confirms, with mathematical rigor, that the verification problem is real — that there is no free lunch, no way to generate code and verify it and have both operations be complete and efficient. Alarming because it means that the gap between tested code and verified code cannot be closed, even in principle, for the systems that matter most.
The practical consequence is this: the world that The Orange Pill describes — a world of AI-generated code deployed at speed and scale — is a world in which the gap between testing and verification is not merely unaddressed but structurally unaddressable for the most complex systems. The builder tests what she can test. The untested space remains. And the untested space grows larger as the systems grow more complex, because the complexity of AI-generated systems scales faster than the capacity of testing to cover them.
Segal writes about the "exhilaration" of building with Claude — the rush of seeing an idea become real in minutes. Dijkstra would not have disputed the genuineness of the exhilaration. But he would have observed, with the precision that characterized every observation he ever made, that exhilaration is not evidence of correctness. A bridge that is built in a day and appears to stand is exhilarating. A bridge that has been analyzed and shown to withstand all anticipated loads is reliable. The first may also be reliable. But the exhilaration provides no assurance of it. And in a world where the speed of building has accelerated by an order of magnitude while the rigor of verification has not accelerated at all, the distance between exhilaration and reliability is growing — and growing in a direction that Dijkstra spent his life trying to reverse.
The discipline he would demand is not that builders stop using AI. It is that they stop confusing tested code with verified code, stop treating the output of a test suite as evidence of correctness, and develop — or insist that their organizations develop — the verification practices that the speed of generation has made more necessary, not less. Every hour gained in generation is an hour that should be invested in verification. That this investment is unlikely to be made, given the structural incentives of the market, does not make the demand unreasonable. It makes the consequences of ignoring it predictable. And Dijkstra, who spent fifty years making predictions that the profession preferred to ignore, would have stated this prediction with the same confidence he brought to every other: the code that is generated without verification will fail. The only questions are when, where, and at what cost.
Dijkstra introduced the phrase "separation of concerns" in his 1974 paper "On the Role of Scientific Thought," and the concept became one of the most widely adopted — and most widely misunderstood — principles in the history of software engineering. The misunderstanding is instructive, because it reveals exactly the gap between what Dijkstra meant and what the profession heard, and that gap has widened into a chasm in the age of AI-generated code.
What the profession heard was an organizational principle: divide your program into modules, each responsible for one thing. Put the database logic here, the user interface there, the business rules in between. Keep them separate. The principle was adopted enthusiastically, encoded into architectural patterns — Model-View-Controller, microservices, layered architectures — and treated as a matter of good engineering hygiene, roughly equivalent to keeping a clean desk. Useful. Sensible. Not particularly profound.
What Dijkstra meant was something far more radical. The separation of concerns was not primarily an organizational technique for code. It was an epistemological discipline for thought. The programmer, facing a problem of any real complexity, could not think about everything at once. The human skull is, as Dijkstra repeatedly observed, strictly limited in capacity. The only way to manage complexity that exceeds that capacity is to address one concern at a time, in isolation from all other concerns, and to verify that each concern has been correctly handled before proceeding to the next. The separation was not about where code lives in the file system. It was about what the programmer is thinking about at any given moment — and, crucially, what she is not thinking about.
This is a discipline of attention, and its rigor is easy to underestimate. To address correctness separately from efficiency means that when you are establishing that the algorithm produces the right answer, you do not permit yourself to think about how fast it produces the answer. To address the interface separately from the implementation means that when you are defining what a module does, you do not permit yourself to think about how it does it. Each concern, isolated from the others, becomes small enough to be held in the human mind, analyzed completely, and verified with confidence. The composition of separately verified concerns yields a system whose behavior is — Dijkstra's term — "intellectually manageable." A system that a human being can understand, not by comprehending it all at once, which is impossible for any system of real complexity, but by comprehending each concern independently and trusting that the concerns compose correctly.
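One concrete form of the discipline — an illustration, not a method Dijkstra prescribed in exactly this shape — is to address correctness first, in an obviously correct reference implementation, and then address efficiency separately, checking the fast version only against the specification the first concern established:

```python
# Concern one, correctness in isolation: an obviously correct (and
# deliberately slow) reference implementation serves as the specification.
def fib_spec(n: int) -> int:
    return n if n < 2 else fib_spec(n - 1) + fib_spec(n - 2)

# Concern two, efficiency in isolation: an iterative version whose
# claim to correctness rests entirely on agreement with the spec,
# not on a fresh argument entangled with performance decisions.
def fib_fast(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# The efficiency concern is verified only through the interface the
# correctness concern defined — the two are never reasoned about at once.
assert all(fib_fast(n) == fib_spec(n) for n in range(15))
```

Each concern, taken alone, is small enough to hold in the mind with complete clarity; that, not tidy file layout, is what the separation buys.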
The trust in composition is itself grounded in formal reasoning. If concern A has been verified independently and concern B has been verified independently, and if the interface between A and B has been specified precisely enough that each concern's verification assumed only what the interface guarantees, then the composed system inherits the correctness of its parts. This is not magic. It is the modular structure of mathematical proof applied to computational systems. And it works only when the separation is genuine — when each concern really has been addressed in isolation, when the interfaces really are precise, when no hidden dependencies leak from one concern into another.
AI-generated code destroys this discipline at the most fundamental level, and it does so not in the code but in the process of creation.
When a builder describes a desired outcome to an AI system and receives a complete implementation, all concerns have been addressed simultaneously by a process the builder does not control and cannot inspect. The AI did not address correctness separately from efficiency. It did not define interfaces before implementing functions. It did not verify each concern independently and compose the verifications. It generated code that appears to satisfy the described outcome, through a statistical process that optimizes for plausibility rather than for the structured, layered, independently verifiable construction that Dijkstra's principle demands. The resulting code may be organized into modules. It may have a clean architecture. But the organization exists in the artifact, not in the understanding of the person who requested it.
This distinction — between organization in the code and organization in the mind — is the crux of what Dijkstra's framework reveals about the current moment. A codebase can be perfectly modular, with clean interfaces and well-separated responsibilities, and still be intellectually unmanageable if the person responsible for it did not create the separation and does not understand the reasoning behind it. The separation of concerns is not a property of the code. It is a property of the relationship between the code and the mind that produced it. When that mind is a neural network, and the human who requested the code cannot reconstruct the reasoning that produced the separation, the separation exists only on paper. It provides organizational convenience. It does not provide epistemic assurance.
Segal describes, with evident excitement, the dissolution of professional boundaries that AI tools enable. A backend engineer builds user interfaces. A designer writes features. The boundaries between specializations, he argues, were artifacts of the translation cost — when the cost of moving between domains dropped to the cost of a conversation, people moved. The celebration is understandable. The boundaries were real constraints, and their dissolution genuinely expands what individuals can attempt.
But Dijkstra's framework identifies something in this dissolution that the celebration obscures. The boundaries between specializations were not merely organizational. They were epistemic. The backend engineer did not build interfaces because building interfaces required a different set of concerns — visual design, user interaction patterns, accessibility requirements, rendering performance — and addressing those concerns required knowledge the backend engineer did not possess. The boundary existed because the concerns were genuinely different, and competent handling of each concern required domain-specific understanding. The translation cost that enforced the boundary was real, but it was also, in a sense, protective. It ensured that each concern was addressed by someone who understood it.
When AI dissolves the boundary, the engineer builds the interface without possessing the domain knowledge that interface construction requires. The AI possesses the knowledge — or rather, it possesses a statistical approximation of the knowledge, derived from training on millions of examples of interface code. The output may be competent. It may even be good. But the engineer cannot evaluate whether it is good, because evaluation requires the same domain knowledge that the AI was supposed to supply. She can evaluate whether the interface looks right. She cannot evaluate whether it handles edge cases correctly, whether it meets accessibility standards, whether it will perform adequately under load, whether the interaction patterns are appropriate for the user population — because these concerns require understanding she has not developed, and the AI has not separated them for her inspection. It has addressed them all simultaneously, opaquely, and presented the result as a finished artifact.
The separation of concerns was designed to prevent exactly this situation: a system in which multiple concerns have been addressed simultaneously by a single process, producing an artifact that cannot be decomposed into independently verifiable components by the person responsible for it. When Dijkstra said that intellectual manageability was the goal, he meant that the programmer should be able to hold each piece of the system in her mind with complete clarity. AI-generated code produces systems whose pieces may be clearly organized but whose organization the builder cannot independently verify, because she did not create it and does not understand the reasoning behind it.
There is a further problem, subtler and more corrosive. When concerns are separated in the mind of the programmer, the programmer develops an understanding of how concerns interact — of the places where correctness and efficiency trade off, where interface design constrains implementation, where a decision made at one level has consequences at another. This understanding of interactions is perhaps the most valuable form of engineering knowledge. It is the knowledge that Segal's senior engineer was drawing on when he made architectural decisions with confidence — the knowledge that eroded when AI took over the implementation and the deposits of understanding stopped accumulating.
AI-generated code does not produce this understanding of interactions, because the interactions were never separated in the builder's mind. She described an outcome. The AI handled the interactions internally. The builder sees the result but not the trade-offs that produced it. She does not know what the AI sacrificed for performance, what assumptions it made about the interface, what corner cases it ignored in the interest of generating plausible code. The trade-offs are present in the code. They are invisible to the builder. And invisible trade-offs, like invisible detail in any abstraction, will eventually produce consequences that no one anticipated — because no one knew the trade-offs were there.
Dijkstra's prescription was not that programmers should avoid tools that help them manage complexity. His prescription was that every tool should preserve the programmer's ability to reason about what she has built. A compiler that translates high-level code into machine instructions is acceptable because the programmer can reason about the high-level code — she understands the concerns at her level, and she trusts (based on verification of the compiler) that the translation preserves correctness. An AI that translates natural language into executable code does not meet this standard, because the programmer cannot reason about the translation. She cannot verify that the AI addressed each concern correctly, because she cannot identify what concerns the AI addressed or how it separated them — if it separated them at all.
The separation of concerns was never merely about clean code. It was about the preservation of human understanding in the face of computational complexity. Every system built by human beings must eventually be maintained by human beings — debugged, updated, extended, adapted to new requirements. Maintenance requires understanding. Understanding requires that the system's logic be accessible to the human mind. The separation of concerns was the mechanism that made this accessibility possible, by ensuring that no single piece of the system was too complex for a human mind to hold. AI-generated code threatens to produce systems whose individual pieces are manageable but whose composition — the way the pieces fit together, the trade-offs between concerns, the hidden assumptions that span module boundaries — is accessible only to the process that generated it. And that process is a neural network that cannot explain itself.
Dijkstra used the word "elegance" in a way that most people misread. They hear an aesthetic judgment — a preference for pretty code, the programmer's equivalent of a well-set table. Dijkstra meant something more severe. Elegance, in his usage, was an epistemic property: the quality of a solution that is simple enough for its correctness to be visible. An elegant program is one you can look at and see why it works. An inelegant program is one that works but conceals its own logic. The first is correct for reasons that can be stated. The second is correct by accident — it works because its bugs happen not to be triggered by the inputs it has received. The elegant solution is trustworthy. The inelegant solution is a bet.
He expressed this with characteristic compression: "Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better." The observation cuts in two directions simultaneously. Simplicity is difficult because it requires the programmer to understand the problem deeply enough to find its essential structure — to strip away everything that is not necessary and keep only what is. And the market does not reward this difficulty, because the market cannot distinguish between a simple solution that is correct by construction and a complex solution that appears to work. The complex solution may even look more impressive, because complexity is often confused with sophistication.
This confusion has been present throughout the history of computing, but AI-generated code elevates it to a structural principle. Large language models are trained on vast corpora of existing code, and existing code is overwhelmingly complex — not because the problems it addresses are inherently complex, but because most programmers, working under time pressure and without the discipline Dijkstra advocated, produced solutions that were adequate rather than elegant. The training data encodes the profession's median practice, not its best practice. When an AI generates code, it generates code that resembles the code it was trained on — which is to say, code that is competent, conventional, and almost never elegant in Dijkstra's sense.
This is not a failure of the AI. It is a faithful reflection of the training data. The AI produces code that looks like the code that exists. The code that exists is, overwhelmingly, code that works without being understood — code whose correctness has been established by testing rather than by reasoning, whose structure reflects the accumulated decisions of programmers who were solving immediate problems rather than constructing provable solutions. The AI reproduces the profession's habits, including the habits that Dijkstra spent his career arguing against.
The consequence is a subtle but pervasive degradation of the standard against which code is judged. When the AI generates a function that is correct but complex — that works but cannot be understood without careful analysis — the builder has two options. She can accept it, because it works, and move on to the next task. Or she can reject it and ask for something simpler, which requires her to know what simpler would look like, which requires understanding the problem at a level that the AI was supposed to make unnecessary. The structural incentive is overwhelmingly toward acceptance. The code works. The deadline is real. The effort required to demand and evaluate a simpler solution is effort that produces no visible output. And in a culture that measures productivity by visible output — by features shipped, prototypes completed, tickets closed — invisible effort toward elegance is effort that is not merely unrewarded but actively penalized.
Dijkstra saw this cultural pressure clearly, and his response was to insist that elegance was not a luxury but a survival requirement. Complex code that works today will fail tomorrow, because complex code cannot be maintained. It cannot be maintained because it cannot be understood, and it cannot be understood because its logic is not visible. The failure will not announce itself. It will arrive as a subtle bug that manifests only under specific conditions — conditions that the original programmer did not foresee because she did not understand the code well enough to identify its boundary conditions. The maintainer will not be able to diagnose the bug because she, too, does not understand the code. She will patch it — add another layer of complexity to route around the failure — and the accumulated patches will produce a system that is correct for the known cases and fragile for everything else.
AI-generated code accelerates this cycle with extraordinary efficiency. Each regeneration — each time the builder asks the AI to fix a bug or add a feature — adds complexity without adding understanding. The AI does not simplify; it extends. It produces code that addresses the new requirement by building on the existing code, inheriting whatever complexity the existing code already contains and adding the complexity needed to accommodate the new behavior. The builder tests the result. It works. She ships. The system grows more complex with each iteration, and the complexity is of the worst kind: complexity that was generated rather than designed, that has no architect, that embodies trade-offs and assumptions that no human being chose or understands.
Dijkstra proposed an alternative that he sometimes called "stepwise refinement" — the practice of developing a program by starting with the simplest possible correct solution and adding complexity only when the requirements demanded it, and only in ways that preserved the visibility of the solution's correctness at every step. Each step was small enough to be verified. Each step preserved the properties that previous steps had established. The result was a program that was as simple as the problem permitted and no simpler — a program whose complexity was justified by necessity rather than accumulated by accident.
Stepwise refinement is precisely what AI code generation does not do. The AI does not start with the simplest correct solution. It starts with a plausible solution — one that resembles the solutions in its training data — and the plausible solution is almost never the simplest one. Plausibility and simplicity are different optimization targets. The plausible solution is the one that looks like code that exists. The simple solution is the one that reveals why the code is correct. These sometimes coincide. More often, they do not.
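The divergence between the two targets can be made concrete. The sketch below is a hypothetical illustration (not an example from the text): two correct implementations of the same small requirement, one in the shape that resembles existing code, one in the shape whose correctness is visible at a glance.

```python
# Two correct implementations of the same requirement: "return the n largest
# values in a collection, in descending order." Hypothetical illustration.

# The "plausible" shape: resembles code that exists. Correct, but checking
# its correctness requires tracing mutation and index bookkeeping.
def top_n_plausible(values, n):
    result = []
    data = list(values)
    for _ in range(min(n, len(data))):
        best_i = 0
        for i in range(1, len(data)):
            if data[i] > data[best_i]:
                best_i = i
        result.append(data.pop(best_i))  # remove the current maximum
    return result

# The "simple" shape: correctness is visible in one line, because it is
# composed from operations whose meaning is already established.
def top_n_simple(values, n):
    return sorted(values, reverse=True)[:n]
```

Both functions pass the same tests; only the second one can be seen to be correct without simulation, which is the property Dijkstra meant by elegance.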
Segal writes about the "aesthetics of the smooth" in his engagement with Byung-Chul Han — the cultural preference for frictionless, seamless, polished surfaces that conceal their own construction. Dijkstra would have recognized this aesthetic immediately, because it is the aesthetic of AI-generated code: smooth on the surface, opaque beneath. Code that reads well, that follows conventions, that passes linting and style checks and code review — and that conceals, within its conventional surface, a logic that no one has examined for correctness. The smoothness is not evidence of quality. It is the mechanism by which the absence of quality is concealed.
Han argues that the smooth aesthetic produces a "hollowed-out parody of productivity." Dijkstra would have used different language — he would have said that the smooth aesthetic produces code that is "correct by coincidence" rather than "correct by construction" — but the diagnosis converges. Both thinkers identify the same pathology: the substitution of surface quality for structural quality, the preference for artifacts that look right over artifacts that are right, the cultural inability to distinguish between the two because the distinction requires precisely the depth of understanding that the smooth aesthetic is designed to make unnecessary.
The practical implication is this: in a world where AI generates the vast majority of code, the standard of code quality will converge on the standard of the training data — which is to say, on the median standard of the profession, which is to say, on code that works but is not understood. Elegance, in Dijkstra's sense, will become progressively rarer, not because it is impossible but because the tools that produce the code are not optimized for it and the people who use the tools are not trained to demand it. The market will not correct this drift, because the market cannot distinguish between elegant code and complex code until the complex code fails — and by the time it fails, the cost of replacing it with something elegant will be prohibitive, because the system will have accumulated too many dependencies, too many patches, too many layers of generated complexity that no one understands well enough to simplify.
Dijkstra insisted that elegance was not optional because he understood that the alternative to elegance is not merely ugliness but unreliability. Complex code fails. It fails in ways that cannot be predicted, diagnosed, or reliably repaired. It fails because its logic is not visible, and invisible logic is logic that cannot be verified. The insistence on elegance was not the aesthetic preference of a man who liked clean blackboards. It was the engineering judgment of a man who understood that the only reliable code is code that is simple enough to be understood — and that the only way to achieve such simplicity is to demand it, relentlessly, against the constant pressure of a culture that rewards speed over clarity and output over understanding.
In his 1988 essay "On the Cruelty of Really Teaching Computing Science," Dijkstra proposed that programming students should not be allowed near a computer until they had demonstrated mastery of formal reasoning. The proposal was received as provocation, which it was, but it was also a logical consequence of premises Dijkstra had been defending for decades. If programming is a mathematical discipline, then the prerequisites for programming are mathematical, not technological. Giving a student a computer before she can reason formally is like giving a medical student a scalpel before she can identify the organs — the tool amplifies whatever skill or lack of skill the user brings to it. The tool does not distinguish between competent use and incompetent use. Only the user's understanding makes that distinction, and understanding must precede access.
Segal argues, in The Orange Pill, that AI tools represent the most morally significant expansion of human capability since the invention of writing. The developer in Lagos who previously lacked the infrastructure to build software can now, through conversation with Claude, produce working applications. The marketing manager, the teacher, the architect — all can now build. The imagination-to-artifact ratio has collapsed. The gates have opened. Segal calls this democratization, and the word carries moral weight: who could argue against expanding who gets to build?
Dijkstra could. Not because he opposed human flourishing — the accusation of elitism, which was leveled at him throughout his career, misses the point of his argument as completely as the accusation of aestheticism misses the point of his insistence on elegance. He opposed the expansion of building capability without the corresponding expansion of verification capability, because he understood, with a clarity that decades of experience had sharpened, that the ability to build without the ability to verify is not empowerment. It is the distribution of a new and particularly dangerous form of ignorance.
The argument has a precise structure. When anyone can produce software through natural language description, the population of software producers expands dramatically. Some of these producers are skilled — they understand software well enough to evaluate what the AI generates, to identify its failure modes, to test it rigorously, and to maintain it over time. Many are not. They have the imagination to describe what they want. They do not have the expertise to evaluate what they receive. They test the output against their expectations, and when the output matches their expectations, they deploy. The testing is a test of their expectations, not a test of the code.
This is the specific failure mode that Dijkstra's framework predicts: when the builder does not understand the implementation, the builder's tests reflect the builder's understanding, and the builder's understanding may be — will be, in many cases — incomplete. The teacher who builds a curriculum tool through conversation with Claude tests whether the tool displays the right content. She does not test whether the tool correctly handles concurrent users, because she does not know what concurrent access is. She does not test whether the tool sanitizes input, because she does not know what injection attacks are. She does not test whether the tool correctly stores and retrieves student data under all conditions, because she does not understand the failure modes of the database operations the AI generated on her behalf. Her tests are a reflection of her knowledge. Her knowledge does not extend to the domains where the code is most likely to fail.
The response to this concern, in the discourse surrounding AI tools, is typically that the AI handles these concerns — that the generated code incorporates best practices for security, concurrency, and data integrity because the training data includes code that addresses these concerns. This response illustrates exactly the dependency that Dijkstra spent his career warning against: trust in the tool as a substitute for understanding. The AI may handle these concerns. It may not. The builder cannot tell the difference, because telling the difference requires the expertise the AI was supposed to replace. She is in the position of a patient evaluating her own surgery — she can report whether she feels better, but she cannot assess whether the procedure was performed correctly, because the assessment requires medical knowledge she does not possess.
Dijkstra observed, in a different but related context, that "the competent programmer is fully aware of the strictly limited size of his own skull; he therefore approaches the programming task in full humility." The humility he described was not self-deprecation. It was the specific intellectual virtue of knowing what you do not know and acting accordingly. The humble programmer builds conservatively because she recognizes the limits of her own understanding. She verifies carefully because she does not trust her intuition to substitute for proof. She keeps her code simple because she knows that complexity will exceed her ability to manage it.
The AI-empowered builder who lacks programming expertise cannot exercise this humility, because the humility requires the very knowledge that the tool has made unnecessary. You cannot be humbly aware of your limited understanding of concurrency if you do not know what concurrency is. You cannot be cautious about data integrity if you do not know what data integrity means. The humility that Dijkstra demanded was a function of expertise — the more you understood about programming, the more clearly you saw the limits of your understanding, and the more carefully you worked within those limits. Remove the expertise, and you remove the basis for humility. What remains is not confidence. It is the absence of the knowledge required to doubt.
Segal acknowledges, with characteristic honesty, that AI's democratization is "real but partial." Access requires connectivity, hardware, English fluency, and the means to pay for inference. These are real barriers, and their reduction is genuinely significant. But Dijkstra's concern is not about access barriers. It is about the quality of what is produced when the barriers are removed. The developer in Lagos who gains access to Claude gains the power to create. Does she gain the power to verify? Does she gain the understanding necessary to evaluate whether her creation is correct, secure, and reliable? If not — and the natural language interface specifically does not provide this understanding — then she has gained the power to produce artifacts whose quality she cannot assess.
The historical parallel is the printing press, which Segal invokes. Gutenberg's press democratized the production of books. The scholars worried that cheap books would flood the market with nonsense, and they were right — the nonsense did come. But the resolution, as Segal notes, was not less production but better judgment: criticism, curation, editorial standards, the institutions that separated valuable work from noise. The parallel is instructive, but it is also incomplete in a way that Dijkstra's framework makes visible. A bad book is annoying. Bad software can be dangerous. The pamphleteer who prints nonsense wastes the reader's time. The builder who deploys unverified software may corrupt data, expose private information, produce incorrect results that are acted upon as correct, or create security vulnerabilities that are exploited by adversaries. The consequences of democratized production scale with the consequentiality of the product, and software, which increasingly mediates every aspect of human life, is consequential in a way that pamphlets are not.
Dijkstra was not opposed to education. He was the most dedicated of teachers, spending decades at Eindhoven and then at the University of Texas at Austin training students in the discipline of formal reasoning. What he opposed was the substitution of tools for understanding — the belief that giving someone a powerful instrument was equivalent to giving them the competence to use it responsibly. The scalpel metaphor was not accidental. A scalpel in the hands of a trained surgeon saves lives. The same scalpel in the hands of someone who has watched surgery on television is not empowerment. It is hazard.
Segal argues that the question is no longer "What can you do?" but "What is worth doing?" — that AI has shifted the premium from execution to judgment. Dijkstra would have endorsed the direction of this argument while challenging its completeness. Judgment about what is worth building is indeed the higher-order skill. But judgment about whether what was built is correct is not a lower-order skill that can be safely delegated to the tool. It is a different kind of judgment, requiring a different kind of expertise, and its absence from the democratized builder's toolkit is not a gap that will be filled by the same AI that created it. The tool that generates the code cannot also be the sole authority on whether the code is correct, for the same reason that the defendant cannot also be the judge: the evaluation must be independent of the process being evaluated.
What Dijkstra would have demanded, in the face of democratized code generation, is democratized code verification — tools, training, and institutional structures that give the new builders not just the power to create but the capacity to evaluate what they have created. Whether such verification can be meaningfully democratized — whether the knowledge required to evaluate code can be made as accessible as the tools that generate it — is an open question that neither Dijkstra's framework nor Segal's optimism has answered. But it is the question upon which the difference between genuine democratization and the mass production of unverified artifacts ultimately depends.
Dijkstra's colleague C.A.R. Hoare distilled decades of shared experience into a single observation: "There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies." The formulation appears symmetrical. It is not. The first approach — simplicity so radical that correctness is visible — requires deep understanding of the problem, ruthless elimination of unnecessary complexity, and the discipline to prefer a smaller, provably correct system over a larger, plausibly correct one. The second approach — complexity that conceals its own failures — requires nothing more than the accumulation of code, feature upon feature, patch upon patch, until the system is too large and too tangled for anyone to identify what is wrong with it. The first approach is hard. The second is easy. The profession has, throughout its history, overwhelmingly chosen the second.
Dijkstra understood why. Simplicity requires saying no. It requires looking at a feature request and asking whether the feature is necessary or merely desired. It requires looking at an implementation and asking whether a simpler one exists. It requires the willingness to discard working code — code that passes tests, code that satisfies stakeholders — because the code is more complex than the problem demands. Each of these decisions is a local cost: the feature is not built, the deadline is extended, the stakeholder is disappointed. The benefit — a system that is simple enough to be understood, verified, and maintained — is a global benefit that accrues over the long term and is invisible at the moment of decision. Markets do not reward invisible long-term benefits. They reward visible short-term features. The structural incentive is toward complexity, and the structural incentive has won at every scale of the profession, from the individual function to the enterprise system.
AI-generated code does not resist this incentive. It accelerates it. When generating code costs effectively nothing — when a natural language description produces a working implementation in seconds — the cost of adding complexity drops to zero. A feature that would have required a week of engineering effort, a week during which someone might have asked "Is this feature necessary?", now requires a sentence. The question of necessity is not asked, because the cost of implementation no longer imposes the friction that made the question relevant. The friction of implementation was, in Dijkstra's terms, an inadvertent but valuable form of quality control. It forced trade-offs. It required prioritization. It ensured that only the features worth the engineering investment were built. Remove the friction, and you remove the trade-off, and you build everything — not because everything is necessary, but because everything is possible and nothing imposes the cost that would distinguish the necessary from the merely possible.
The result is systems of a complexity that exceeds human understanding not because the problem is inherently complex but because no one said no. The Napster Station that Segal describes — built in thirty days, integrating AI-powered conversational interfaces, face detection, speaker detection, music generation, and multi-language support — is presented as a triumph of capability expansion. And it is. The question Dijkstra's framework raises is not whether the system works but whether anyone understands it completely enough to maintain it. When the system fails — and every system of sufficient complexity eventually fails — will the maintainers be able to diagnose the failure? Will they be able to trace it to its source, identify the assumption that was violated, and correct the underlying logic? Or will they ask the AI to generate a fix, which will add another layer of generated complexity to a system that was already beyond human comprehension?
This is not a speculative concern. It is the maintenance problem that has plagued the software industry since its inception, now accelerated by an order of magnitude. Legacy systems — systems that work but that no one understands — are already the single largest category of technical debt in the global software industry. Banks run on COBOL code written decades ago by programmers who are now retired or dead. Government agencies depend on systems whose original architects are unreachable and whose documentation, if it ever existed, is incomplete. The maintenance cost of these systems is staggering, and the cost exists precisely because the systems are too complex for their current maintainers to understand.
AI-generated code will produce the next generation of legacy systems in a fraction of the time. A system that took a team of twenty programmers five years to build — accumulating complexity gradually enough that the programmers, at least at the time of construction, understood what they had built — can now be generated in weeks by a single builder who understands the requirements but not the implementation. The builder ships the system. It works. The builder moves on to the next project. A year later, the system needs modification. The original builder — who never understood the implementation — cannot modify it. A new developer — who was not present for the generation and has no documentation beyond the original natural language description — must reverse-engineer a codebase she did not write and that was not written by a human being. The codebase was generated by a process whose logic she cannot reconstruct, and it embodies assumptions and trade-offs that were never made explicit because they were never made by a human mind. She can read the code, if she has the skill. But reading AI-generated code is not the same as reading human-written code, because human-written code (at its best) reflects the structure of a human mind reasoning about a problem, and AI-generated code reflects the statistical patterns of a training corpus. The logic is emergent rather than designed. The structure is conventional rather than intentional. The trade-offs are implicit rather than documented.
Dijkstra warned about exactly this trajectory in the context of human-written code. His argument was that complexity is the enemy of reliability, and that the only defense against complexity is the discipline to keep systems simple. He would have observed, with the cold precision that characterized his most devastating analyses, that AI has not solved the complexity problem. It has solved the generation problem. These are different problems, and the solution to the second exacerbates the first. Generating code faster means accumulating complexity faster. Accumulating complexity faster means reaching the threshold of incomprehensibility faster. And once a system crosses that threshold — once no human being can hold its logic in their mind, trace its execution, or identify its failure modes — the system becomes unmaintainable in any meaningful sense. It can be patched. It can be regenerated. It cannot be understood.
There is a deeper issue here that connects Dijkstra's concern about complexity to the formal impossibility results described in Chapter 4. The 2026 paper proving that no verification procedure can simultaneously achieve soundness, generality, and tractability has a practical corollary: as systems grow more complex, the gap between what can be tested and what needs to be verified grows wider. Testing covers a finite subset of behaviors. Verification covers all behaviors but becomes intractable for sufficiently complex systems. The sweet spot — systems simple enough to be verified but complex enough to be useful — is precisely the zone that Dijkstra spent his career trying to inhabit and that AI-generated code systematically overshoots. Not because the AI is incapable of generating simple code, but because the builders who direct the AI have no incentive to demand simplicity when complexity is free.
Segal writes about the "ascending friction" that replaces implementation friction — the harder, higher-level questions of what to build and for whom. Dijkstra's framework adds a correction: ascending friction accounts for the strategic level but leaves the implementation level unattended, and the implementation level is where complexity accumulates. The builder who exercises excellent strategic judgment — who asks the right questions about what to build — may still deploy a system whose implementation is too complex to be maintained, because the strategic questions and the implementation questions operate at different levels, and the tool that connects them (the AI) generates implementations whose complexity the builder has no mechanism to control.
The prescription that follows from Dijkstra's analysis is not that systems should be small. It is that complexity should be justified. Every feature, every module, every line of code should earn its place by being necessary — not merely desired, not merely possible, but necessary for the system to satisfy its requirements. This standard was difficult to maintain when adding a feature required a week of engineering effort. It is nearly impossible to maintain when adding a feature requires a sentence. But the difficulty of maintaining the standard does not reduce the consequences of abandoning it. The consequences are the same as they have always been: systems that work until they do not, that cannot be diagnosed when they fail, and that accumulate complexity until the cost of maintaining them exceeds the cost of replacing them — at which point they are replaced by new systems that begin the cycle again, now with AI generating the complexity at a pace that guarantees the cycle will be shorter and the accumulated incomprehensibility greater.
Dijkstra wrote, near the end of his career, that "the question of whether a computer can think is no more interesting than the question of whether a submarine can swim." The observation was aimed at the artificial intelligence community, but its logic applies more broadly. A submarine is not a fish. It does not need to swim in the way a fish swims. It needs to travel underwater, and the engineering solution to underwater travel need not mimic the biological solution. Similarly, a computer is not a mind. It does not need to think in the way a mind thinks. It needs to process information, and the engineering solution need not mimic the cognitive solution. The insistence on mimicry — on making the machine resemble the mind — produces, in Dijkstra's view, machines that are worse at being machines without becoming better at being minds. The same logic applies to complexity: a system that mimics the complexity of human thought without the human mind's ability to manage that complexity is not a sophisticated system. It is an unmanageable one.
Dijkstra stated the point with a precision that left no room for misunderstanding: "Program testing can be used to show the presence of bugs, but never to show their absence." The sentence has been quoted so frequently that it has acquired the patina of a platitude — something everyone nods at and no one acts on. But it is not a platitude. It is a theorem. Its truth is not empirical but logical, and no amount of improved testing methodology will make it false. A finite number of tests applied to a program that accepts an effectively infinite number of inputs can establish, at most, that the program behaves correctly for the tested inputs. The untested inputs remain. Any one of them might trigger a failure that no tested input reveals. The only way to establish that a program is correct for all possible inputs is to reason about the program's logic — to prove, from the structure of the code, that its behavior satisfies its specification universally. This is verification. Everything else is observation.
The distinction between testing and verification is the distinction between evidence and proof. Evidence can be voluminous and persuasive and wrong. A thousand successful test executions do not establish that the thousand-and-first will succeed. A million successful executions do not establish it either. The number is irrelevant, because the logical structure of the claim is not affected by the size of the sample. Testing is induction: generalizing from observed cases to unobserved cases. Verification is deduction: deriving the conclusion from the premises with logical necessity. Induction can fail. Deduction cannot — provided the premises are true and the reasoning is valid.
The entire software industry runs on induction. Programs are tested, not verified. They are shipped when the tests pass, not when correctness has been demonstrated. This is not a failure of principle. It is a concession to practice — formal verification of real-world systems is extraordinarily difficult, often more difficult than building the systems themselves, and the market does not reward the additional effort. Dijkstra knew this. He did not deny the practical difficulty. He denied that practical difficulty excused intellectual surrender. The fact that verification is hard does not make testing sufficient. It makes the gap between what the industry does and what reliability requires a permanent, structural risk — a risk managed by accumulated testing rather than eliminated by proof.
AI-generated code widens this gap to a degree that Dijkstra's framework makes quantifiable. Consider the relationship between testing quality and implementation understanding. The most revealing test cases — the ones that expose subtle failures, that probe boundary conditions, that exercise the specific logic paths where correctness is most difficult to achieve — are designed by people who understand the implementation. A tester who knows that a sorting algorithm uses a particular partitioning strategy can design tests that exercise the partition's edge cases: equal elements, already-sorted inputs, inputs that trigger worst-case behavior. A tester who does not know the implementation strategy — who knows only that the function is supposed to sort — will test obvious cases: unsorted input, reverse-sorted input, empty input. These tests are necessary. They are not sufficient. The bugs that survive them are the bugs that live in the implementation's specific logic, and finding those bugs requires knowing the implementation.
When the builder does not understand the implementation — when the code was generated by an AI and the builder tested the output against expected behavior — the tests reflect the builder's understanding of the requirements, not her understanding of the code. The tests answer the question "Does it do what I wanted?" They do not answer the question "Does it do what I wanted under all conditions, including conditions I did not think to test?" The second question can only be answered by someone who understands the code well enough to identify the conditions that the requirements do not specify — the implicit assumptions, the unhandled edge cases, the interactions between components that are not visible in the requirements document.
This is not a theoretical concern. The 2024 Purdue study that found ChatGPT's programming answers were incorrect fifty-two percent of the time — while users preferred those answers for their fluency — illustrates the mechanism with uncomfortable precision. The errors were not obvious. They were concealed by the quality of the presentation. The code looked correct. It read correctly. It was structured correctly. But its logic was wrong in ways that required domain expertise to detect. A user testing this code against expected behavior would, in many cases, confirm that it produced the right output for the tested inputs. The error would manifest only for inputs the user did not think to test — inputs that a programmer who understood the implementation would have tested as a matter of course.
Segal describes his own encounter with this failure mode: the Deleuze passage in which Claude produced a connection that was "eloquent, well-structured, hitting all the right notes" — and substantively wrong. He caught the error because he checked against the primary source. But the error's structure is generalizable. The output passed every surface test: it was coherent, it was relevant, it was stylistically appropriate, and it was wrong. The surface tests — coherence, relevance, style — are the tests that the builder without domain expertise can apply. The depth test — is the claim actually correct? — requires the very expertise that the tool was supposed to make unnecessary. The testing capability and the generation capability are mismatched: the tool generates output at a level of sophistication that the builder cannot verify, and the gap between generation quality and verification capability is the gap where errors live.
Dijkstra proposed formal verification as the alternative to this fragile arrangement, and his proposal has experienced a surprising resurgence in the context of AI safety research. The "Guaranteed Safe AI" framework described by Dalrymple and colleagues is essentially Dijkstrian in spirit: it proposes a world model, a safety specification, and a verifier that produces an auditable proof certificate that the AI system satisfies the specification relative to the model. The architecture is Dijkstra's architecture, applied to a new domain. But the 2026 impossibility result — the proof that soundness, generality, and tractability cannot be simultaneously achieved — means that even this Dijkstrian approach cannot fully bridge the gap for systems of sufficient complexity. The verification trilemma is a mathematical ceiling. Practical verification can achieve two of the three properties. It cannot achieve all three.
The implication is not that verification should be abandoned. It is that verification will always be incomplete for the most complex systems, and the incompleteness must be managed rather than denied. Managed verification — verification that is explicit about what it covers and what it does not, that quantifies the residual risk, that directs testing effort toward the specific gaps in the verification — is vastly more reliable than testing alone. But it requires expertise that the democratized builder does not possess and that the natural language interface does not provide.
Segal writes about the Berkeley researchers' proposal for "AI Practice" — structured pauses, sequenced workflows, protected time for human-only engagement. Dijkstra's framework suggests a complementary and more specific practice: verification intervals. Periods in the development process dedicated not to building or testing but to reading the generated code, tracing its logic, identifying its assumptions, and determining the conditions under which it will fail. This practice is not testing — it does not involve executing the code. It is the practice of understanding the code at a level sufficient to design tests that target the code's actual vulnerabilities rather than the builder's assumptions about what those vulnerabilities might be.
Whether this practice will be adopted is, again, a question that the structural incentives of the market will answer. The practice is slow. It requires expertise. It produces no visible output. It competes directly with the next feature, the next sprint, the next prompt. Every incentive in the current system pushes against it. And every incentive in the current system pushes the builder toward the conclusion that testing is enough — that if the code passes the tests, the code is correct.
Dijkstra spent fifty years explaining why this conclusion is false. The conclusion is no less false now than it was when he first articulated it. The only thing that has changed is the scale at which the consequences of its falsity will be felt.
In the spring of 1985, sitting in Austin, Texas, Dijkstra delivered a remark that distills his entire relationship to artificial intelligence into three sentences: "I feel that the effort to use machines to try to mimic human reasoning is both foolish and dangerous. It is foolish because if you look at human reasoning as is, it is pretty lousy; even the most trained mathematicians are amateur thinkers. Instead of trying to imitate what we are good at, I think it is much more fascinating to investigate what we are poor at."
The statement is characteristically provocative, but the provocation conceals a serious argument. Dijkstra was not dismissing human intelligence. He was pointing out that human reasoning, even at its best, is unreliable — prone to bias, distraction, overconfidence, and the systematic errors that cognitive science has spent the last half-century cataloging. The ambition of artificial intelligence, as Dijkstra understood it, was to reproduce this unreliable reasoning in a machine. His counter-proposal was the opposite: use the machine's strengths — speed, consistency, exhaustive enumeration — to compensate for human weaknesses. Make the machine what it is good at being. Do not force it into an imitation of what humans already do badly.
The current generation of AI tools has, by this standard, done precisely what Dijkstra warned against. Large language models mimic human language production with extraordinary fluency. They reproduce the patterns of human reasoning — including the patterns of human error. When Claude generates a plausible but incorrect philosophical reference, it is doing exactly what a human writer under time pressure might do: producing something that sounds right because it follows the statistical patterns of correctness, without the formal verification that would distinguish genuine correctness from its imitation. The machine is mimicking human reasoning at scale, and the mimicry reproduces human failure modes at scale. This is, in Dijkstra's precise formulation, "both foolish and dangerous."
But Dijkstra would not have stopped at the diagnosis. Diagnosis without prescription was, in his view, an intellectual failure — the identification of a problem without the discipline to propose a solution. And his entire career, properly understood, contains the outlines of a prescription for the current moment. The prescription is not that AI-augmented building should be abandoned. It is that AI-augmented building should be subjected to the discipline that all programming, in Dijkstra's framework, requires — and that the discipline must be adapted to the specific new challenges that generation creates.
The first element of the prescription is the restoration of verification as a practice distinct from testing. Every builder who generates code through AI must develop — or must have access to — the capacity to verify the generated output. This does not mean that the builder must be able to write the code from scratch. It means she must be able to read it critically, trace its logic, identify its assumptions, and determine the conditions under which it will fail. This capacity is not identical to programming expertise, but it is not independent of it either. It is a form of literacy — the ability to read and evaluate code, even if one cannot write it — that the current generation of builders is not being trained to develop.
The educational implications are direct and urgent. The response to AI-generated code has, in many institutions, been to reduce programming instruction on the grounds that AI makes programming skill unnecessary. Dijkstra's framework suggests the opposite response: that AI makes a specific form of programming literacy more necessary, not less. The literacy in question is not the ability to write code but the ability to read it — to evaluate generated output with the critical judgment that the generation process does not provide. A curriculum that teaches students to prompt effectively without teaching them to evaluate the output critically is, in Dijkstra's terms, a curriculum that produces fluent illiterates: people who can generate text they cannot read.
The second element is the preservation of simplicity as a design criterion. When generation is free, the only discipline that prevents unbounded complexity is the builder's willingness to demand simplicity — to reject generated code that is more complex than the problem requires, to insist on implementations whose logic is visible, to prefer the solution whose correctness can be demonstrated over the solution whose correctness can only be tested. This discipline requires understanding what simplicity means in a given context, which requires understanding the problem at a level that natural language description alone does not provide. The builder must know enough about the domain to recognize when a solution is unnecessarily complex, which means the builder must possess domain knowledge that goes beyond the ability to describe desired outcomes.
The third element is the maintenance of the separation of concerns as an intellectual practice, not merely an organizational one. When a builder directs an AI to generate a system, she must be able to decompose the system's requirements into independently verifiable concerns and evaluate the generated output against each concern separately. This practice requires the builder to think about the system's logic in the structured, layered way that Dijkstra advocated — to address correctness separately from performance, interface separately from implementation, security separately from functionality. The AI generates everything at once. The builder must disaggregate the output and evaluate each concern on its own terms.
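As a sketch of what that disaggregation looks like in practice — the function here is a hypothetical piece of generated code, not an example from the book — each concern gets its own check, evaluated independently of the others:

```python
def normalize(scores):
    """Hypothetical generated code: scale values to sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

# Concern 1: functional correctness on the specified inputs.
out = normalize([1, 1, 2])
assert out == [0.25, 0.25, 0.5] and abs(sum(out) - 1.0) < 1e-9

# Concern 2, evaluated separately: robustness on inputs the
# requirements never mentioned. An all-zero input exposes an
# implicit assumption (total != 0) the generator never stated.
try:
    normalize([0, 0])
    handles_zero = True
except ZeroDivisionError:
    handles_zero = False
assert handles_zero is False  # this concern fails on its own terms
```

The point of the separation is that the first check passing says nothing about the second: a single undifferentiated "does it work?" test would have stopped at concern 1 and declared victory.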
The fourth element, and the most demanding, is intellectual honesty about the limits of the current arrangement. The builder who generates code through AI and deploys it on the basis of testing alone must acknowledge, to herself and to the users of her system, that the code is tested but not verified — that it has been observed to work but has not been demonstrated to be correct. This acknowledgment does not require abandoning AI-augmented building. It requires abandoning the pretense that tested code and verified code provide the same assurance. They do not. They have never provided the same assurance. And the pretense that they do — the quiet elision of the distinction between "it works" and "it is correct" — is the specific intellectual dishonesty that Dijkstra spent his career opposing.
These four elements — verification literacy, simplicity discipline, separation of concerns, intellectual honesty — do not constitute a complete framework for responsible AI-augmented building. But they constitute the minimum that Dijkstra's principles require, and they provide a foundation on which a more complete framework could be built. The foundation is not technological. It is intellectual. It rests on the conviction that the builder's understanding of what she has built is not a luxury to be discarded when tools make it unnecessary but a requirement that becomes more important as tools become more powerful — because powerful tools amplify not only capability but also error, and the only defense against amplified error is amplified understanding.
Dijkstra died in 2002, two decades before the moment The Orange Pill describes. He did not see Claude Code or natural language programming or the collapse of the imagination-to-artifact ratio. But he saw, with the clarity of a man who had spent fifty years studying the relationship between human understanding and computational systems, that the trajectory of the profession was toward ever-greater power coupled with ever-lesser comprehension. His career was a sustained argument that this trajectory was unsustainable — that power without understanding is not capability but liability, that speed without verification is not productivity but recklessness, and that the discipline of the mind is not a constraint on building but the foundation on which all reliable building depends.
The tools have changed beyond anything Dijkstra could have anticipated. The discipline he advocated has not changed at all. It cannot change, because it is grounded not in the specifics of any particular technology but in the logical structure of the relationship between human understanding and the artifacts humans produce. That relationship has the same structure now as it had in 1968 when Dijkstra wrote his letter about "go to" statements: the builder must understand what she has built, or she does not know whether it is correct, and if she does not know whether it is correct, she is deploying a bet, not a system.
The tools we use have a profound and devious influence on our thinking habits, and therefore on our thinking abilities. This was Dijkstra's warning. The tools have become more powerful than he imagined. The warning has become proportionally more urgent. And the discipline he spent his life advocating — the discipline of thinking clearly about what we build, of proving what we claim, of preferring understanding to speed and correctness to convenience — is not a relic of an earlier era. It is the only thing standing between the extraordinary capability that AI provides and the extraordinary consequences of deploying that capability without the intellectual foundation to ensure it is used well.
The question is whether the profession — and the broader culture that depends on the profession's work — will adopt the discipline before the consequences of its absence become catastrophic. Dijkstra's career suggests that the answer is probably no. The discipline has been available for fifty years. The profession has preferred speed. But the fact that a discipline is unlikely to be adopted does not make it wrong. It makes the prediction of what will happen without it more certain. And certainty, in Dijkstra's framework, is the one thing that should never be confused with comfort.
The proof I cannot write is the one that matters most.
I do not mean that metaphorically. Dijkstra's central demand — that the builder should be able to demonstrate, through formal reasoning, that what she has built is correct — is a demand I cannot meet for the very book you have just read. The Orange Pill was written with Claude. The chapters were shaped through conversation, the arguments refined through iteration, the connections discovered through a process that I directed but did not fully control. I know what I intended. I believe the arguments are sound. I cannot prove it in the way Dijkstra would have required proof — by tracing the logic from premises to conclusion with the certainty that every step is valid and no step has been skipped.
That admission is the point.
Every chapter of this volume has been, in a sense, an argument with my own practice. Dijkstra's framework does not offer the comfort of easy reconciliation. When he says that testing shows the presence of bugs but never their absence, I hear a direct challenge to the way I built Napster Station — thirty days, tested and shipped, exhilarating and unverified. When he says the competent programmer is fully aware of the limited size of her own skull, I think of the engineer in Trivandrum whose architectural confidence eroded after months of delegating implementation to Claude, and I recognize that the erosion was not a personal failing but the predictable consequence of a structural change that I celebrated. When he insists that elegance is not aesthetic preference but epistemic necessity — that simple code is trustworthy code and complex code is a bet — I look at the systems my team has generated and wonder how many hidden bets we are carrying.
What makes Dijkstra's critique so uncomfortable is that it does not oppose what I believe. It sharpens it. I wrote in The Orange Pill that AI is an amplifier. Dijkstra's response is precise: an amplifier amplifies the signal and the noise, and if you cannot distinguish between them — if you cannot verify which parts of your amplified output are correct and which are plausible imitations of correctness — then you do not know what you have amplified. The discipline he demands is not the opposite of the creative liberation I described. It is its prerequisite. Amplification without verification is not power. It is volume.
The hardest sentence in this volume, the one I keep circling back to, is Dijkstra's observation that the tools we use have a profound and devious influence on our thinking habits. Devious. Not merely profound — devious. The influence is hidden. It operates below awareness. You do not notice the tool reshaping your thought until the thought has already been reshaped. I have felt this. I have felt the pull toward accepting Claude's output because it sounds right, the gradual atrophying of the instinct to verify, the seductive ease of the smooth surface that conceals whatever lies beneath.
Dijkstra was not a man who offered reassurance, and this volume does not offer it either. What it offers is something more useful: a standard. A standard that says the exhilaration of building is real but insufficient. That working code is not correct code. That the gap between them is where the consequences live. That the discipline of understanding what you build is not a tax on creativity but the foundation that makes creativity responsible.
My children will build with tools more powerful than Claude. The question Dijkstra forces me to ask is not whether they will have the power to build — they will — but whether they will have the discipline to understand what they have built. Whether anyone will teach them that the proof matters more than the prototype. Whether the culture they inherit will value verification as highly as it values speed.
I do not know the answer. But I know the question is the right one. And Dijkstra taught me that asking the right question, with enough precision to know what would count as an answer, is where the discipline begins.
-- Edo Segal
In The Orange Pill, Edo Segal described the exhilaration of building with AI -- the collapse of the imagination-to-artifact ratio, the twenty-fold productivity leap, the feeling of creative liberation when machines meet you in your own language. The celebration was real. So was the vertigo.
Edsger Dijkstra spent fifty years asking the question that vertigo obscures: Does the builder understand what she has built? His framework -- structured reasoning, provable correctness, the discipline of simplicity -- was forged in an era of punch cards and blackboards. It has never been more relevant than now, when AI generates code faster than humans can read it and "it passed the tests" has become the industry's substitute for "it is correct."
This volume applies Dijkstra's uncompromising rigor to the landscape of AI-augmented building. The result is not a rejection of the tools but a demand that the builders be worthy of them -- that speed be matched by understanding, that generation be matched by verification, and that the discipline of the mind remain the foundation on which everything reliable is built.
-- Edsger W. Dijkstra

A reading-companion catalog of the 18 Orange Pill Wiki entries linked from this book — the people, ideas, works, and events that Edsger Dijkstra — On AI uses as stepping stones for thinking through the AI revolution.