AI detection software — products like Turnitin's AI writing detection, GPTZero, Originality.AI, and dozens of competitors — purported to distinguish human-written text from AI-generated text with high accuracy. The tools were deployed across thousands of educational institutions between 2023 and 2026. The documented reality was that the detection was unreliable and systematically biased against non-native English speakers and students with unusually formal writing styles, and that its flawed algorithmic assessments produced false accusations that subjected students to humiliating administrative processes. The software became Skenazy's paradigmatic case of institutional safetyism in the AI age: a protection whose harms exceeded the risk it addressed, deployed for institutional liability rather than student welfare.
There is a parallel reading that begins from institutional necessity rather than student rights. The problem with AI detection software is not that it was deployed, but that it failed — and its failure has left educational institutions with no enforcement mechanism at the precise moment when one was critically needed.
Consider the counterfactual: what happens when institutions abandon detection entirely? The Skenazy answer is assessment reform, but this assumes a pace of institutional change that does not match the speed of AI capability improvement. Between 2023 and 2026, thousands of institutions attempted assessment reform. Most failed. Faculty lack training in assignment redesign. Departments lack consensus on what constitutes legitimate AI use. Meanwhile, students with access to increasingly capable AI tools face a coordination problem (sketched in the toy model below): honest individual work loses ground once enough peers quietly defect. The tragedy is not overprotection but the opposite: the collapse of any shared standards about what constitutes one's own work. AI detection software, for all its flaws, maintained institutional commitment to a distinction that matters: between thinking and outsourcing thought. Its removal does not produce better assessment; it produces the normalization of unacknowledged AI use as the background condition of all student work.

The question is not whether detection software harmed specific students (it clearly did) but whether its absence produces a larger harm: the elimination of any institutional capacity to maintain standards about intellectual ownership. The students most hurt by this elimination are not the false positives but the students from under-resourced backgrounds who lack the AI access and prompt-engineering skills that have become the new prerequisite for academic success.
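The shape of that coordination problem can be made concrete with a toy payoff model. The sketch below is illustrative only: the function, parameter names, and numbers are assumptions chosen to show the structure of the incentive, not estimates of any real classroom.

```python
# Toy model of the coordination problem around unacknowledged AI use.
# All numbers are illustrative assumptions, not empirical estimates.
# Premise: standing is partly relative, so a student's payoff depends on how
# many peers quietly use AI, while the chance of being caught is small.

def expected_payoff(uses_ai: bool, peer_ai_rate: float,
                    ai_quality_boost: float = 0.3,
                    detection_risk: float = 0.02,
                    penalty: float = 5.0) -> float:
    """Relative-standing payoff for one student, on an arbitrary scale."""
    own_output = 1.0 + (ai_quality_boost if uses_ai else 0.0)
    peer_average = 1.0 + ai_quality_boost * peer_ai_rate
    expected_penalty = detection_risk * penalty if uses_ai else 0.0
    return own_output - peer_average - expected_penalty

for rate in (0.1, 0.5, 0.9):
    honest = expected_payoff(False, rate)
    defect = expected_payoff(True, rate)
    print(f"peer AI use {rate:.0%}: honest {honest:+.2f}, defect {defect:+.2f}")
# With detection risk near zero, defecting dominates at every peer rate, and
# the payoff to honesty turns increasingly negative as more peers defect.
```

The point of the toy model is only the structure: as peer defection rises, the honest student's relative payoff falls, which is the pressure the paragraph above describes.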
The technical problem with AI detection is structural rather than contingent. Detection tools work by measuring statistical properties of text — perplexity, burstiness, token distribution patterns — that are associated with AI generation in training data. The measures are noisy. Human writers who produce unusually regular, syntactically precise, or low-perplexity text trigger false positives. This population is not random. It is systematically composed of non-native English speakers (whose English was learned with deliberate attention to grammatical regularity), students with unusually academic writing styles (often from families with academic backgrounds), and students with certain neurodivergent profiles. The tools were effectively profiling the students most likely to have carefully crafted their prose and flagging them as cheaters.
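To make that structural point concrete, here is a minimal sketch of how a perplexity-based flag works, assuming GPT-2 (via the Hugging Face transformers library) as the scoring model. The threshold and function names are illustrative assumptions, not taken from any actual detection product.

```python
# Minimal sketch of a perplexity-based flagging heuristic, assuming GPT-2
# as the scoring model. Threshold and names are illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the scoring model (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns mean cross-entropy loss.
        loss = model(input_ids=enc.input_ids, labels=enc.input_ids).loss
    return float(torch.exp(loss))

def flagged_as_ai(text: str, threshold: float = 30.0) -> bool:
    # The core assumption: AI-generated text is statistically "unsurprising"
    # to a language model, so low perplexity is treated as evidence of AI use.
    # The failure mode described above lives in this one comparison: a human
    # writer whose prose is deliberately regular and grammatically careful
    # also produces low-perplexity text, and is flagged by the same rule.
    return perplexity(text) < threshold
```

Real products layer additional signals on top, burstiness (the variance in per-sentence perplexity) among them, but the structural problem is visible in the final comparison: the features measure how predictable the prose is, not who wrote it.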
The institutional response to the tools' unreliability was revealing. Most schools that deployed AI detection did not pair it with procedures that acknowledged its error rate. Students flagged by the software were subjected to administrative processes — required handwritten drafts, mandatory interviews, temporary suspension — that assumed the detection was approximately correct. The burden was placed on students to prove their innocence, with institutional resources weighted against them. For students from disadvantaged backgrounds, the process was particularly harsh: the students with the fewest resources to contest the accusation were the students most likely to be falsely accused.
The framework parallel to Skenazy's physical-world documentation is exact. The tools functioned as institutional safety theater — visible action that demonstrated the institution's seriousness about AI, regardless of whether the action produced the outcomes it claimed. No administrator was fired for deploying AI detection software; many were praised for taking the issue seriously. The incentive structure rewarded deployment regardless of outcome. Meanwhile, the students who bore the cost of the tools' failures had no institutional voice comparable to the parents and advocates who had fought the overprotection battles in physical-world contexts.
The Skenazy response was characteristically direct. The problem was not that schools were trying to address AI use in student work. The problem was that they had chosen a response that substituted algorithmic theater for the harder work of reforming assessment — shifting from output evaluation to process evaluation, asking students questions instead of running their writing through software, designing assignments that could not be completed through uncritical AI use in the first place. The detection software was a lazy institutional answer to a problem that required harder institutional thinking.
AI detection software proliferated rapidly after the public release of ChatGPT in November 2022. By early 2023, major educational technology vendors had deployed detection products. Documentation of the tools' unreliability began appearing in academic papers and journalism within months, with the bias against non-native English speakers being the most extensively studied failure mode.
Algorithmic bias as systematic harm. Detection tools disproportionately misidentify writing from non-native English speakers and students with unusually formal styles as AI-generated.
Institutional liability over student welfare. The tools' deployment was driven by institutional risk management rather than evidence of efficacy or student benefit.
Safety theater diagnosis. The tools performed the appearance of addressing AI in schools while producing harms that exceeded the problem they addressed.
Process over detection. The Skenazy alternative is assessment reform — grading questions rather than essays, designing assignments resistant to uncritical AI use, treating AI engagement as subject to educational scaffolding rather than algorithmic surveillance.
Defenders of AI detection argue that some tool is needed to preserve academic integrity, and that the alternatives (assessment reform, honor codes) are either inadequate or require resources most institutions cannot muster. The Skenazy-adjacent response is that flawed tools are worse than imperfect alternatives, because the tools produce specific harms to specific students while failing at their stated purpose.
The technical case against AI detection software is decisive: the claim that the tools were unreliable is about 80% right, the claim that they were systematically biased about 90% right, and the claim that they were deployed without adequate procedural protections 100% right. The Skenazy diagnosis of safety theater is accurate when the question is "did these specific tools work as advertised?" They did not. The harms to falsely accused students were real, measurable, and fell along predictable demographic lines.
But the institutional necessity claim shifts the question to "what maintains academic standards when detection fails?" Here the weighting reverses. Assessment reform is the right answer in principle (100% right as goal), but institutions' actual capacity to execute it at speed is limited (perhaps 40% adequate in practice by 2026). The gap between ideal and implementation is not Skenazy's concern — her framework addresses the harms of the tool that exists, not the harms of the tool's absence. But for institutions, the gap is the whole problem. The choice is not between flawed detection and good assessment; it is between flawed detection and no mechanism at all.
The synthesis the topic requires is temporal: AI detection software was the wrong tool, correctly abandoned, whose failure clarifies the real work. That work is not choosing between detection and reform but building institutional capacity for a new baseline: assignments that engage AI as material rather than forbid it as contamination, assessment that measures intellectual process rather than textual output, and standards that distinguish between AI use and AI dependence. Detection's collapse was necessary. What comes next is the actual question, and neither Skenazy's critique nor the institutional defense answers it. The answer is being built, slowly, by the faculty willing to redesign their courses from scratch.