The Joel Test and Its AI-Era Successor — Orange Pill Wiki
CONCEPT

The Joel Test and Its AI-Era Successor

Spolsky's 2000 twelve-question checklist for evaluating software team quality — blunt, binary, deliberately oversimplified — and the five-question AI-era successor this volume proposes, designed to make visible the diagnostic capability that daily productivity metrics conceal.

In August 2000, Joel Spolsky published 'The Joel Test: 12 Steps to Better Code,' a deliberately oversimplified checklist that allowed any software team to evaluate itself in five minutes on twelve binary questions: do you use source control? can you make a build in one step? do you fix bugs before writing new code? The test's power was its accessibility — it required no specialized knowledge to administer, produced a score out of twelve that was immediately legible, and created a shared vocabulary that teams could use to identify and argue for improvements. Over two decades it became one of the most cited heuristics in software engineering. This volume proposes a five-question AI-era successor, designed not to evaluate general team quality but to measure the specific capability that AI-generated code erodes and that only deliberate practice preserves.

In the AI Story


The original Joel Test asked whether a team had source control, one-step builds, daily builds, a bug database, a practice of fixing bugs before writing new code, an up-to-date schedule, a spec, quiet working conditions, the best tools money can buy, testers, hallway usability testing, and interview candidates who write code. The questions were binary. The total score was a team-quality index. The test's cultural impact exceeded its precise measurement properties because it gave teams a ready-made instrument for self-examination that required neither external auditors nor sophisticated methodology.

The AI-era successor this volume proposes asks five questions. First: can anyone on the team read and explain the AI-generated code at the implementation level — not what it does, but how and why the implementation makes the specific choices it makes? Second: has the team diagnosed a leak in AI-generated code in the last quarter — not a user-reported bug, but a failure that required understanding the underlying implementation to resolve? Third: does the team maintain any practice — code review, manual implementation exercises, architectural deep-dives — that builds understanding of the layers beneath the abstraction? Fourth: if the AI tools became unavailable tomorrow, could the team maintain and modify the existing codebase? Fifth: has the team tested the system specifically for the failure modes AI-generated code is most likely to exhibit?

The design logic of the successor test mirrors the original's: binary questions, administrable in minutes, producing a score whose interpretation is obvious. A team that answers 'no' to most questions is operating on borrowed competence: borrowed from the AI's reliability, so that when the reliability fails, the team will lack the capability to respond. A team that answers 'yes' to most questions has preserved or rebuilt the diagnostic capability the abstraction threatens to erode, accepting some cost in short-term productivity as deliberate insurance against the inevitable leak.
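The design just described can be sketched as a tiny checklist scorer. This is a minimal illustration, not part of the test itself: the question wording is condensed from the paragraphs above, and the function names and the two-band reading of the score are illustrative assumptions.

```python
# A sketch of the five-question AI-era successor test as a binary checklist.
# Question wording is condensed from the text; no partial credit is allowed.

QUESTIONS = [
    "Can anyone on the team explain the AI-generated code at the "
    "implementation level?",
    "Has the team diagnosed a leak in AI-generated code in the last quarter?",
    "Does the team maintain a practice that builds understanding of the "
    "layers beneath the abstraction?",
    "Could the team maintain the codebase if the AI tools became "
    "unavailable tomorrow?",
    "Has the team tested for the failure modes AI-generated code is most "
    "likely to exhibit?",
]

def score(answers):
    """Count 'yes' answers. Binary questions: every one must be answered."""
    if len(answers) != len(QUESTIONS):
        raise ValueError("answer every question; the test allows no skipping")
    return sum(1 for a in answers if a)

def reading(n):
    """An illustrative two-band interpretation of the score."""
    return "borrowed competence" if n <= 2 else "preserved capability"
```

For example, `score([True, False, True, True, False])` returns 3, which `reading` maps to "preserved capability"; mostly-'no' answer sheets land in "borrowed competence". The threshold between the bands is an assumption; the text only says "most questions".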

The test's purpose is not to rank teams or provide material for management consulting engagements. Its purpose is the same as that of Spolsky's original: to give practitioners a ready-made instrument for self-examination that makes visible what organizational incentive structures usually hide. The short-term metrics reward speed; the test rewards the practices that preserve the capability to survive failures. Teams that take the test seriously will find themselves in uncomfortable conversations about trade-offs that the productivity dashboard does not surface.

Origin

The original Joel Test was published as a blog post in August 2000. The AI-era five-question successor is developed in Chapter 10 of this volume, constructed by the Opus 4.6 simulation in the spirit of Spolsky's original design: a diagnostic instrument ordinary enough to be used, pointed enough to be useful.

Key Ideas

Binary questions force honest answers. The original test's refusal to allow partial credit was a feature; the AI-era successor preserves it.

The test measures practice, not outcome. Both versions ask what the team does, not what the team produces — because practice is what determines how the team will perform under conditions it has not yet encountered.

The questions make the invisible visible. Diagnostic capability does not appear on any dashboard; the test creates a place where its presence or absence can be seen.

The score is not the point. The value of administering the test is in the conversation the questions provoke, not in the number they produce.

The test is meant to be used. Like the original, it is designed for practitioners, not consultants — short, memorable, administrable without tooling.

Appears in the Orange Pill Cycle

Further reading

  1. Joel Spolsky, The Joel Test: 12 Steps to Better Code (joelonsoftware.com, August 2000)
  2. Joel Spolsky, The Guerrilla Guide to Interviewing (joelonsoftware.com, 2000)
  3. Tom DeMarco and Timothy Lister, Peopleware (Dorset House, 1987)
  4. Kent Beck, Extreme Programming Explained (Addison-Wesley, 1999)
Part of The Orange Pill Wiki · A reference companion to the Orange Pill Cycle.