The critique applies with specific force to sprint velocity tracking. Sprint velocity measures how many story points a team completes in a sprint. Teams that increase their velocity are celebrated as improving; teams whose velocity declines are examined for dysfunction. The measurement assumes that completed story points correspond to system value — that more points completed means more value delivered. This assumption holds only if the features represented by the points have been evaluated, validated, and found worth building. In AI-augmented teams, where generation capacity vastly exceeds evaluation capacity, the assumption breaks. Velocity can increase while system throughput — the rate at which genuine value reaches users — stagnates or declines.
The specific mechanism is the one Goldratt diagnosed repeatedly in manufacturing: local optimization of a non-constraint produces inventory, not throughput. AI-augmented engineers can generate features faster than product managers can evaluate them, QA can test them, and users can absorb them. The features ship — velocity increases — but the downstream capacity to validate their value has not scaled. The system accumulates features of uncertain value, creating cognitive inventory and product incoherence that will eventually manifest as maintenance burden, user confusion, and competitive vulnerability.
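To make the mechanism concrete, here is a minimal simulation sketch; the function name, the sprint horizon, and all rates are illustrative assumptions, not figures from the text. It treats evaluation capacity as the constraint and tracks what sprint velocity reports against what actually gets validated:

```python
# Minimal model of Goldratt's point: when generation outpaces evaluation,
# extra generation capacity raises inventory, not throughput.
# All rates and the sprint horizon are illustrative assumptions.

def simulate(sprints: int, gen_rate: float, eval_rate: float):
    """Track reported velocity vs. validated throughput over a run of sprints.

    gen_rate  -- story points the team can generate per sprint
    eval_rate -- story points downstream judgment can validate per sprint
    """
    inventory = 0.0          # shipped-but-unvalidated points (cognitive inventory)
    velocity_total = 0.0     # what sprint velocity reports
    throughput_total = 0.0   # what reaches users as validated value
    for _ in range(sprints):
        velocity_total += gen_rate
        # Evaluation is the constraint: only eval_rate points can be validated,
        # drawn from this sprint's output plus the backlog of unvalidated work.
        validated = min(eval_rate, inventory + gen_rate)
        throughput_total += validated
        inventory += gen_rate - validated
    return velocity_total, throughput_total, inventory

# Doubling generation (e.g. AI assistance) doubles velocity but leaves
# throughput pinned at the evaluation rate; the surplus piles up as inventory.
for gen in (10, 20):
    v, t, inv = simulate(sprints=12, gen_rate=gen, eval_rate=10)
    print(f"gen_rate={gen}: velocity={v:.0f}, throughput={t:.0f}, inventory={inv:.0f}")
```

With these illustrative numbers, doubling generation doubles reported velocity (240 versus 120) while validated throughput stays at 120; the difference appears only in the inventory column, which is exactly the pattern the paragraph above describes.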
The critique extends to deployment frequency, pull request counts, code review throughput, and virtually every quantitative metric of engineering activity. Each measures the rate of a non-constraint while implicitly assuming that raising this rate raises system throughput, an assumption that holds only when the measured activity is the constraint. The measurement frameworks were designed for an era when they were approximately correct. In the AI era, they are systematically misleading. Organizations that celebrate improvements in these metrics are celebrating what Goldratt would immediately recognize as the wrong thing.
The alternative the simulation proposes is explicit measurement of the judgment constraint: decision quality, evaluation depth, system coherence, product-user fit. These qualities are harder to measure precisely, which is why organizations default to the easier proxies. But Goldratt's framework is clear: measure the wrong thing and you will optimize for the wrong thing. The difficulty of measuring judgment does not justify measuring velocity; it justifies the harder work of building measurement systems adequate to the actual constraint. An organization that measures judgment imperfectly is in better shape than an organization that measures velocity precisely, because the first is aiming at the right target, however badly, while the second is aiming at the wrong target, however accurately.
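The closing claim, that an imperfect measure of the right target beats a precise measure of the wrong one, can be checked with a toy Monte Carlo. Everything in this sketch is assumed for illustration: feature values drawn from a standard normal, story-point sizes independent of value, and a judgment signal corrupted by Gaussian noise:

```python
# Toy Monte Carlo of "noisy measure of the right thing beats a precise
# measure of the wrong thing". Distributions and noise level are
# illustrative assumptions, not from the text.
import random

random.seed(0)

def trial(n_candidates=50, n_chosen=10, judgment_noise=1.0):
    # Each candidate feature has a true value and a story-point size;
    # size is assumed independent of value.
    features = [(random.gauss(0, 1), random.uniform(1, 13))
                for _ in range(n_candidates)]

    # Velocity-driven team: precisely maximizes points completed (wrong target).
    by_points = sorted(features, key=lambda f: f[1], reverse=True)[:n_chosen]

    # Judgment-driven team: ranks by a noisy estimate of value (right target).
    by_judgment = sorted(features,
                         key=lambda f: f[0] + random.gauss(0, judgment_noise),
                         reverse=True)[:n_chosen]

    return (sum(v for v, _ in by_points), sum(v for v, _ in by_judgment))

trials = [trial() for _ in range(2000)]
print(f"avg value, points-ranked:         {sum(a for a, _ in trials) / len(trials):.2f}")
print(f"avg value, noisy-judgment-ranked: {sum(b for _, b in trials) / len(trials):.2f}")
```

Under these assumptions the noisy value ranking reliably delivers more value than the precise points ranking. Even with substantial noise the value-ranked selection keeps an edge; in the limit of infinite noise it merely degrades to the same expected value as the points ranking, never below it.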
The critique synthesizes Goldratt's long-standing attack on local-optimization metrics with the specific constraint migration produced by AI. It draws on the Berkeley study's documentation of task seepage and the broader empirical record of AI-augmented work intensification.
- Velocity measures the non-constraint. Story points, sprint velocity, and feature counts measure engineering output, which is no longer the system's binding constraint.
- Local optimization of velocity produces inventory. Features generated faster than they can be evaluated accumulate as cognitive and product inventory: liabilities masquerading as assets (the sketch after this list makes the gap explicit).
- The measurement framework was built for a prior era. Agile metrics were approximately right when coordination was the constraint; they are systematically wrong now that judgment is.
- Judgment metrics are harder but necessary. Decision quality, evaluation depth, and product coherence are difficult to measure, but measuring them badly is superior to measuring velocity precisely.
- Organizational culture resists the critique. Velocity metrics are embedded in performance reviews, compensation, and professional identity, making their replacement a political project as much as a technical one.
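As a complement to the list above, here is a sketch of what reporting against the actual constraint might look like. The `Feature` record, its fields, and the shipped/validated distinction are hypothetical constructions for illustration, not an API or schema from the text:

```python
# Sketch of reporting validated throughput alongside raw velocity.
# The record fields and the validation criterion are assumptions.
from dataclasses import dataclass

@dataclass
class Feature:
    points: int       # story points, as the team estimates today
    shipped: bool     # merged and deployed
    validated: bool   # evaluated and judged to deliver user value

def sprint_report(features: list[Feature]) -> dict:
    velocity = sum(f.points for f in features if f.shipped)
    throughput = sum(f.points for f in features if f.shipped and f.validated)
    return {
        "velocity": velocity,                # what agile dashboards celebrate
        "validated_throughput": throughput,  # what Goldratt would call throughput
        "inventory": velocity - throughput,  # shipped but unvalidated: a liability
    }

print(sprint_report([
    Feature(5, True, True),
    Feature(8, True, False),   # shipped, never evaluated: velocity only
    Feature(3, True, False),
    Feature(2, False, False),
]))
```

In this toy report, 16 points of velocity conceal that only 5 were ever validated; the remaining 11 are the shipped-but-unvalidated inventory the list above names as a liability masquerading as an asset.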