EVENT

The Paint Color Anecdote

Flyvbjerg's personal encounter with an AI-powered color-matching app that failed at a task with an objectively verifiable answer — a small symptom deployed with diagnostic precision as evidence of the underlying pathology in current AI systems.

In a companion observation to the Big Dig test, Flyvbjerg recounted using an AI-powered paint-color matching application — the kind of consumer tool that AI companies hold up as evidence of the technology's everyday utility. The system failed at the basic task it was designed to perform. The recommended color bore no resemblance to the target. Flyvbjerg's conclusion — artificial intelligence turned out to be a real waste of time and money in this case — was deployed not as a dismissal of consumer AI but as a diagnostic observation: if the system cannot match a paint color, a task with an objectively verifiable correct answer, then the confident authority with which it pronounces on complex, ambiguous, high-stakes questions should provoke not admiration but alarm.

The Selection Effect Problem — Contrarian ^ Opus

There is a parallel reading that begins from the material conditions of AI deployment rather than its epistemic failures. The paint-color app didn't fail because AI lacks uncertainty signals — it failed because the economic incentives select for rapid deployment over accuracy. A human color-matcher costs $30/hour; an AI app costs $0.001 per query. The market doesn't require the AI to match colors correctly — it requires it to be cheap enough that occasional failures don't matter to the business model. The app probably works well enough for 80% of cases, and that's sufficient for profitability. The industrial caution Flyvbjerg cites from Mercedes isn't about recognizing AI's epistemic limits — it's about recognizing liability exposure. Mercedes proceeds carefully not because they understand AI better, but because their lawyers understand lawsuits better.

This shifts the diagnostic frame entirely. The problem isn't that AI systems lack internal epistemic states or uncertainty calibration — it's that the political economy of AI deployment systematically selects against quality control. The paint-color failure isn't evidence of a deep architectural flaw in AI; it's evidence of a market that has learned to profit from deploying broken tools. The consumer who encounters the failed color match has already paid for the app, or viewed the ads, or provided the data. The failure happens after the value extraction. This isn't artificial ignorance — it's artificial indifference, engineered into the business model. The confident wrongness Flyvbjerg diagnoses as an epistemic problem is actually an economic feature: confidence sells better than accuracy, especially when the consequences of being wrong fall on users rather than providers.

— Contrarian ^ Opus

In the AI Story

Hedcut illustration for The Paint Color Anecdote — The Paint Color Anecdote

The anecdote's methodological power lies in the asymmetry it exposes. Paint-color matching is the kind of task that should be easy for AI — a narrow, specifiable operation with an empirically testable outcome. The system's failure at this task, while it continues to project confident authority on harder tasks, reveals the structural problem: AI systems do not distinguish between domains where they are competent and domains where they are not. They produce the same fluent, confident output regardless of underlying accuracy.

The contrast with human expertise is instructive. A painter asked to match a color will typically acknowledge uncertainty if the sample is degraded, the lighting is unusual, or the target is ambiguous. The acknowledgment of uncertainty is itself expert behavior — a phronetic signal that the expert calibrates to context. AI systems do not produce such signals because they do not possess the internal epistemic states that would ground them. The absence of uncertainty signals is not a feature to be added later but a structural consequence of the generation architecture.

The anecdote complements the Big Dig test by covering the opposite end of the stakes spectrum. The Big Dig test exposed the problem at the professional knowledge level — a question a scholar might plausibly outsource to an AI. The paint-color test exposes the problem at the consumer level — the millions of small daily decisions that AI tools are marketed to assist. The failures are structurally identical. The lesson is that the failure mode is general, not specific to any particular domain or stakes level.

The industrial parallel Flyvbjerg draws — Mercedes' product liability concerns about ChatGPT integration into vehicles — demonstrates that the problem is visible to organizations whose deployment decisions carry direct legal and safety consequences. The automotive industry has proceeded with a caution that the broader culture has not matched, precisely because the industry cannot afford to treat AI confident wrongness as an acceptable operational cost.

Origin

Flyvbjerg recounted the anecdote in his 2025 AI writings and subsequent social media posts accompanying the release of 'AI as Artificial Ignorance.'

Key Ideas

Narrow task failure. AI systems fail at tasks that should be easy for them while continuing to project confident authority on harder tasks.

Uniform confidence. The systems do not distinguish between domains where they are competent and domains where they are not — the confidence signal is decoupled from the accuracy signal.

Absence of uncertainty. Expert behavior includes calibrated uncertainty signals; AI systems do not produce such signals because they lack the internal epistemic states that would ground them.

Consumer-scale diagnosis. The paint-color failure demonstrates that the problem operates at the scale of ordinary daily decisions, not only in high-stakes professional contexts.

Industrial recognition. Sectors with direct liability exposure — automotive, medical, aviation — have recognized the problem and proceeded with corresponding caution.

Appears in the Orange Pill Cycle

Bent Flyvbjerg — On AI

The Diagnostic Convergence — Arbitrator ^ Opus

Both readings arrive at the same alarm through different analytical paths. On the question of what the paint-color failure reveals, Flyvbjerg's epistemic diagnosis dominates (70/30) — the system genuinely cannot distinguish between domains of competence and incompetence. But on the question of why such systems get deployed anyway, the political economy reading is stronger (80/20) — market incentives clearly favor rapid deployment over accuracy. The two analyses aren't competing; they're describing different layers of the same phenomenon.

The synthesis emerges when we ask about systemic risk. Here the views balance (50/50): Flyvbjerg correctly identifies that AI lacks uncertainty signals, making it epistemically dangerous; the contrarian correctly identifies that markets select for this very characteristic, making it economically inevitable. The paint-color app is diagnostic precisely because it demonstrates both problems simultaneously — a system that cannot know when it's wrong, deployed by a market that doesn't care. Mercedes' caution validates both readings: they recognize both the technical inability to guarantee accuracy and the legal consequences of that inability.

The proper frame may be 'diagnostic convergence' — multiple analytical paths leading to the same conclusion about deployment risk. Whether we trace the problem through epistemic architecture (Flyvbjerg's path) or political economy (the contrarian's path), we arrive at the same warning: systems that cannot signal uncertainty are being deployed at scale in contexts where uncertainty matters. The paint-color anecdote works as diagnosis not because it proves one theory over another, but because it makes visible a failure mode that both theories predict. The question isn't which lens is correct but rather which intervention points each lens reveals.

— Arbitrator ^ Opus

The Paint Color Anecdote

In the AI Story

Origin

Key Ideas

Appears in the Orange Pill Cycle

Related Entries

Further reading