In September 2025, a follow-up to the Existential Risk Persuasion Tournament revealed that superforecasters had assigned an average 9.7-percent probability to the levels of AI capability that models actually achieved across four benchmarks (coding, graduate-level science, vision-and-language reasoning, agentic tool use). Domain experts performed better at 24.6 percent but were still off by a factor of four. The finding was humbling to both communities: the people who understood AI best and the people who were best at prediction had both systematically underestimated the pace of progress. The 9.7 percent became a symbol of irreducible uncertainty in the face of rapidly accelerating, genuinely novel phenomena — a paperweight sitting on every confident claim about AI timelines.
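To make the "factor of four" concrete: if the realized capability levels are scored as an outcome with probability 1.0, the shortfall of each group's estimate is simply its reciprocal. A minimal sketch, using only the two figures from the text:

```python
# Back-of-envelope arithmetic behind "off by a factor of four", assuming
# the realized capability levels are scored as probability 1.0 (the
# benchmark levels were in fact achieved).
forecasts = {"superforecasters": 0.097, "domain experts": 0.246}

for group, p in forecasts.items():
    # How many times too low the assigned probability turned out to be.
    print(f"{group}: estimate low by a factor of {1.0 / p:.1f}")

# superforecasters: estimate low by a factor of 10.3
# domain experts:   estimate low by a factor of 4.1
```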
The underestimation was not random error but systematic. Both superforecasters and AI experts were using reference classes drawn from pre-2022 AI progress, when capabilities improved gradually and benchmarks were beaten incrementally over years. The 2022–2025 period broke those reference classes: scaling laws crossed a threshold, emergent capabilities appeared without being explicitly trained for, and adoption curves steepened beyond those of any previous technology. The outside view that had served superforecasters well in geopolitical forecasting — anchor to base rates, adjust for case-specific features — produced radically low estimates because the base rate had become obsolete. The inside view that AI experts deployed — attending to the technology's specific architecture, training methods, and scaling potential — also undershot, because even domain experts did not anticipate how rapidly capabilities would compound once past that threshold.
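A toy calculation shows how a stale base rate produces exactly this kind of undershoot. Both rates below are hypothetical illustrations, not figures from the tournament; the point is only the compounding gap once the regime shifts:

```python
# Toy illustration of reference-class obsolescence: extrapolating a
# pre-2022 base rate of benchmark improvement undershoots badly once the
# underlying rate regime changes. Both rates are hypothetical.
old_rate = 0.05   # hypothetical pre-2022 annual benchmark gain
new_rate = 0.40   # hypothetical post-threshold annual gain
years = 3

outside_view = (1 + old_rate) ** years - 1   # base-rate extrapolation
actual = (1 + new_rate) ** years - 1         # what the new regime delivers

print(f"outside-view forecast: +{outside_view:.0%}")  # +16%
print(f"actual progress:       +{actual:.0%}")        # +174%
```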
The finding forced both communities to recalibrate not just their AI predictions but their confidence in their own forecasting methods. Superforecasters, whose skill lay in identifying stable patterns, confronted a domain where the patterns were shifting faster than human updating cycles could track. AI experts, whose models were calibrated to pre-2022 progress rates, confronted a regime change their models had not predicted. The 9.7 percent became a shared epistemic scar — a reminder that even the best human judgment can be blindsided by phase transitions, and that the appropriate response to capability acceleration is not abandoning forecasting but widening uncertainty bands and increasing update frequency.
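What "widening uncertainty bands" means in practice can be sketched with a simple timeline forecast: the central estimate stays put while the interval around it stretches. The median year and sigma values below are hypothetical, chosen only to show the mechanics:

```python
# Minimal sketch of the recalibration described above: keep the central
# estimate of an AI-timeline forecast fixed, but widen the uncertainty
# band around it. The median year and sigmas are hypothetical.
from statistics import NormalDist

median_year = 2030.0
for sigma in (1.5, 3.0, 6.0):    # progressively wider bands
    dist = NormalDist(mu=median_year, sigma=sigma)
    lo, hi = dist.inv_cdf(0.05), dist.inv_cdf(0.95)
    print(f"sigma={sigma}: 90% interval {lo:.0f}-{hi:.0f}")
```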
For Edo Segal, the 9.7 percent sits on his desk as a corrective. Every claim in The Orange Pill — the twenty-fold multiplier, the Death Cross timeline, the five-stage pattern — carries implicit confidence levels that the 9.7 percent interrogates. How sure are you? And how sure should you be? The number does not invalidate the claims but forces them into probabilistic form: not 'the twenty-fold multiplier is real' but 'I assign seventy-percent confidence that motivated teams sustain ten-fold improvements, and thirty-percent confidence that the multiplier generalizes industry-wide.' The shift from assertion to probability estimate is the shift from hedgehog to fox, and the 9.7 percent is the empirical cudgel that forces the shift.
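The shift from assertion to probability estimate also makes the claims scorable. A minimal sketch, restating the two confidence levels from the text and scoring them with the standard Brier rule once reality resolves each claim; the resolution values here are hypothetical placeholders:

```python
# Sketch of the hedgehog-to-fox shift: claims become probabilities, and
# probabilities can be scored. The confidence levels restate the example
# in the text; the outcomes below are hypothetical placeholders.
claims = [
    ("motivated teams sustain ten-fold improvements", 0.70),
    ("the multiplier generalizes industry-wide", 0.30),
]

def brier(p, outcome):
    # Brier score: (forecast - outcome)^2; 0.0 is perfect, 1.0 is worst.
    return (p - outcome) ** 2

hypothetical_outcomes = [1, 0]   # 1 = claim resolved true, 0 = false
for (claim, p), outcome in zip(claims, hypothetical_outcomes):
    print(f"{claim}: forecast {p:.2f}, Brier {brier(p, outcome):.2f}; "
          f"a flat assertion at 1.00 would score {brier(1.0, outcome):.2f}")
```

The asymmetry is the point of hedging: when the second, bolder claim fails, the thirty-percent forecast loses 0.09 where a flat assertion would lose the full 1.00.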
The September 2025 follow-up was conducted by the Forecasting Research Institute in collaboration with Tetlock's group at Penn. Forecasters who had participated in the Existential Risk Persuasion Tournament were asked to retrospectively estimate the probabilities they would have assigned, eighteen months earlier, to the capability levels models had since achieved, and the estimates were cross-checked against what the forecasters had actually predicted about AI progress timelines in 2023–2024. The gap between those probabilities and the realized outcomes was enormous and consistent: everyone had underestimated. The result was published as a working paper and presented at forecasting conferences as a case study in how even gold-standard methodology can fail when the domain undergoes a phase transition. The 9.7 percent became shorthand for the limits of prediction in the face of genuine novelty.
Universal underestimation. Superforecasters (9.7%) and AI experts (24.6%) both radically undershot — neither cognitive style nor domain knowledge provided protection against the surprise.
Reference class obsolescence. Base rates drawn from pre-2022 AI progress became misleading after the scaling-law threshold, rendering the outside view temporarily uninformative.
Phase transition blindness. Gradual changes are forecastable; discontinuous regime shifts are not — and distinguishing which kind of change is occurring is itself a forecast subject to error.
Confidence calibration corrective. The 9.7 percent forces wider uncertainty bands on all AI timeline predictions — not abandoning forecasting but acknowledging that variance is larger than any model captured.
Humility as empirical finding. Even the world's best forecasters, using the world's best methods, can be blindsided — the appropriate response is not despair but probabilistic humility and increased update frequency.