EVENT

Existential Risk Persuasion Tournament

Tetlock's application of adversarial collaboration to AI existential risk — pairing domain experts with superforecasters to test whether structured debate changes probability estimates of catastrophe.

The Existential Risk Persuasion Tournament, organized by Philip Tetlock and collaborators, brought together AI domain experts and superforecasters in structured adversarial dialogues about the probability of AI-caused catastrophe or extinction by 2100. AI experts assigned a median 12-percent probability of catastrophe and a 3.9-percent probability of extinction; superforecasters assigned 2.31 percent and 0.38 percent respectively. Neither group persuaded the other to substantially revise its long-term estimates, suggesting the disagreement was not primarily about evidence but about the weighting of the inside view (AI's unique features) against the outside view (base rates for technological catastrophe). A September 2025 follow-up revealed that both groups had radically underestimated near-term AI progress: superforecasters had assigned only a 9.7-percent probability to benchmark achievements that actually occurred.

In the AI Story

The tournament structure was designed to test a specific hypothesis: that adversarial collaboration — bringing together people with opposing views in a framework requiring them to engage each other's strongest arguments — would produce belief convergence toward a more accurate aggregate estimate. The methodology had proven effective in other domains where experts disagreed. In the AI case, convergence did not occur. The gap between AI experts' and superforecasters' estimates narrowed only slightly. This persistence suggested that the disagreement was not a communication failure but a genuine difference in how the two groups weighted incommensurable considerations. The AI experts saw specific features of the technology — recursive self-improvement, goal misalignment, deployment speed — that made this technology categorically different from historical precedents. The superforecasters saw base rates for technological catastrophe, which are low, and adjusted upward only modestly for AI's specific features.
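
The weighting dispute can be made concrete with a simple opinion pool. The sketch below is illustrative only, not a description of the tournament's aggregation method: the base rate, the inside-view estimate, and the weights are all hypothetical. It combines an outside-view base rate with an inside-view estimate in log-odds space and shows that the choice of weight alone can span roughly the gap between the two groups' published numbers.

```python
import math

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))

def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

def pool(p_outside, p_inside, w_inside):
    """Weighted log-odds pool of an outside-view base rate and an
    inside-view estimate; w_inside is the weight on the inside view."""
    x = (1 - w_inside) * logit(p_outside) + w_inside * logit(p_inside)
    return sigmoid(x)

# Hypothetical inputs: a 0.5% outside-view base rate for technological
# catastrophe, and a 30% inside-view estimate driven by AI's unique features.
p_outside, p_inside = 0.005, 0.30

for w in (0.1, 0.5, 0.9):
    print(f"w_inside={w:.1f} -> pooled p = {pool(p_outside, p_inside, w):.3f}")
# w_inside=0.1 -> pooled p = 0.008   (superforecaster territory)
# w_inside=0.5 -> pooled p = 0.044
# w_inside=0.9 -> pooled p = 0.216   (domain-expert territory)
```

Pooling in log-odds space is one conventional way to combine judgments that differ by orders of magnitude; a linear average of the raw probabilities would behave differently, and the choice between such rules is itself part of the methodological disagreement.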

The September 2025 humbling, when both groups' near-term predictions were revealed to have been wildly miscalibrated, provided the most important data point. Across four AI benchmarks (coding, graduate-level science, vision-and-language reasoning, agentic tool use), superforecasters had assigned an average 9.7-percent probability to the levels of capability models actually achieved. Domain experts assigned 24.6 percent: better, but still off by a factor of four. The underestimation was not random error but systematic. Everyone was using reference classes drawn from pre-2022 AI progress, and those reference classes became obsolete once scaled-up large language models crossed a qualitative threshold in late 2022. The tournament demonstrated both the value of structured forecasting methodology and the limits of even the best human prediction when facing genuinely novel, rapidly accelerating phenomena.
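
The size of the miss can be restated with Brier scores, the accuracy measure used throughout Tetlock's forecasting work. A minimal sketch, with two simplifying assumptions: the group-average probability stands in for the forecast on each question, and all four benchmark questions resolved yes.

```python
def brier(p, outcome):
    """Brier score for one binary question: (p - outcome)^2.
    0.0 is perfect; an uninformative 50/50 forecast scores 0.25."""
    return (p - outcome) ** 2

# Average probabilities assigned to the four benchmark outcomes,
# each of which in fact occurred (outcome = 1).
forecasts = {"superforecasters": 0.097, "domain experts": 0.246}

for group, p in forecasts.items():
    print(f"{group}: Brier = {brier(p, 1):.3f}")
# superforecasters: Brier = 0.815
# domain experts: Brier = 0.569
# Both scores are worse than the 0.25 a coin flip would earn: the
# signature of confident miscalibration rather than mere uncertainty.
```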

The tournament's long-term estimates remain unresolved — the question of AI extinction risk by 2100 will not be scorable for seventy-five years. But the tournament structure itself — the pairing of domain expertise with forecasting methodology, the requirement for probabilistic specificity, the documentation of reasoning — represents a model for how societies might approach high-stakes decisions about AI. Not by deferring to domain experts whose overconfidence Tetlock documented, not by deferring to superforecasters whose underestimation of AI progress the 2025 data revealed, but by combining both approaches in a framework that makes disagreements explicit, reasoning transparent, and updating obligatory as evidence accumulates.
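
What obligatory updating looks like mechanically is ordinary Bayesian conditioning. A hedged sketch with invented likelihoods: the hypothesis names and numbers below are illustrative placeholders, not tournament outputs, but they show how quickly four resolved surprises should move credence between a fast-progress and a slow-progress model of the world.

```python
def bayes_update(prior, likelihoods, evidence):
    """One step of Bayes' rule over a discrete hypothesis set.
    prior: {hypothesis: P(h)}
    likelihoods: {hypothesis: {evidence: P(evidence | hypothesis)}}"""
    unnorm = {h: prior[h] * likelihoods[h][evidence] for h in prior}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Invented numbers, purely illustrative.
prior = {"fast_progress": 0.2, "slow_progress": 0.8}
likelihoods = {
    "fast_progress": {"hit": 0.7, "miss": 0.3},
    "slow_progress": {"hit": 0.1, "miss": 0.9},
}

belief = prior
for i in range(4):  # four benchmark questions, all resolving "hit"
    belief = bayes_update(belief, likelihoods, "hit")
    print(f"after resolution {i + 1}: P(fast_progress) = {belief['fast_progress']:.3f}")
# after resolution 1: P(fast_progress) = 0.636
# after resolution 2: P(fast_progress) = 0.925
# after resolution 3: P(fast_progress) = 0.988
# after resolution 4: P(fast_progress) = 0.998
```

A forecaster who refused this kind of update after the 2025 resolutions would be repeating the reference-class mistake the tournament exposed.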

Origin

The tournament emerged from Tetlock's recognition that AI presented the highest-stakes forecasting question of the century, and that the discourse was dominated by confident assertions rather than calibrated probabilities. In collaboration with the Forecasting Research Institute and Long-Term Future Fund, Tetlock designed a structure that would force both AI optimists and AI pessimists to specify their beliefs numerically, engage each other's arguments, and update based on adversarial exchange. The tournament began in 2023 and continues, with periodic scoring of near-term benchmarks providing the feedback that long-term existential risk estimates cannot yet receive. The design was explicitly modeled on the Good Judgment Project's success, adapted to a domain where the stakes were civilizational rather than geopolitical.

Key Ideas

Inside view versus outside view. AI domain experts weight the technology's unique features heavily; superforecasters weight historical base rates heavily — both approaches capture partial truth.

Adversarial collaboration limits. Structured debate between opposing views does not guarantee convergence when disagreement reflects incommensurable values or weighting of considerations rather than factual errors.

Systematic underestimation of near-term progress. Even superforecasters and AI experts radically miscalibrated the pace of 2024–2025 capability gains, suggesting reference class obsolescence.

Long-term uncertainty irreducibility. Existential risk probabilities by 2100 reflect not merely epistemic uncertainty but fundamental unpredictability of recursive technological development.

Methodology over conclusions. The tournament's value lies not in the specific probability estimates it produced but in the infrastructure of transparent reasoning, adversarial challenge, and obligatory updating it modeled.

Further reading

  1. Karger, E., et al. (2023). 'Comparing Forecasts of AI Doom.' Working paper, Forecasting Research Institute.
  2. Schuett, J., Dreksler, N., et al. (2023). 'Towards Best Practices in AGI Safety.' arXiv preprint.
  3. Tetlock, P.E., & Gardner, D. (2015). Superforecasting, Chapter 11: 'Are They Really So Super?'
Part of The Orange Pill Wiki · A reference companion to the Orange Pill Cycle.