Launched in 2011 under Intelligence Advanced Research Projects Activity (IARPA) funding, the Good Judgment Project recruited thousands of volunteers to make probabilistic forecasts on geopolitical events. Tetlock's team competed against four other research groups and won so decisively that IARPA dropped the competing teams two years into the planned four-year tournament. The mechanism of victory was replicable: a one-hour training module in probabilistic reasoning, combined with a platform that let forecasters update their estimates as evidence accumulated. Superforecasters, the top two percent of performers, beat intelligence analysts with access to classified information by roughly thirty percent, demonstrating that cognitive habits matter more than credentials or secret intelligence.
The tournament asked participants to forecast events whose outcomes would be known within months or years: Would Greece exit the Eurozone? Would North Korea conduct another nuclear test? Would violence in Syria escalate past a specific threshold? Each question required a probability estimate that could be continuously updated as news arrived. Participants received immediate, unambiguous feedback when events resolved, creating the tight feedback loop that calibration requires. The platform tracked every forecast, every update, and scored each against the actual outcome using Brier scores. Forecasters could see their calibration curves — whether their seventy-percent predictions were actually right seventy percent of the time — and adjust accordingly.
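The scoring the passage describes is straightforward to sketch. The Python below is an illustrative reconstruction rather than the project's own code: brier_score is the standard mean squared difference between a probability and a 0/1 outcome (lower is better), and calibration_curve groups forecasts into probability bins so a forecaster can check whether predictions made at around seventy percent come true roughly seventy percent of the time. The function names, bin count, and toy data are assumptions for illustration.

```python
# Illustrative sketch of the tournament's scoring ideas; names and data are
# hypothetical, not the Good Judgment Project's actual implementation.

def brier_score(forecasts, outcomes):
    """Mean squared difference between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_curve(forecasts, outcomes, n_bins=10):
    """For each probability bin, compare the average forecast with the observed
    frequency of the event; a well-calibrated forecaster tracks the diagonal."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins + 1e-9), n_bins - 1)  # guard against float edge cases
        bins[idx].append((p, o))
    curve = []
    for pairs in bins:
        if pairs:
            mean_forecast = sum(p for p, _ in pairs) / len(pairs)
            observed_rate = sum(o for _, o in pairs) / len(pairs)
            curve.append((mean_forecast, observed_rate, len(pairs)))
    return curve

# Toy example: five resolved questions, probabilities versus 0/1 outcomes.
forecasts = [0.7, 0.7, 0.7, 0.2, 0.9]
outcomes = [1, 1, 0, 0, 1]
print(brier_score(forecasts, outcomes))        # 0.144 for this toy data
print(calibration_curve(forecasts, outcomes))  # (mean forecast, observed rate, count) per bin
```

The calibration curve is exactly the view described above: if the bin of roughly seventy-percent forecasts shows an observed rate near 0.7, the forecaster is calibrated at that level; a persistent gap is the signal to adjust.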
Tetlock's team discovered that performance could be dramatically improved through simple interventions. Teams of forecasters sharing information and challenging each other's reasoning outperformed individuals working alone. Providing forecasters with training in cognitive debiasing techniques — considering alternative hypotheses, taking the outside view, avoiding anchoring on initial estimates — produced measurable gains. Most strikingly, the training effects persisted: forecasters who received the one-hour module maintained their advantage across multiple years, suggesting that the habits had been genuinely installed rather than temporarily activated. The superforecasters were not selected for any pre-existing trait. They emerged through practice.
The project's AI implications were not apparent in 2011 but became central by the 2020s. The cognitive habits superforecasters cultivated — granular probabilistic thinking, continuous belief updating, synthesis across information sources — turned out to be precisely the habits most threatened by AI tools that produce confident answers regardless of underlying certainty. The project demonstrated that these habits could be trained and maintained, providing the empirical foundation for the claim that professionals using AI can preserve calibrated judgment if they practice the discipline deliberately. The project also demonstrated, through its scoring infrastructure, what accountability for judgment actually requires: specific predictions, public commitments, scored outcomes, and the willingness to learn from error.
IARPA, the intelligence community's research arm, created the forecasting tournament to address a practical problem: intelligence analysts made predictions constantly, but prediction quality was never rigorously evaluated, and there was no systematic method for improving it. The agency funded five competing research teams to develop and test forecasting methodologies over four years. Tetlock, whose Expert Political Judgment had established his reputation as the leading scholar of prediction, assembled a team and designed a platform that would become Good Judgment Open — a public-facing forecasting site that outlasted the IARPA contract and continues operating today. The tournament ran from 2011 to 2015, producing millions of data points and establishing the empirical foundation for the claim that judgment can be measured, trained, and improved at scale.
Superforecasters beat classified intelligence. Ordinary citizens trained in probabilistic reasoning outperformed professional analysts with security clearances — demonstrating that method matters more than access.
Team forecasting advantage. Small groups of forecasters sharing information and challenging each other's reasoning outperformed even the best individuals working alone.
Continuous updating superiority. Forecasters who revised estimates frequently as evidence accumulated outperformed those who made initial predictions and stuck with them; a sketch of one way to carry out such an update follows this list.
Training produces durable gains. A one-hour module on probabilistic reasoning and cognitive debiasing created advantages that persisted across multiple years of tournament participation.
Feedback loop necessity. Calibration improved only when forecasters received clear, timely, unambiguous feedback on whether their predictions were correct — the infrastructure for accountability was the mechanism of improvement.
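One standard way to operationalize the continuous-updating finding is an odds-form Bayes update: the forecaster holds a prior probability, judges how much more likely a new piece of evidence is if the event is coming than if it is not, and multiplies the odds accordingly. The tournament did not prescribe any particular formula; the prior of 0.30 and the likelihood ratios below are invented for illustration.

```python
# Hypothetical odds-form Bayes update; the numbers are invented, and this is
# one common way to revise a forecast, not a procedure the tournament required.

def update(prior, likelihood_ratio):
    """Return the posterior probability after evidence with the given
    likelihood ratio P(evidence | event) / P(evidence | no event)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

p = 0.30            # initial estimate that the event occurs
p = update(p, 3.0)  # supportive report, judged three times likelier if the event is coming
print(round(p, 2))  # 0.56
p = update(p, 0.5)  # later, contrary evidence, judged half as likely if the event is coming
print(round(p, 2))  # 0.39
```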