PERSON

Philip Tetlock

The psychologist who proved expert prediction is often no better than chance—and then proved that it can be dramatically improved with the right cognitive habits, transforming forecasting from a gift into a trainable skill.

For twenty years Philip Tetlock ran the most rigorous study of expert prediction ever conducted, and the headline finding was devastating: the average credentialed expert, asked to make specific, time-bound, probabilistic forecasts in their own domain, performed no better than a dart-throwing chimpanzee. The punchline became famous. The second finding, that a small minority of forecasters consistently outperformed both the average expert and sophisticated statistical models, became the more important one. What distinguished these superforecasters was less intelligence or knowledge than cognitive style: the fox who knows many things, holds multiple frameworks simultaneously, and treats its own confidence as a variable to calibrate rather than a virtue to defend. The hedgehog, certain and narrative-driven, consistently underperformed. In the AI age, Tetlock's lens becomes essential: systems that produce confident, fluent output regardless of accuracy have created an environment in which calibrated uncertainty is both more necessary and more threatened than at any previous moment in the history of human judgment. His own trajectory on AI (cautious in 2015, attentive to evidence by 2018, declaring a paradigm shift by 2024) is the fox's method enacted, a living demonstration that the discipline he studied can be practiced even by the man who studied it.

In the [YOU] on AI Field Guide

The cycle that began with [YOU] on AI asks what it would mean to see the machine clearly, without the narcotic of hype or the paralysis of fear. Tetlock is the cycle's epistemologist: the person who can tell you not just what to think about AI but how to think about it, and how to tell whether you are thinking well. His framework reveals that the AI discourse has been dominated by hedgehogs—triumphalists who know one big thing (AI is progress) and catastrophists who know one big thing (AI is threat)—while the foxes, who hold both signals simultaneously and update as evidence arrives, inhabit the silent middle that the algorithm punishes most severely.

The cycle reads Tetlock's research as structural diagnosis of the discourse itself. The media ecosystem, the social platform, the professional community: all select for hedgehog confidence over fox calibration, because confident claims generate engagement and qualified claims do not. The person who says “there is a sixty-three percent probability of a moderate recession” does not get invited back on television. The system distributes narratives that resolve the tension, and the tension is where the truth lives. Tetlock’s contribution is to have demonstrated, with two decades of data, that the cognitive style the system punishes is also the cognitive style most likely to be accurate.

His research also illuminates the specific hazard that the cycle calls calibration failure: the erosion of the questioning capacity when AI output arrives polished and assured, regardless of whether the underlying reasoning holds. The superforecaster’s core discipline, asking “how confident am I, and how confident should I be?”, is exactly the habit that fluent, confident machine output makes least likely to occur. The questioning muscle, in Tetlock's framework, is trainable and losable, and the AI environment is one in which it wastes away. What [YOU] on AI calls ascending friction (the genuine difficulties that remain once lower-level obstacles have been removed) is, in Tetlock's terms, the resistance training that keeps the capacity alive.

Origin

Philip Tetlock was born in 1954 in Canada and trained as a psychologist at Yale. His early work on political judgment and accountability established the pattern that would define his career: the systematic, empirical study of how experts reason, under what conditions they reason well, and what cognitive habits separate the accurate from the merely confident. The Expert Political Judgment study, which began in 1984, ran for twenty years, enrolled 284 experts, and collected 28,000 predictions, was an act of almost perverse methodological rigor in a field that had always evaluated expertise by credentials and fluency rather than by scored track records. The results, published in 2005, produced the dart-throwing chimpanzee finding and a quieter, more consequential one: the fox-hedgehog distinction that emerged when Tetlock organized the data around Isaiah Berlin's taxonomy.

The Good Judgment Project, which Tetlock launched in 2011 under IARPA's Aggregative Contingent Estimation program, both vindicated and extended the earlier work. It pitted research teams against each other in a multi-year geopolitical forecasting tournament, and Tetlock's team outperformed its rivals by such a margin that after two years IARPA dropped the competing teams. The project showed that ordinary citizens, trained in the cognitive habits of the superforecaster, could outperform intelligence analysts with access to classified information. The finding was both a critique of institutional expertise and a proof of concept: calibration is trainable, the gains persist, and the training required is smaller than experts expect. About an hour of structured instruction in probabilistic reasoning produced measurable improvement.

Tetlock’s subsequent career has applied the same methodology to increasingly high-stakes domains. The Existential Risk Persuasion Tournament organized adversarial collaborations between AI domain experts and superforecasters on questions of catastrophic AI risk. The Hybrid Forecasting Competition tested human-machine hybrids against pure-human and pure-machine baselines. His 2015 book with Dan Gardner, Superforecasting: The Art and Science of Prediction, brought the superforecaster framework to a wide audience and remains the most readable account of how calibrated judgment works in practice. Throughout, Tetlock has applied to his own positions the discipline he studied in others, updating them in proportion to the evidence and without retrospective self-justification.

Key Ideas

The Fox and the Hedgehog. Borrowing Berlin's taxonomy, Tetlock identified two cognitive styles that predict forecasting accuracy. The hedgehog knows one big thing, a grand theory that organizes all evidence and resists updating precisely because the narrative is satisfying. The fox knows many things, holds multiple frameworks simultaneously, and treats confidence as a variable to calibrate rather than an identity to defend. Twenty years of scored predictions showed that hedgehogs are not merely less accurate than foxes: they are less accurate than simple statistical baselines and, in the aggregate, worse than chimpanzees. The distinction is the cycle’s primary lens for diagnosing the AI discourse.

Calibration as trainable skill. The Good Judgment Project demonstrated that superforecasting is less a gift than a skill: a one-hour training module in probabilistic reasoning produced lasting improvements in forecast accuracy. The key habits are granular probability assignments, frequent updating, active search for disconfirming evidence, and resistance to identity-protective cognition. Calibration improves with practice against consequential feedback and degrades without that practice, a finding with direct implications for AI-augmented professionals whose feedback loops are attenuated.
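Calibration can only improve against feedback if forecasts are actually scored. A common yardstick in forecasting tournaments, and the one Tetlock and Gardner describe, is the Brier score: the squared gap between the stated probability and what happened. The sketch below uses the single-outcome form, in which 0 is perfect and a permanent 50 percent scores 0.25; the function name is hypothetical and the forecasts and outcomes are invented for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.

    Single-outcome form: 0.0 is perfect, 1.0 is confidently wrong every time,
    and always answering 0.5 scores 0.25. (The two-outcome form described in
    Tetlock and Gardner's book doubles these values, so it runs from 0 to 2.)
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal, non-empty lists of forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Invented example: five binary questions, 1 = happened, 0 = did not.
outcomes = [1, 0, 1, 1, 0]
fox = [0.7, 0.3, 0.8, 0.6, 0.2]        # graded, hedged probabilities
hedgehog = [1.0, 0.0, 0.0, 1.0, 1.0]   # always certain, wrong twice

print(f"fox      {brier_score(fox, outcomes):.3f}")       # 0.084
print(f"hedgehog {brier_score(hedgehog, outcomes):.3f}")  # 0.400
```

Under this scoring the hedgehog's two confident misses cost far more than all of the fox's graded hedges combined, which is how calibrated uncertainty shows up as measured accuracy.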

Overconfidence and the Calibration Problem

Inside view and outside view. Superforecasters integrate two perspectives: the inside view (the specific features of this situation that make it unique) and the outside view (the base rate for outcomes in the reference class of similar situations). Hedgehogs rely almost exclusively on the inside view—their grand theory tells them why this case is different. Foxes begin with the base rate, then adjust for the case-specific features, weighting each adjustment by the quality of the evidence behind it. Tetlock’s Existential Risk Persuasion Tournament revealed that AI domain experts and superforecasters disagreed primarily on which reference class to use, not on the evidence itself.
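The outside-in sequence can be made concrete in log-odds, where independent adjustments simply add. The sketch below anchors on a base rate and then applies inside-view adjustments, each given as a likelihood ratio and shrunk by a 0-to-1 quality weight. The quality weighting and the function names are hypothetical illustrations of the idea described above, not a procedure taken from Tetlock's studies, and all numbers are invented.

```python
import math


def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Log-odds -> probability."""
    return 1 / (1 + math.exp(-x))


def outside_in(base_rate, adjustments):
    """Start from the reference-class base rate, then adjust for case specifics.

    adjustments: (likelihood_ratio, quality) pairs. A likelihood ratio above 1
    favours the outcome, below 1 counts against it. quality in [0, 1] scales
    the shift in log-odds (a hypothetical stand-in for evidence strength).
    """
    if not 0 < base_rate < 1:
        raise ValueError("base rate must be strictly between 0 and 1")
    log_odds = logit(base_rate)
    for likelihood_ratio, quality in adjustments:
        if likelihood_ratio <= 0 or not 0 <= quality <= 1:
            raise ValueError("need likelihood_ratio > 0 and 0 <= quality <= 1")
        log_odds += quality * math.log(likelihood_ratio)
    return inv_logit(log_odds)


# Invented case: 20% of similar situations ended in the outcome (outside view);
# one solid case-specific signal (LR 3) and one flimsy one (LR 2, quality 0.25).
evidence = [(3.0, 1.0), (2.0, 0.25)]
print(round(outside_in(0.20, evidence), 2))  # about 0.47
# Ignoring the base rate (starting from an uninformed 50/50) inflates the result.
print(round(outside_in(0.50, evidence), 2))
```

The same two pieces of evidence land far higher when the reference class is skipped, which is the hedgehog's characteristic error in miniature.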

The 9.7 percent problem. A September 2025 follow-up to the existential risk tournament revealed that everyone, superforecasters and domain experts alike, had dramatically underestimated the pace of AI progress. Superforecasters had assigned just a 9.7 percent probability to the benchmark achievements that had actually occurred by 2025. The fox’s response to such a humbling is not to discredit forecasting but to recalibrate: to revise the priors and recognize that AI development has outrun the reference classes both groups were using, which now need replacing.

Human-AI symbiosis and its risks. Tetlock’s “Wisdom of the Silicon Crowd” research demonstrated that LLM ensemble predictions could rival human crowd accuracy, and that a human-machine hybrid outperformed both. But the symbiosis depends on the human component maintaining calibrated judgment, the very capacity that the AI component, through its confident fluency, threatens to degrade. This circular dependence is the deepest structural problem: a person who relies on AI confirmation to validate their assessments is consulting an echo of the data the model was trained on, not an independent second opinion.

Debates & Critiques

The central debate is whether superforecasting generalizes to the AI domain, where the pace of change has repeatedly outrun the reference classes that forecasting depends on. The Existential Risk Persuasion Tournament found that AI domain experts and superforecasters could not persuade each other to change their long-term estimates, and the September 2025 follow-up humbled both groups by showing that all had dramatically underestimated AI progress. This has prompted a methodological debate: whether the outside view, which grounds predictions in historical base rates, is systematically misleading in domains undergoing nonlinear change, and whether a fox who cannot find an adequate reference class should hold more uncertainty than any single probability estimate can represent.

Tetlock’s own position, that LLMs will “revolutionize human-based forecasting” within three years, can itself be subjected to the superforecaster’s scrutiny. The base rate for “revolutionary within three years” claims is low, the inside view is strong, and a calibrated assessment might assign fifty to seventy percent to significant integration while assigning only twenty to thirty percent to the stronger claim of human forecasting obsolescence. Isaiah Berlin, from whom Tetlock borrowed the fox-hedgehog distinction, might have noted that even the fox’s many frameworks can fail to capture what is genuinely unprecedented.

The Superforecaster's Triad

Three cognitive habits that separate the calibrated from the confident

Habit One

Granular Probability

Assign numbers, not words. “Likely” means different things to different people; “67 percent” means 67 percent. The discipline of quantification forces honesty about internal uncertainty that verbal estimates permit you to hide—from others and from yourself.
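Numbers also make that honesty checkable. One common check, sketched below on invented data, groups past forecasts by stated probability and compares each group with how often those events actually happened: a calibrated forecaster's 67 percent calls should come true about two times in three. The function name and the ten-bucket layout are illustrative choices, not a standard taken from Tetlock's work.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, bins=10):
    """Group forecasts into equal-width probability buckets.

    Returns (mean stated probability, observed frequency, count) for each
    non-empty bucket, in order. Calibration means the first two roughly match.
    """
    buckets = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        # min() puts p == 1.0 in the top bucket rather than an extra one
        index = min(int(p * bins), bins - 1)
        buckets[index].append((p, happened))
    rows = []
    for index in sorted(buckets):
        pairs = buckets[index]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        frequency = sum(h for _, h in pairs) / len(pairs)
        rows.append((mean_p, frequency, len(pairs)))
    return rows


# Invented record: nine calls at 67% (six came true), ten at 90% (six came true).
forecasts = [0.67] * 9 + [0.9] * 10
outcomes = [1] * 6 + [0] * 3 + [1] * 6 + [0] * 4
for mean_p, frequency, n in calibration_table(forecasts, outcomes):
    print(f"said {mean_p:.0%}, happened {frequency:.0%} of {n}")
```

In this made-up record the 67 percent calls are well calibrated, while the 90 percent calls come true only 60 percent of the time: the numerical signature of overconfidence that a vague “very likely” would have hidden.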

Habit Two

Active Disconfirmation

Seek the case against your position with the same energy you bring to the case for it. Disconfirming evidence eliminates hypotheses; confirming evidence merely fails to. The asymmetry is mathematically correct and cognitively unnatural—which is exactly why it must be practiced.
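The asymmetry shows up clearly in a toy version of Wason's 2-4-6 rule-discovery task, a classic psychology experiment that is not part of Tetlock's studies but illustrates the same point. Three candidate rules, invented for this sketch, compete to explain which number triples are allowed. Triples chosen to fit the favoured rule eliminate nothing; only triples that could have failed it do any work.

```python
# Candidate rules for which triples of numbers are "allowed" (invented for the toy).
HYPOTHESES = {
    "even numbers rising by 2": lambda t: all(x % 2 == 0 for x in t)
    and t[1] - t[0] == 2
    and t[2] - t[1] == 2,
    "rising by 2": lambda t: t[1] - t[0] == 2 and t[2] - t[1] == 2,
    "any increasing triple": lambda t: t[0] < t[1] < t[2],
}


def hidden_rule(t):
    """The rule actually in force, unknown to the tester (assumed for the toy)."""
    return t[0] < t[1] < t[2]


def survivors(hypotheses, triple, allowed):
    """Keep only the hypotheses whose prediction for `triple` matched reality."""
    return {name: rule for name, rule in hypotheses.items() if rule(triple) == allowed}


alive = dict(HYPOTHESES)
# The first two tests only confirm the favoured rule; the last two could refute it.
for triple in [(2, 4, 6), (10, 12, 14), (1, 3, 5), (1, 2, 10)]:
    alive = survivors(alive, triple, hidden_rule(triple))
    print(triple, "->", sorted(alive))
```

The two confirming triples leave all three rules standing; the two risky ones eliminate a rule each, leaving only the true one.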

Habit Three

Proportional Updating

Change your mind when the evidence changes, not reluctantly as a concession but naturally as a cognitive habit. When Tetlock revised his own AI estimates as LLM capabilities mounted, he was not being inconsistent. He was being the fox he had always been.
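“Proportional” can be given a precise sense with Bayes' rule in odds form: posterior odds equal prior odds multiplied by the likelihood ratio, so weak evidence moves an estimate a little, strong evidence a lot, and uninformative evidence not at all. The sketch below is a generic illustration with invented numbers and a hypothetical function name, not a reconstruction of any forecaster's actual estimates.

```python
def update(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    likelihood_ratio = P(evidence | claim true) / P(evidence | claim false).
    """
    if not 0 < prior < 1 or likelihood_ratio <= 0:
        raise ValueError("need 0 < prior < 1 and likelihood_ratio > 0")
    posterior_odds = prior / (1 - prior) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Invented sequence: two modest signals, one uninformative one, one strong one.
estimate = 0.20
for likelihood_ratio in [1.5, 1.5, 1.0, 4.0]:
    estimate = update(estimate, likelihood_ratio)
    print(f"LR {likelihood_ratio}: {estimate:.2f}")
```

The uninformative step leaves the estimate where it was, and the single strong signal moves it further than the two modest ones combined: the update tracks how much the evidence discriminates, not how loudly it arrives.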
