Philip E. Tetlock is the Annenberg University Professor at the University of Pennsylvania, holding appointments at both Wharton and the School of Arts and Sciences. Born in Canada, he received his PhD from Yale and spent formative years at UC Berkeley before moving to Penn. His landmark Expert Political Judgment (2005) documented that political and economic experts' predictions barely outperformed chance and fell short of even simple statistical extrapolation, while a minority of forecasters consistently did better than their peers. The distinguishing feature was not intelligence but cognitive style: foxes who knew many things outperformed hedgehogs who knew one big thing.
Tetlock's twenty-year longitudinal study (1984–2004) collected 28,000 predictions from 284 credentialed experts across political science, economics, and intelligence analysis. Each prediction was specific, time-bound, and probabilistic — designed to be scored with the precision of a ledger. The results were devastating to the cult of expertise: average expert accuracy approximated that of a dart-throwing chimpanzee. More damaging still, the experts who appeared most frequently on television and wrote the most assured op-eds were the worst forecasters. Confidence and accuracy were inversely correlated. The finding should have reshaped how societies make decisions; it did not, because media ecosystems reward hedgehog confidence over fox calibration.
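The ledger-like scoring the study relied on is the Brier score: the mean squared difference between probability forecasts and what actually happened. A minimal sketch, with the two forecasters below invented purely for illustration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary outcomes.
    0.0 is perfect; an always-50% forecaster scores 0.25; 1.0 is worst
    in this binary, single-probability form."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A confident hedgehog who is half right scores far worse than a
# cautious fox who leans the right way on the same four events.
hedgehog = brier_score([0.95, 0.95, 0.95, 0.95], [1, 0, 1, 0])  # 0.4525
fox      = brier_score([0.60, 0.40, 0.60, 0.40], [1, 0, 1, 0])  # 0.16
```

The quadratic penalty is what makes overconfidence so costly: a 95% forecast that misses is punished roughly six times as hard as a 60% forecast that misses.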
The Good Judgment Project (2011–2015), funded by IARPA, demonstrated that forecasting could be dramatically improved. Tetlock's team outperformed the rival research teams so decisively that IARPA dropped the other competitors after two years and ran the remainder of the tournament with the Good Judgment Project alone. The mechanism of victory was replicable: structured training in probabilistic reasoning, granular probability estimates, frequent updating, and active search for disconfirming evidence. 'Superforecasters' maintained their advantage across multiple years by practicing these habits continuously. The project established forecasting as a trainable skill rather than an innate gift, provided the empirical foundation for Superforecasting (2015), and created the methodology now used by intelligence agencies and corporations worldwide.
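The 'frequent updating' habit is, at bottom, Bayesian: revise a probability in proportion to how diagnostic each new piece of evidence is. A sketch in odds form; the likelihood ratios below are hypothetical inputs for illustration, not anything drawn from the project's actual training materials:

```python
def update(prior_prob, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio,
    where likelihood_ratio = P(evidence | event) / P(evidence | no event)."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Start at 30%, see evidence twice as likely if the event will occur,
# then a weaker disconfirming signal (ratio 0.8).
p = 0.30
p = update(p, 2.0)   # ~0.462
p = update(p, 0.8)   # ~0.407
```

Each update is proportional, never all-or-nothing, which is the formal counterpart of the fox's refusal to let one framework swallow the evidence.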
Tetlock's trajectory on AI illuminates the fox's method. In 2015, he expressed difficulty imagining AI doing what superforecasters collectively do 'in the near term.' By 2018, he was designing human-machine forecasting hybrids. By 2024, his research showed LLM ensemble predictions rivaling human crowd accuracy. By 2025, he told Newsweek it was 'absolutely crucial to integrate LLMs into almost all lines of inquiry' and predicted that within three years, unassisted human forecasting would cease to make sense in serious policy debates. This was not a man changing his mind because winds shifted — it was proportional updating as evidence accumulated, the cognitive discipline he spent forty years studying.
His recent work on AI-assisted forecasting includes the Existential Risk Persuasion Tournament (organizing adversarial collaborations between AI domain experts and superforecasters), the 'Wisdom of the Silicon Crowd' research demonstrating LLM ensemble accuracy, and the Hybrid Forecasting Competition pitting humans against machines against human-machine hybrids. The research reveals both AI's impressive predictive capabilities and its systematic failures: everyone, superforecasters and domain experts alike, radically underestimated the pace of AI progress in 2025, assigning single-digit probabilities to benchmark achievements that actually occurred within months.
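The ensemble results rest on simple aggregation of individual probability estimates. One scheme from the forecasting-tournament literature takes the crowd median and then 'extremizes' it, pushing it away from 0.5 to correct for the crowd's shared caution; the sketch below uses an illustrative exponent of 2.5 rather than any fitted value:

```python
import statistics

def aggregate(probs, extremize=2.5):
    """Median of individual probability forecasts, then extremized by
    raising the odds to a power > 1. The exponent here is illustrative."""
    m = statistics.median(probs)
    odds = (m / (1 - m)) ** extremize
    return odds / (1 + odds)

crowd = [0.55, 0.60, 0.70, 0.65, 0.80]   # five forecasters (or five LLMs)
# median 0.65 -> roughly 0.82 after extremizing
```

The same function applies whether the inputs come from human forecasters or from repeated queries to different language models, which is why the human-crowd and silicon-crowd comparisons are methodologically clean.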
Tetlock's intellectual formation was shaped by his graduate training at Yale in the 1970s, where he encountered the heuristics-and-biases research of Kahneman and Tversky, and by his early faculty years at Berkeley during the 1980s, where he absorbed the political psychology tradition. The 1984 decision to begin the expert prediction study emerged from a simple observation: experts made predictions constantly — on television, in journals, in classified briefings — but those predictions were almost never scored against outcomes. The infrastructure for accountability did not exist. Tetlock built it, collecting predictions with the patience of a naturalist documenting a species and the rigor of a laboratory scientist designing an experiment.
His fox-hedgehog distinction was borrowed from Isaiah Berlin's 1953 essay 'The Hedgehog and the Fox,' which classified writers and thinkers according to whether they related everything to a single organizing principle (hedgehogs) or pursued many ends unconnected by any conscious system (foxes). Tetlock transformed Berlin's literary insight into an empirical finding by demonstrating that the cognitive style predicted forecasting accuracy across domains, time horizons, and levels of expertise. The hedgehog's confidence — the feeling of mastery produced by a powerful explanatory framework — was the mechanism of forecasting failure. The fox's discomfort — the holding of multiple frameworks in tension — was the mechanism of forecasting success.
Expert accuracy paradox. Credentialed experts with deep domain knowledge predict future events no better than random chance — and the most confident experts are the least accurate.
Fox-hedgehog cognitive styles. Foxes who know many things and hold multiple frameworks simultaneously outperform hedgehogs who know one big thing and apply it universally, because world complexity exceeds single-framework reach.
Calibration as trainable skill. The correspondence between stated confidence and actual accuracy improves through structured practice: granular probabilities, frequent updating, disconfirmation search, and feedback loops.
Superforecasting methodology. An hour of training in probabilistic reasoning produces measurable, durable improvement in prediction accuracy — demonstrating that better judgment is a skill, not a gift.
Identity-protective cognition. People process information in ways that protect membership in valued social groups — the mechanism by which expertise becomes a liability when predictions carry reputational stakes.
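Calibration, as defined above, can be measured directly: group forecasts into probability buckets and compare each bucket's average stated confidence with the observed frequency of the event. A minimal sketch (the bucket width and output layout are my own choices):

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, width=0.1):
    """Map each probability bucket to (mean stated confidence, observed
    frequency). A well-calibrated forecaster's '70%' events happen ~70%
    of the time, so the two numbers should roughly match per bucket."""
    buckets = defaultdict(list)
    cap = int(1 / width) - 1          # keep a forecast of 1.0 in the top bucket
    for f, o in zip(forecasts, outcomes):
        buckets[min(int(f / width), cap)].append((f, o))
    return {
        round(b * width, 2): (
            sum(f for f, _ in pairs) / len(pairs),   # mean stated confidence
            sum(o for _, o in pairs) / len(pairs),   # observed frequency
        )
        for b, pairs in sorted(buckets.items())
    }
```

Feedback from exactly this kind of table, bucket by bucket, is what makes calibration trainable: the gap between the two columns tells a forecaster where their confidence is miswired.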
The primary debate surrounding Tetlock's work concerns the generalizability of forecasting skill across domains. Critics argue that geopolitical forecasting — the domain of his original research — is sufficiently different from technological, scientific, or business forecasting that the fox-hedgehog distinction may not transfer. Defenders counter that the cognitive habits of calibration are domain-general, applying wherever prediction under uncertainty is required. A second debate concerns AI's role: whether LLMs represent a qualitative break from previous forecasting technologies or merely an incremental improvement in the aggregation of human judgment.