Peer review, in Polanyi's account, is not the application of a checklist but the exercise of communal connoisseurship. Reviewers assess whether research is methodologically sound, logically valid, and evidentially supported—these are necessary but not sufficient judgments. The crucial evaluation concerns significance: does this work advance understanding in a way that matters to the field? This judgment is irreducibly tacit. It requires the reviewer's deep immersion in the domain, her sense of what the field knows and what it needs to know, her intuition about which directions are promising and which are exhausted. Two competent reviewers may disagree about significance without either being wrong, because each draws on a tacit ground shaped by different research trajectories. What makes peer review epistemologically reliable is not that it eliminates disagreement but that it ensures evaluation by practitioners who possess the tacit knowledge to recognize genuine advances. AI cannot perform this function—it can check explicit criteria but cannot exercise the connoisseurial judgment that separates significant contributions from competent incremental work.
Polanyi's analysis of peer review challenged the positivist picture of science as methodologically self-regulating. If science were truly a matter of following explicit procedures, peer review would be mechanical: check whether the researcher followed the steps correctly, and the validity of the results would follow automatically. But scientists do not experience peer review this way. Reviewers exercise judgment—about whether the experimental design was adequate to the question, whether the data interpretation is reasonable, whether the conclusions are warranted. These judgments involve evaluation against tacit standards that the field maintains collectively but cannot fully articulate. The field knows what good work looks like, but the knowing is distributed across practitioners and operates through their connoisseurial evaluation rather than through explicit rules.
The AI research community's benchmark problem is the peer-review problem transposed into computational terms: evaluation reduced to explicit criteria. Models are evaluated against standardized benchmarks—question-answering accuracy, reasoning test performance, code generation success rates. But these explicit metrics cannot capture the tacit dimension of what makes a model genuinely better. When GPT-4 outperformed GPT-3.5 on benchmarks, the improvement was real. But whether the improvement represented deeper understanding or more sophisticated pattern-matching—whether the model had moved closer to genuine intelligence or merely gotten better at statistical mimicry—required judgment that benchmarks cannot provide. The field's most experienced researchers possess this judgment. They can sense, through years of working with models, when a capability is robust and when it is brittle. The judgment is connoisseurial, operating through tacit comparison of the new model's behavior against an internalized sense of what genuine understanding would look like.
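To make concrete how little an explicit metric sees, here is a minimal sketch (hypothetical code, not any actual benchmark harness) of the kind of scoring such benchmarks perform: exact-match accuracy against reference answers. The names `BenchmarkItem` and `exact_match_accuracy` are illustrative. The metric registers only whether output strings match the references; it has no access to why they match, which is precisely the judgment located above in connoisseurial evaluation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class BenchmarkItem:
    prompt: str
    reference: str  # the single "correct" answer the metric can see


def exact_match_accuracy(items: Sequence[BenchmarkItem],
                         model_answer: Callable[[str], str]) -> float:
    """Score a model by exact string match against reference answers.

    An explicit criterion of exactly this kind is all a benchmark can check:
    it reports how often outputs match, and says nothing about whether a
    match reflects understanding or statistical mimicry.
    """
    correct = sum(1 for item in items
                  if model_answer(item.prompt).strip() == item.reference.strip())
    return correct / len(items)


# Hypothetical usage: any two systems that emit the same strings receive the
# same score, regardless of how those strings were produced.
if __name__ == "__main__":
    items = [BenchmarkItem("2 + 2 =", "4"),
             BenchmarkItem("Capital of France?", "Paris")]
    lookup_table = {"2 + 2 =": "4", "Capital of France?": "Paris"}
    print(exact_match_accuracy(items, lambda p: lookup_table.get(p, "")))
```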
Organizational decision-making about AI adoption often lacks the peer-review equivalent—the community of practitioners with sufficient tacit knowledge to evaluate whether a proposed AI deployment will actually work under real-world conditions. The consultant who recommends an AI system may understand the technology's explicit capabilities but lack the tacit knowledge of the organization's actual practices—the informal workarounds, the unwritten knowledge, the contextual sensitivities that make the difference between a tool that enhances work and a tool that disrupts it. The organization, lacking internal expertise to evaluate the recommendation against tacit understanding of its own operations, accepts the recommendation because it meets explicit criteria (impressive demos, promising benchmarks, compelling ROI projections). The absence of convivial peer review—evaluation by practitioners who possess both technological understanding and organizational tacit knowledge—is why so many enterprise AI deployments produce results far below their promised potential.
Polanyi developed his account of peer review primarily in "The Republic of Science" (1962), arguing that the scientific community is not organized hierarchically but operates as a self-coordinating network in which authority is distributed. Each scientist has authority in her local domain of expertise and defers to colleagues' authority in theirs. Peer review is the mechanism through which this distributed authority operates—through which the community's collective tacit knowledge is brought to bear on individual contributions.
Evaluates significance, not just validity. Peer review's crucial function is assessing whether work matters to the field—a judgment requiring tacit understanding of the field's frontiers that no explicit criteria capture.
Connoisseurial, not mechanical. Reviewers exercise personal judgment grounded in years of domain immersion—evaluating against tacit standards that resist formalization into checklists.
Distributed tacit knowledge. The scientific community's collective understanding exceeds any individual's—peer review pools connoisseurial judgments from multiple practitioners with different tacit grounds.
Disagreement can be legitimate. Two reviewers may reach opposite assessments of the same work, both from genuine expertise, because their tacit evaluative frameworks differ—irreducible plurality is a feature, not a bug.
AI cannot replace this function. Automated evaluation can check explicit compliance but cannot exercise the tacit connoisseurship that distinguishes significant advances from incremental competence; that judgment remains grounded in embodied human practice.