PERSON

Onora O’Neill

The Kantian moral philosopher who spent half a century insisting on the distinction between trust and trustworthiness, and whose framework for the conditions of warranted trust has become the sharpest philosophical instrument available for diagnosing why AI systems that speak our language with perfect confidence may be the least trustworthy interlocutors we have ever built.

Onora O’Neill has spent her philosophical career asking one question with more precision than anyone else alive: what does it actually take for trust to be warranted? Her answer, grounded in the Kantian tradition of practical reason and refined across decades of work in bioethics, public communication, and institutional design, is both simple and demanding. Trust is not a feeling; it is a judgment, a reasoned assessment that the party being trusted possesses the competence, honesty, and reliability that would justify reliance. Trustworthiness is what institutions and persons must earn; trust is what agents should extend only when they have evidence of it. In her celebrated 2002 Reith Lectures, A Question of Trust, she identified a crisis: a world awash in demands for trust while the conditions that would warrant it were quietly being dismantled. That crisis has been compounded beyond anything she could then have anticipated by the arrival of large language models. These systems invite reliance with a fluency and confidence that no human expert could sustain, backed by competence that is real in some domains and absent in others, and they offer no signal, no hesitation, no seam that would allow the trusting party to tell the difference. O’Neill’s framework does not tell us whether to trust AI. It tells us, with the exactness the moment demands, what would need to be true for trust to be something other than credulity, and it maps, with uncomfortable precision, how far we currently are from those conditions.

In the [YOU] on AI Field Guide

The cycle that began with [YOU] on AI asks what it means to extend reliance on machines whose fluency and whose limits are both invisible. O’Neill supplies the cycle’s most exact philosophical vocabulary for that question. When the cycle invokes Byung-Chul Han’s concept of the smooth surface (the polished output that conceals its own construction), O’Neill translates the aesthetic observation into a moral argument: the smooth is not merely a cultural preference but a systematic failure of assessability, and assessability is what she identifies as the foundation of trustworthy communication. When every claim is delivered in the same authoritative register regardless of its evidentiary basis, the reader loses the information she needs to calibrate her trust. That failure is not deception in the narrow sense, since there is no intent to deceive, but it is functionally equivalent to deception in its effects on the audience’s capacity to make intelligent judgments.

The cycle’s argument that AI amplifies whatever signal it receives maps directly onto O’Neill’s analysis of autonomy. The machine that translates intention into output so rapidly and fluently that the evaluative process is bypassed does not enhance autonomy in the Kantian sense, the capacity to act on principles one has reflectively endorsed. It enhances the capacity to act, which is a different and lesser thing. The professional who produces AI-assisted work without examining whether the output reflects her considered judgment rather than her initial impulse is not acting more autonomously merely because she is acting more efficiently. She may simply be moving more efficiently in the wrong direction, with the error amplified and polished to a professional sheen.

O’Neill also supplies the cycle’s most precise account of the accountability gap. When a language model produces a confident error, such as a fabricated legal case, a hallucinated pharmaceutical compound, or a philosophical citation that does not exist, no one in the chain of reliance is accountable in the robust sense O’Neill’s framework requires. The machine cannot bear responsibility; accountability presupposes moral agency. The developer is accountable for systemic tendencies but not for individual outputs. The deployer is accountable for contextual appropriateness. The user is accountable for evaluative judgment. None of these is sufficient alone. And the smooth surface of AI output systematically places the user in a position where exercising that evaluative responsibility requires the very expertise whose absence led her to the tool in the first place.

Her insistence that accountability must attach to persons, not processes—that “responsible AI” frameworks specifying audits and reviews without identifying who bears the consequences of failure are compliance structures rather than accountability structures—is the cycle’s most direct answer to the governance question. The beaver builds the dam; the question is not whether the dam holds water but whether anyone is accountable if it does not.

Origin

Onora Sylvia O’Neill was born in 1941 and educated at Oxford and then at Harvard, where she completed her doctorate under John Rawls. She has been Baroness O’Neill of Bengarve since 1999. Her career has bridged academic philosophy and public intellectual life in a way unusual for either domain: she served as President of the British Academy, chaired the Equality and Human Rights Commission, and chaired the Nuffield Council on Bioethics, in each case bringing the precision of Kantian practical reason to bear on concrete institutional questions.

Her philosophical work has concentrated on two interconnected themes: the foundations of Kantian ethics and their application to political and social philosophy, and the conditions of trustworthy communication and institutional accountability. Her books include Constructions of Reason (1989), A Question of Trust (2002), and Autonomy and Trust in Bioethics (2002); her later essays on the theme include “Linking Trust to Trustworthiness” (2018). Her 2002 Reith Lectures, delivered on BBC Radio 4, reached an audience far beyond academic philosophy and established the trust-versus-trustworthiness distinction as a piece of public intellectual furniture.

The Kantian tradition she works within holds that the fundamental test of any principle of action is universalizability: can you will that everyone in relevantly similar circumstances act on the same principle without contradiction? This test is not merely a philosophical criterion; it is a trust-generating mechanism. An agent who acts on universalizable principles is, by that fact, an agent whose behavior is predictable in a specific way: not in the sense that you know exactly what she will do, but in the sense that you know she will not instrumentalize you, since principles of deception and coercion are precisely the ones that cannot be universalized. Trust, for O’Neill, is the rational response to this evidence. And the conditions that generate it are as specific and demanding as the principles themselves.

Key Ideas

Trust versus trustworthiness. O’Neill’s most consequential distinction: trust is what agents extend; trustworthiness is what institutions and persons must earn. A culture that celebrates trust as an unqualified good—that treats the mere act of trusting as virtuous regardless of its basis—has lost the conceptual equipment to distinguish between the wise and the gullible. The conditions of warranted trust are specific and demanding: competence, honesty (in the sense of assessability), and reliability. When one or more is absent, extending trust is not a virtue but a failure of practical reason. AI systems systematically fail to provide the evidence that would warrant trust, while providing, with exceptional facility, the surface features that human cognition uses as trust signals.

Assessability over sincerity. O’Neill argues, with characteristic precision, that the relevant standard for trustworthy communication is not sincerity (the requirement that speakers believe what they say, which is invisible and therefore unverifiable from outside) but assessability: the requirement that audiences be given adequate means to evaluate the claims they receive. This includes provision of evidence, identification of sources, acknowledgment of uncertainty, and avoidance of rhetorical devices that make weak claims appear stronger. AI output fails this standard systematically: it does not identify sources, does not distinguish between well-evidenced and poorly evidenced claims, and does not acknowledge uncertainty. The smooth confidence of its presentation is not dishonesty but its functional equivalent: it creates an environment in which the distinction between warranted and unwarranted trust has been made invisible.

Principled autonomy and the amplifier. Autonomy, in O’Neill’s Kantian account, is not the capacity to do as one pleases but the capacity to act on principles one has reflectively endorsed. AI that translates intention into output so fluently that the evaluative space between wanting and doing collapses threatens not capability but the conditions of genuine self-governance. The professional who acts in the flow of AI-assisted production without examining whether the output reflects her considered judgment may be producing more, but she is governing herself less. Principled autonomy is not glamorous: it is the effortful discipline of reflective self-governance. The amplifier amplifies whatever it receives, including unreflective impulse.

Accountability must attach to persons. O’Neill distinguishes transparency from accountability: transparency gives the audience information, accountability gives the audience recourse. A “responsible AI” framework that specifies processes without identifying the persons who bear the consequences of failure is a compliance structure, not an accountability structure. The machine cannot bear responsibility; accountability requires moral agency. What must be built is a chain of responsibility in which the developer is accountable for systemic trustworthiness, the deployer for contextual appropriateness, and the user for evaluative judgment—a chain in which each link identifies specific persons facing specific consequences for specific failures.

Deception without intent and the smooth surface. O’Neill’s framework can identify, without attributing intention to a machine, a form of deception whose effects on the audience are structurally equivalent to those of intentional deception. When AI output removes the conditions under which intelligent trust could be exercised, because the signals by which human cognition evaluates trustworthiness (confidence, fluency, apparent expertise) are systematically present regardless of whether the underlying content warrants them, the audience is placed in an epistemic environment optimized for credulity. The wrong is not in the machine but in the institutional design that deploys it without building the accountability structures that would make intelligent trust possible.
