Across five incentive-compatible studies with over 2,500 participants, Weidinger and colleagues asked people to choose principles to govern an AI assistant. Some participants chose from behind a simulated veil, without knowledge of their relative position in the group. Others chose with full knowledge of their position. The result was consistent and robust across study variations: participants behind the veil showed a clear preference for principles instructing the AI to prioritize the worst-off. Neither risk attitudes nor political preferences adequately explained these choices. The preference appeared to be driven by elevated concerns about fairness, precisely what Rawls's framework predicts. The study is methodologically significant because it transformed the veil of ignorance from a philosopher's thought experiment into an empirically tested mechanism for aligning AI systems with principles of justice.
The study's design was elegant. Participants were placed in small groups and asked to choose principles that would govern an AI assistant's behavior in allocating tasks and resources. In the veiled condition, participants did not know what role they would occupy once the AI began operating. In the non-veiled condition, participants knew in advance whether they would be advantaged or disadvantaged by the AI's decisions. The contrast between the two conditions isolated the specific effect of the informational constraint that Rawls theorized.
The results vindicated Rawls's prediction with striking consistency. Participants behind the veil chose principles that protected the worst-off, even when doing so reduced expected benefits to themselves. Participants in the non-veiled condition chose principles that favored their own position, in the predictable manner of self-interested agents with full information. The gap between the two conditions was large and robust across variations in the experimental task, the population studied, and the specific framing of the choice.
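To make the structure of that choice concrete, here is a minimal sketch in Python. The principle names and payoff numbers are invented for illustration and do not come from the study. Behind the veil, a participant's position is effectively a uniform lottery, so each candidate principle can be summarized by its expected, worst-case, and total payoff:

```python
# Illustrative sketch only: principles and payoffs are invented,
# not taken from the study.
from statistics import mean

# Hypothetical per-position payoffs under each candidate governing principle.
PAYOFFS = {
    "maximize_total":       [10, 8, 6, 1],  # higher total, low floor
    "prioritize_worst_off": [7, 6, 5, 4],   # lower total, raised floor
}

def veiled_summary(payoffs):
    """What a veiled chooser can compute: every position is equally likely."""
    return {"expected": mean(payoffs), "worst_case": min(payoffs), "total": sum(payoffs)}

for principle, payoffs in PAYOFFS.items():
    print(principle, veiled_summary(payoffs))
# maximize_total {'expected': 6.25, 'worst_case': 1, 'total': 25}
# prioritize_worst_off {'expected': 5.5, 'worst_case': 4, 'total': 22}
```

The arithmetic makes the study's point about risk attitudes visible: a risk-neutral expected-value maximizer behind the veil would still choose the total-maximizing principle (6.25 against 5.5 here), so the observed preference for the floor-raising principle is not mere self-interested gambling, which is where the fairness explanation enters.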
The study's implications for AI alignment are substantial. The veil of ignorance, the authors argued, is not merely a philosophical ideal but a practical mechanism that can be implemented in the design of AI systems. An AI assistant whose principles were chosen under veil-like conditions, by participants who did not know how the AI's decisions would affect their own interests, would tend to be aligned with Rawlsian principles of justice rather than with the interests of its most powerful users. This is not a small matter. Current AI alignment practice typically relies on reinforcement learning from human feedback (RLHF), where the feedback comes from annotators whose interests and positions are not systematically impartial. The veil-based approach represents an empirically supported methodological alternative.
The study's limitations are also worth noting. Simulated veils are not actual veils. Participants in the experimental condition retained some knowledge of their own identities even as they were asked to reason as if ignorant of their positions. The move from laboratory experiment to large-scale deployment raises questions about scalability, about the representativeness of the participants whose judgments constitute the training signal, and about the stability of veil-based principles as contexts change. None of these limitations invalidates the result, but they do constrain the scope of inferences that can be drawn from it.
The study was published in May 2023 in Proceedings of the National Academy of Sciences, with Laura Weidinger as first author. The Google DeepMind team that produced it included researchers who had previously worked on AI safety and alignment from technical angles; the study represented an unusual integration of empirical behavioral research with normative political philosophy.
The study was part of a broader wave of empirical work testing Rawlsian predictions, a line of research that traces back to Frohlich and Oppenheimer's experiments on distributive justice preferences in the late 1980s and early 1990s but that has acquired new relevance as AI alignment has become a practical problem requiring empirical grounding.
The study's contribution can be summarized in five points.

Experimental operationalization. The veil of ignorance can be implemented as an experimental protocol that measurably changes participants' choices about governing principles.
Robust maximin preference. Participants behind the veil reliably chose principles protecting the worst-off, even at cost to expected self-benefit.
Not explained by risk aversion or politics. The observed preference pattern cannot be reduced to pre-existing risk attitudes or political commitments — it appears to be driven by fairness considerations activated by the veil.
AI alignment implication. A veil-based alignment methodology could produce AI systems whose principles track Rawlsian justice rather than the preferences of their most powerful users; a toy sketch of one such principle-selection step appears after this list.
Bridge between philosophy and empirical science. The study demonstrates that normative theories can be tested empirically — and that Rawls's theory, when tested, generates predictions that the evidence supports.
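To illustrate the alignment implication flagged above, here is a deliberately toy sketch of a veil-based principle-selection step. Everything in it (the voting rule, the function names, the payoffs carried over from the earlier sketch) is an assumption for illustration, not the study's protocol or any deployed alignment pipeline:

```python
# Hypothetical sketch: a governing principle chosen by annotator vote,
# with and without knowledge of one's own position. Assumptions throughout.
from collections import Counter

PAYOFFS = {  # invented per-position payoffs, as in the earlier sketch
    "maximize_total":       [10, 8, 6, 1],
    "prioritize_worst_off": [7, 6, 5, 4],
}
POSITIONS = range(4)  # one annotator per position

def vote_unveiled(position):
    # Full information: back whichever principle pays your own position most.
    return max(PAYOFFS, key=lambda p: PAYOFFS[p][position])

def vote_veiled(_position):
    # Behind the veil, apply the maximin pattern the study reports:
    # back the principle with the highest worst-case payoff.
    return max(PAYOFFS, key=lambda p: min(PAYOFFS[p]))

for label, vote in [("unveiled", vote_unveiled), ("veiled", vote_veiled)]:
    tally = Counter(vote(pos) for pos in POSITIONS)
    winner, _ = tally.most_common(1)[0]
    print(f"{label}: {dict(tally)} -> adopt {winner!r}")
# unveiled: {'maximize_total': 3, 'prioritize_worst_off': 1} -> adopt 'maximize_total'
# veiled: {'prioritize_worst_off': 4} -> adopt 'prioritize_worst_off'
```

The contrast is the methodological point: the same pool of annotators yields a different governing principle depending on the information condition, which is what makes the veil a design lever rather than only a thought experiment.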
The study has been criticized on several grounds. Its participant pool was drawn from standard experimental populations that may not represent the diversity of stakeholders affected by AI deployment. Its experimental tasks were simpler than real-world AI governance decisions. Its simulated veil was imperfect; participants retained enough information about themselves to contaminate the pure Rawlsian reasoning. The study's authors acknowledged these limitations and argued that the robustness of the effect across variations suggests that the underlying pattern is real, even if its application requires further empirical work. The debate continues, but the study has become an important reference point in discussions of AI alignment and normative AI governance.