PERSON

Rosalind Franklin

The crystallographer who recovered the hidden architecture of matter from the patterns radiation leaves behind—producing Photo 51, the data that revealed the structure of DNA, before two men used it without her knowledge to win the century's most famous prize.

Rosalind Franklin is the right scientist for the age of artificial intelligence, and the first reason has nothing to do with the injustice that made her famous. It is that she did, by hand, what our machines now do at scale: she inferred hidden structure from raw signal. A neural network takes a flood of data and extracts the pattern beneath it. Franklin took a flood of scattered X-rays and extracted the pattern beneath them. It is the same fundamental operation, performed in a basement at King's College London in 1952 and in a data center in 2025. Without her knowledge or consent, her most important data was shown to the people who became famous for the double helix, and they built their discovery on it without acknowledging how much they owed to it. We are now living through the largest uncredited use of human work in history: the training of AI systems on the writing, art, code, and photographs of billions of people who were never asked and never paid. Franklin is what that looks like for one person, in close-up. She refused to assert beyond the evidence (“we are not going to speculate; we are going to let the spots on this photograph tell us what the structure is”), and that refusal is the most countercultural stance available in an age of machines that are constitutionally incapable of her restraint, that will always tell you something, and that have no faculty for the honest “we are going to wait.” She poses two questions the machine age would rather skip: does extracting a pattern amount to understanding it, and who owns the data a discovery is built on?

In the [YOU] on AI Field Guide

The cycle that began with [YOU] on AI asks what it means to see the machine clearly—to understand what it actually does, where it falls short, and what it takes from those who did not consent to give. Franklin is the lens through which two of the cycle's deepest questions become precise. The first is the question of extraction without understanding: does recovering a pattern amount to grasping it? Franklin's discipline held the two apart with rigor no system since has matched. The second is the question of uncredited contribution: when a discovery is built on someone's work without their knowledge, what is owed, and by whom, and when?

AlphaFold, the system that predicts protein structure from amino acid sequence with accuracy that often rivals experiment, is doing Franklin's job, recovering three-dimensional architecture from a one-dimensional signal, and doing it for proteins no crystallographer ever solved. The echo is close and profound. But the echo also exposes the gap that is the whole question. Franklin could say why the structure was as she determined it: she could trace the result back to the physics of how the atoms scattered the rays, defend each feature of her model against alternatives, and distinguish signal from artifact. The machine that predicts the fold produces the structure without producing the why. It has learned the mapping, not the mechanism, and it can fail on a protein unlike those it trained on, sometimes confidently, because its own confidence scores do not always register that it has left familiar ground. The pattern is recovered. The understanding is still owed.

The training data question runs the same way. The large models that now write, draw, code, and converse were trained on enormous corpora of human work overwhelmingly without the knowledge, consent, or compensation of the people who made it. Every writer whose prose taught a language model to write, every artist whose images taught a diffusion model to paint, is in the position Franklin was in: their work was used, without their say, to build something that others now profit from. The scale is incomprehensibly larger—billions of uncredited contributors rather than one—but the moral architecture is identical. What makes Franklin's case instructive is that it is intimate enough to see clearly: one person, one photograph, one discovery, one erasure.

Franklin also stands in the cycle's gallery as the supreme empiricist—the scientist who trusted the data over the model, who measured where others guessed, who refused to assert beyond the evidence. That supreme empiricism turns out to be the standard our machines most conspicuously lack. A large language model does not conclude only what its input supports. It is built to produce a fluent, plausible continuation regardless of whether the evidence warrants it. Hallucination is the inversion of Franklin's principle: where she refused to assert beyond the evidence, the machine asserts beyond the evidence by default, because confident well-formed answers are what it was trained to produce.

Origin

Rosalind Elsie Franklin was born in London in 1920 into an Anglo-Jewish family. She was educated at Newnham College, Cambridge, at a time when the university did not yet grant women full degrees, and earned her doctorate at Cambridge for research on the microstructure of coal carried out during the Second World War. She then spent three years in Paris mastering X-ray diffraction under Jacques Mering, an experience she described as the most formative of her scientific life: she learned to coax structure from signal with a precision and patience that would define everything she subsequently produced.

At King's College London from 1951, she applied that mastery to DNA. She discovered that DNA existed in two distinct forms depending on humidity—A form and B form—and began the painstaking work of characterizing each. Photo 51, taken in May 1952 by her student Raymond Gosling under her direction, was an X-ray diffraction image of the B form: a photograph that, to a trained eye, showed unmistakably the signature of a helix. The image required not luck but technical mastery—Franklin's painstaking control of humidity, sample positioning, and exposure time. The quality of the data was the achievement.

In early 1953, Maurice Wilkins showed the photograph to James Watson without Franklin's knowledge. Separately, an unpublished report containing her precise crystallographic measurements reached Watson and Crick through a research council committee. Watson later wrote that the instant he saw Photo 51, his pulse began to race. The data Franklin had labored to produce became, without her say, the evidence that completed someone else's model. Watson and Crick's 1953 paper in Nature appeared alongside a paper by Franklin and Gosling, and her paper was positioned as supporting evidence for their model rather than as the source it partly was. Franklin moved to Birkbeck College and produced pioneering structural work on viruses before dying of ovarian cancer in 1958 at thirty-seven.

Key Ideas

Pattern extraction versus pattern understanding. The recovery of hidden structure from observable signal is the same problem in crystallography and in machine learning. A diffraction pattern is Franklin's input data; the crystal structure is her latent variable; the Fourier mathematics of diffraction is her inference procedure. She was, in a precise and non-trivial sense, doing the thing machine learning does. But her inference was disciplined by an understanding of physics. She knew why the spots fell where they fell; she could reason about what each feature of the pattern implied and what it did not. The machine finds the mapping that works without knowing why it works, and it cannot always tell the difference between a true regularity and a spurious one. Extraction is not understanding.

Let the spots tell us. Franklin's deepest principle was a refusal to let the model run ahead of the data. She would not assert beyond what the evidence strictly supported, and she held that position against the pressure of colleagues who wanted conclusions faster. This is not timidity but a method for not being fooled by your own data—a recognition that the data underdetermines the conclusion, and that a model adopted because it is beautiful will defend itself against the data that should kill it. Hallucination in large language models is the structural inversion of this principle: confidence decoupled from warrant, the appearance of knowledge without its substance.

The uncredited dataset. Franklin is the patron saint of the uncredited training set. The wrong done to her was not that Watson and Crick used her data: data is meant to be used, and science advances by building on prior work. The wrong was the conjunction of three conditions: her work was used without her knowledge, without her consent, and without credit. All three conditions hold for the overwhelming majority of contributors to AI training corpora. The defenders of the practice argue that scale makes individual consent impractical and that public availability implies permission; neither argument touches the moral structure Franklin's case exposes. Her data was available to those who used it, too; availability was never consent.

Provenance dissolved. In Franklin's case, the truth existed and could eventually be told: the provenance was real, and history could recover it, however late. In the AI case, there may be no fact of the matter to recover. When a model trained on a million writers produces a paragraph, there is no true answer to “whose work is this?”—it is all of theirs and none of theirs, in proportions that cannot be measured. The machine does not merely break the credit chain, as Wilkins and Watson did to Franklin; it abolishes the very possibility of a credit chain. Provenance has been dissolved rather than hidden. This is a genuinely new condition, darker than her case, and Franklin's case is the bridge to understanding it.

Rigor and reproducibility. Franklin's authority rested on the reproducibility of her work: her measurements were exact, her methods were careful, her results could be checked and would hold. The crisis quietly spreading through machine-learning science is that a great deal of it cannot be reproduced, cannot be checked, and does not hold. Models are trained on undisclosed datasets with unreleased code, and their results are reported on benchmarks that may have leaked into the training data. Franklin's whole authority, the thing that lets us say she was right about the water content of DNA and the position of the phosphates on the outside of the molecule, is that her work could be checked. A science that abandons that standard for the seductions of the black box is abandoning the thing that made it science.
