The AI transition is generating a discourse that lurches between magic and menace because it lacks a vocabulary for levels of analysis. Marr’s three levels supply that vocabulary. “It just predicts the next token” is an accurate computational-level description, but it does not settle whether the behavior produced by that computation deserves a stronger description at the algorithmic level. “It produces fluent text and therefore understands” infers an algorithmic-level fact from behavioral evidence, skipping the work of showing that the mechanism supports the conclusion. “It is just matrix multiplication” is an implementational description with the force of “the brain is just chemistry”: true, and settling nothing at the levels above. These three arguments about whether AI systems “really” understand are pitched at three different levels, yet they are conducted as though they were arguments about the same thing.
The framework also illuminates the interpretability crisis: the profound difficulty of explaining what trained neural networks actually do. We have mechanism without understanding, with the implementational level fully exposed and the computational level almost entirely blank. The training objective is known, but it specifies what the network was optimized for, not what problem its learned internals actually solve. Marr anticipated this difficulty: the computation is not visible in the hardware, and reading the higher levels off the lower is the hard direction, the one he warned against. Every successful piece of mechanistic interpretability research is, in his vocabulary, the recovery of an algorithmic-level description from implementational data, climbing his ladder in the treacherous direction. And every honest interpretability researcher knows they have not yet recovered the computational level: what problem, exactly, is the whole system solving, and why?
Marr crystallized the three-level scheme in dialogue with Tomaso Poggio at MIT in the mid-1970s, building on ideas that had been forming since his earliest theoretical work on the cerebellum and neocortex. The framework appeared in mature form in Vision (1982), but its most cited illustration is disarmingly humble: a cash register. At the computational level, a cash register adds. At the algorithmic level, it represents numbers as decimal digits and carries when columns exceed nine. At the implementational level, it is made of cogs, or transistors, or anything that can count. The power of the illustration is its insistence that these are genuinely different questions with genuinely different answers—and that understanding addition does not require inspecting the transistors.
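The cash register can be made concrete in a few lines. The following is a minimal Python sketch, assuming the computational level is modeled as plain integer addition and the algorithmic level as schoolbook column addition on decimal digit strings. The function names `add_decimal_columns` and `computational_spec` are hypothetical illustrations, not anything from Marr’s text. The implementational level is left to whatever hardware runs the interpreter, which is rather the point.

```python
def computational_spec(a: int, b: int) -> int:
    """Computational level: WHAT is computed (addition), with no commitment
    to how it is done."""
    return a + b


def add_decimal_columns(a: str, b: str) -> str:
    """Algorithmic level: add two non-negative decimal numerals digit by
    digit, carrying whenever a column's sum exceeds nine."""
    # Pad both numerals to the same width so the columns line up.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    digits = []
    carry = 0
    # Work from the rightmost (least significant) column leftward.
    for da, db in zip(reversed(a), reversed(b)):
        column = int(da) + int(db) + carry
        digits.append(str(column % 10))
        carry = column // 10  # 1 if the column exceeded nine, else 0
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


# The algorithm is judged against the specification, not the other way round.
for x, y in [(0, 0), (7, 5), (999, 1), (1234, 8766)]:
    assert int(add_decimal_columns(str(x), str(y))) == computational_spec(x, y)

print(add_decimal_columns("999", "1"))  # prints 1000
```

Note the direction of the check: the specification is what lets us say the digit-and-carry procedure is correct, and nothing in the procedure itself would tell us it was “for” addition.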
The framework was directed, polemically, against two dominant failure modes of Marr’s era. Neurophysiologists mapped the substrate without a theory of what the substrate was for. Early AI researchers wrote programs that recognized blocks on tables without asking whether their methods bore any relation to how seeing works. In Marr’s own image, both were trying to understand bird flight by studying only feathers, when what was needed was an understanding of aerodynamics. The three levels were Marr’s prescription: before tracing the mechanism, specify the problem. The discipline is as urgently needed today as it was in 1976.
Top-Down Dependence. The levels are not symmetric. The computational level is epistemically prior: it must be settled before the algorithmic level can be evaluated, and the algorithmic before the implementational. A scientist who does not know what problem a system is solving cannot judge whether a given algorithm solves it well, and cannot tell whether a piece of hardware is doing the right thing. This is why Marr distrusted the bottom-up instinct (study the mechanism first and hope the function emerges): it is the one direction in which the inference does not reliably run. Modern AI inverts his recommended order completely, building systems by training them first and then struggling, often years later, to discover what they compute.
Loose Coupling. Because the levels are distinct, the same computation can be realized by multiple algorithms, and the same algorithm by multiple substrates. A brain and a GPU can perform recognizably similar computations without sharing any implementation. Convolutional neural networks trained on natural images typically develop oriented edge detectors in their earliest layers, resembling the oriented receptive fields of simple cells in primary visual cortex. The most natural reading is that the computational problem of vision constrains what a good first representation looks like. That convergence supports Marr’s deepest claim: the task has a shape, and that shape pushes any sufficiently good solver toward similar solutions.
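Loose coupling can be shown in miniature. The sketch below assumes plain integer addition as a fixed computational-level specification and realizes it with an algorithm quite unlike decimal column carrying: binary addition assembled only from bitwise XOR, AND, and a shift. The function name `add_binary_bitwise` is hypothetical, and the example says nothing about CNNs or brains; it shows only that one specification admits more than one algorithm.

```python
def add_binary_bitwise(a: int, b: int) -> int:
    """A different algorithm for the same computation: binary addition built
    from XOR (the sum ignoring carries) and AND plus a left shift (the carries)."""
    if a < 0 or b < 0:
        # Python ints are unbounded, so negative inputs would never terminate here.
        raise ValueError("this sketch handles non-negative integers only")
    while b:
        carry = (a & b) << 1  # positions where both bits are 1 carry one place left
        a = a ^ b             # bitwise sum with the carries left out
        b = carry             # repeat until no carries remain
    return a


# Same computational-level specification (a + b), different algorithm.
for x, y in [(0, 0), (7, 5), (255, 1), (1234, 8766)]:
    assert add_binary_bitwise(x, y) == x + y
```

The representation (bits rather than decimal digits) and the procedure (carry propagation by repeated masking rather than column by column) both differ from a decimal cash register, yet both answer to the same specification. Which silicon or gears carry out either procedure is a further, separate question.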
The Confusion of Levels as Diagnostic. Marr’s framework is at its most practically useful as a diagnostic instrument. Implementational descriptions are always, in some sense, deflationary; the temptation to read them as refutations of higher-level claims is the level-crossing error behind many bad AI arguments. Equally, behavioral resemblance does not establish algorithmic-level similarity: a system can produce the outputs of reasoning without performing the computation of reasoning, exactly as a system can produce the outputs of vision without reconstructing the 3-D world. The framework does not tell you the answer to any of these questions. It tells you which question you are asking.
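The gap between behavioral match and algorithmic identity can be illustrated with a toy, assuming addition as the task. The hypothetical `make_memorizer` below answers only from a table of input pairs it has already seen and performs no arithmetic at all; `adder` actually adds. This is a sketch of the logic of the diagnostic, not a model of any real AI system.

```python
from typing import Callable, Dict, Optional, Tuple


def make_memorizer(
    examples: Dict[Tuple[int, int], int],
) -> Callable[[int, int], Optional[int]]:
    """Hypothetical behavioral mimic: reproduces addition only by looking up
    pairs it has already seen."""
    def lookup(a: int, b: int) -> Optional[int]:
        return examples.get((a, b))  # None for any pair outside the table
    return lookup


def adder(a: int, b: int) -> int:
    """A system that actually performs the computation."""
    return a + b


# Build the mimic from every pair of single-digit numbers.
seen = {(a, b): a + b for a in range(10) for b in range(10)}
mimic = make_memorizer(seen)

# A behavioral test drawn from the same inputs cannot tell the two apart...
indistinguishable = all(mimic(a, b) == adder(a, b) for (a, b) in seen)

# ...but a single probe outside that range separates them.
probe = (12, 30)
print(indistinguishable, mimic(*probe), adder(*probe))  # prints: True None 42
```

What matters is the choice of probe: behavioral evidence bears on the algorithmic level only when the test inputs are ones on which candidate algorithms would disagree.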
What the Framework Cannot Reach. Marr’s three levels, fully filled in, would give a complete functional account of an information-processing system. They say nothing about consciousness: about whether there is anything it is like to be the system. A complete three-level account of human vision would not explain why there is something it is like to see red. The framework locates this gap precisely: it is a gap between function and experience, and it lies at none of the three levels, which is why no answer at any level resolves it. The framework is the best map of what the third-person stance can deliver, and its boundary marks exactly where the first-person question begins.