CONCEPT

Marr's Three Levels of Analysis

The framework holding that any information-processing system, brain or machine, must be understood at three distinct levels: the computational (what problem is solved and why), the algorithmic (by what procedure), and the implementational (in what physical stuff). On this view, confusing the levels is the source of most bad theorizing about minds.

When David Marr set out to understand vision, he realized his field had not agreed on what kind of question it was asking. His response was a framework—perhaps the most disciplined one ever proposed for the science of mind—that separates three genuinely distinct levels of analysis. The computational level asks what problem a system solves and why that is the right problem: it is the level of the abstract task, of function, of the logic by which the correct output follows from the input. The algorithmic level asks how: what representation the system uses for inputs and outputs, and what procedure transforms one into the other. The implementational level asks in what physical stuff: neurons, transistors, gears. Marr’s central, unfashionable claim was that these levels are genuinely separate—answers at one do not substitute for answers at another—and that their dependence runs top-down: the computational constrains the algorithmic, which constrains the implementational, but not in general the reverse. The framework was forged to study the brain. It has become the sharpest instrument available for thinking clearly about large language models, because the most common errors in reasoning about AI are precisely errors about which level a claim belongs to. For the [YOU] on AI cycle, Marr’s three levels supply the diagnostic vocabulary the public conversation most conspicuously lacks.

In the [YOU] on AI Field Guide

The AI transition is generating a discourse that lurches between magic and menace because it lacks a vocabulary for the levels of analysis. Marr’s three levels supply that vocabulary. “It just predicts the next token” is an accurate computational-level description, but it does not settle whether the behavior produced by that computation deserves stronger description at the algorithmic level. “It produces fluent text and therefore understands” infers a computational fact from behavioral evidence, bypassing the demonstration that the mechanism matches the conclusion. “It is just matrix multiplication” is an implementational description with the force of “the brain is just chemistry”—true, and settling nothing above. Three different arguments about whether AI systems “really” understand are three arguments about different levels, conducted as though they were arguments about the same thing.

The framework also illuminates the interpretability crisis: the profound difficulty of explaining what trained neural networks actually do. We have mechanism without understanding—the implementational level fully exposed, the computational level almost entirely blank. Marr predicted this difficulty: the computation is not visible in the hardware, and trying to read the higher levels off the lower is the hard direction, the one he warned against. Every successful piece of mechanistic interpretability research is, in his vocabulary, the recovery of an algorithmic-level description from implementational data—climbing his ladder in the treacherous direction. And every honest interpretability researcher knows they have not yet recovered the computational level: what problem, exactly, is the whole system solving, and why?

Origin

Marr crystallized the three-level scheme in dialogue with Tomaso Poggio at MIT in the mid-1970s, building on ideas that had been forming since his earliest theoretical work on the cerebellum and neocortex. The framework appeared in mature form in Vision (1982), but its most cited illustration is disarmingly humble: a cash register. At the computational level, a cash register adds. At the algorithmic level, it represents numbers as decimal digits and carries when columns exceed nine. At the implementational level, it is made of cogs, or transistors, or anything that can count. The power of the illustration is its insistence that these are genuinely different questions with genuinely different answers—and that understanding addition does not require inspecting the transistors.
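The cash register's levels can be sketched in a few lines of Python. This is a minimal illustration, not anything drawn from Marr's text: the function names (add_spec, add_decimal_carry, add_by_counting) are hypothetical, inputs are assumed to be non-negative integers, and the counting procedure is an added contrast rather than part of the original example. The implementational level does not appear in the code at all; it is whatever interpreter and hardware happen to run it.

```python
def add_spec(a: int, b: int) -> int:
    """Computational level: states WHAT is computed (the sum), not how."""
    return a + b


def add_decimal_carry(a: int, b: int) -> int:
    """Algorithmic level, procedure 1: decimal digits, right to left, with carries.

    Assumes non-negative integers. The single-digit sums below stand in for
    the lookup a mechanical register would perform with its wheels.
    """
    xs = [int(d) for d in reversed(str(a))]  # least significant digit first
    ys = [int(d) for d in reversed(str(b))]
    digits, carry = [], 0
    for i in range(max(len(xs), len(ys))):
        x = xs[i] if i < len(xs) else 0
        y = ys[i] if i < len(ys) else 0
        total = x + y + carry
        digits.append(total % 10)   # digit written in this column
        carry = total // 10         # carried when the column exceeds nine
    if carry:
        digits.append(carry)
    return int("".join(str(d) for d in reversed(digits)))


def add_by_counting(a: int, b: int) -> int:
    """Algorithmic level, procedure 2: start at a and count up b times."""
    result = a
    for _ in range(b):
        result += 1
    return result


if __name__ == "__main__":
    # Two different procedures, one specification.
    for a, b in [(478, 356), (999, 1), (0, 0)]:
        assert add_decimal_carry(a, b) == add_by_counting(a, b) == add_spec(a, b)
```

Both procedures satisfy the same specification while using different representations and steps, which is the sense in which the question "what does it compute?" has a different answer from "how does it compute it?".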

The framework was directed, polemically, against two dominant failure modes of Marr’s era. Neurophysiologists mapped the substrate without a theory of what the substrate was for. Early AI researchers wrote programs that recognized blocks on tables without asking whether their methods bore any relation to how seeing works. In the terms of Marr’s own analogy, that bird flight cannot be understood by studying only feathers, both missed the aerodynamics by fixating on the feathers. The three levels were Marr’s prescription: before tracing the mechanism, specify the problem. The discipline is as urgently needed today as it was in 1976.

Key Ideas

Top-Down Dependence. The levels are not symmetric. The computational level is epistemically prior: it must be settled before the algorithmic level can be evaluated, and the algorithmic before the implementational. A scientist who does not know what problem a system is solving cannot judge whether a given algorithm solves it well, and cannot tell whether a piece of hardware is doing the right thing. This is why Marr distrusted the bottom-up instinct of studying the mechanism first and hoping the function emerges: that is the one direction in which inference does not reliably run. Modern AI inverts his recommended order completely, building systems by training them and then struggling, often years later, to discover what they compute.

Loose Coupling. Because the levels are separate, the same computation can be realized by multiple algorithms, and the same algorithm by multiple substrates. A brain and a GPU can perform recognizably similar computations without sharing any implementation. Convolutional neural networks discover oriented edge detectors in their earliest layers, much as primary visual cortex contains cells tuned to oriented edges, because the computational problem of vision constrains the right first representation. That convergence vindicates Marr’s deepest claim: the task has a shape, and that shape forces the solutions of any sufficiently good solver toward similarity.

The Confusion of Levels as Diagnostic. Marr’s framework is at its most practically useful as a diagnostic instrument. Implementational descriptions are always, in some sense, deflationary; the temptation to read them as refutations of higher-level claims is the level-crossing error that produces most bad AI arguments. Equally, behavioral resemblance does not establish algorithmic-level similarity: a system can produce the outputs of reasoning without performing the computation of reasoning, exactly as a system can produce the outputs of vision without reconstructing the 3-D world. The framework does not tell you the answer to any of these questions. It tells you which question you are asking.

What the Framework Cannot Reach. Marr’s three levels are a complete functional account of any information-processing system. They say nothing about consciousness—about whether there is anything it is like to be the system. A complete three-level account of human vision would not explain why there is something it is like to see red. This gap is precisely located by the framework: it is a gap between function and experience, and it is not at any of the three levels, which is why no answer at any level resolves it. The framework is the best map of what the third-person stance can deliver, and its boundary marks exactly where the first-person question begins.
