Action in Perception (MIT Press, 2004) is the work in which Alva Noë systematically developed the enactive theory of perception he had begun to articulate with J. Kevin O'Regan in their 2001 sensorimotor contingencies paper. The book argues that perception is not a process of constructing internal representations of a pre-given world but a skilled activity of active exploration governed by practical sensorimotor knowledge. It became a founding document of the enactive approach alongside Varela, Thompson, and Rosch's The Embodied Mind (1991), and its arguments remain central to the current dispute between computational and enactive approaches to cognition — and, by extension, to what AI is and is not doing.
There is a parallel reading that begins from the material conditions of perception rather than its phenomenology. Noë's emphasis on the embodied skills required for perception risks overlooking what perception actually accomplishes: the extraction and organization of information from an environment. The sensorimotor contingencies that Noë celebrates — knowing how appearances change with movement — are ultimately patterns in data flow. A robot equipped with cameras and motors can learn these same contingencies through reinforcement learning, developing its own practical knowledge of how sensor readings correlate with actuator commands. The question isn't whether this counts as 'real' perception in some metaphysical sense, but whether it achieves the functional goals perception serves.
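The claim that sensorimotor contingencies are learnable patterns in data flow can be made concrete with a toy sketch. Everything here is invented for illustration: a one-dimensional 'light field', an agent with a single sensor and two motor commands, and a simple running-average learner that builds a table of how its sensor reading changes under each movement — a minimal, hypothetical analogue of the contingency-learning the paragraph describes, not any actual robotics system.

```python
# Toy sketch (hypothetical setup): an agent learns sensorimotor
# contingencies -- how its sensor reading changes with each motor
# command -- purely from interaction data, with no prior world model.
import random

WORLD = [0.0, 0.25, 0.5, 0.75, 1.0]  # fixed 1-D "light intensity" field


def sense(pos):
    return WORLD[pos]


def step(pos, action):  # action is -1 (move left) or +1 (move right)
    return max(0, min(len(WORLD) - 1, pos + action))


def learn_contingencies(episodes=2000, seed=0):
    """Build a table mapping (current reading, action) -> predicted next reading."""
    rng = random.Random(seed)
    table = {}  # (reading, action) -> (count, running mean of next reading)
    pos = 2
    for _ in range(episodes):
        action = rng.choice([-1, 1])
        before = sense(pos)
        pos = step(pos, action)
        after = sense(pos)
        # Incremental running average of observed outcomes for this pair.
        n, mean = table.get((before, action), (0, 0.0))
        table[(before, action)] = (n + 1, mean + (after - mean) / (n + 1))
    return {k: round(mean, 3) for k, (n, mean) in table.items()}
```

After training, the table encodes exactly the kind of practical 'if I move this way, the appearance changes that way' knowledge at issue: from a reading of 0.5, moving right reliably yields 0.75, moving left yields 0.25.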
The deeper issue is that Noë's enactive framework, while philosophically compelling, may be solving yesterday's problem. The representationalist orthodoxy he attacks—Marr's rigid computational stages—has already been superseded in AI by end-to-end learning systems that discover their own processing strategies. Modern vision systems don't construct explicit 3D models; they learn distributed representations that implicitly encode sensorimotor regularities. The fact that these systems lack biological bodies doesn't prevent them from learning the statistical structure that underlies perceptual competence. When a neural network trained on video learns to predict how objects rotate or how scenes change with camera movement, it has acquired a form of sensorimotor knowledge—not through bodily engagement but through computational exploration of visual dynamics. The substrate differs, but the informational achievement converges. Noë's emphasis on embodiment may tell us more about the contingent path evolution took than about the necessary conditions for perception itself.
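As a loose analogy for this kind of learned visual dynamics — a deliberately tiny stand-in, not a neural network — one can recover how a scene transforms under each camera command purely from raw frame pairs, by searching for the pixel shift that best predicts the next frame. The scene values, action names, and circular-shift motion model below are all assumptions made for the sketch.

```python
# Hypothetical sketch: discover, from frame pairs alone, how a 1-D
# "scene" transforms under each camera command, by picking the circular
# pixel shift that minimizes next-frame prediction error.
def shift(frame, k):
    """Circularly shift a 1-D frame by k pixels."""
    n = len(frame)
    return [frame[(i - k) % n] for i in range(n)]


def fit_motion_model(frame_pairs_by_action, candidates=range(-3, 4)):
    """For each action, find the shift best predicting observed next frames."""
    model = {}
    for action, pairs in frame_pairs_by_action.items():
        def error(k):
            # Sum of squared pixel errors between predicted and actual frames.
            return sum(
                sum((p - q) ** 2 for p, q in zip(shift(f0, k), f1))
                for f0, f1 in pairs
            )
        model[action] = min(candidates, key=error)
    return model
```

Feeding it frame pairs generated by panning over a fixed scene, the fitted model recovers one shift per command — a minimal, disembodied analogue of 'knowing how the scene changes with camera movement'.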
The book takes as its opening target the representationalist orthodoxy in vision science — the view, developed by David Marr and others, that visual experience is produced by the brain's construction of increasingly sophisticated internal representations of the visual scene. Noë argues this picture is both empirically inadequate (it cannot explain change blindness, sensory substitution, or the active character of perception) and philosophically confused (it generates intractable problems about how the internal representation comes to be experienced).
The alternative Noë develops is that perceptual experience consists in the exercise of sensorimotor knowledge — practical mastery of how sensory appearances change with bodily movement. To see a tomato as having a back is not to construct an internal representation of the back but to implicitly know how the back would appear if one moved around it. This knowledge is bodily, skilled, and exercised in the act of perceiving itself. Seeing is enacted, not produced.
The book develops extensive arguments about color, spatial perception, perceptual presence, and the role of attention, each aimed at showing that the active, skilled, embodied character of perception is not an add-on to some more fundamental representational process but constitutive of perceptual experience itself. The brain is necessary for perception but not sufficient; the body and the environment are equally constitutive parts of the perceptual system.
For the AI revolution, Action in Perception's arguments have direct implications. If perception requires the exercise of sensorimotor skill in an embodied organism engaged with a world, then disembodied computational systems cannot perceive in the relevant sense — however sophisticated their information processing. The book became a key reference for critics of computational claims about AI consciousness and perception, and Noë has extended its arguments into the AI context in his recent work and in his 2024 Aeon essay.
Alva Noë, Action in Perception (MIT Press, 2004). Built on O'Regan and Noë's 2001 Behavioral and Brain Sciences paper and emerged from Noë's doctoral work at Harvard and subsequent faculty position at Berkeley.
Perception as enacted. Seeing is not the brain's construction of internal images but the perceiver's skilled engagement with a visual environment.
Sensorimotor knowledge is constitutive. Perceptual experience consists in the exercise of practical know-how about bodily movement.
Virtual presence. The back of the tomato is 'virtually present' in experience — available to further exploration, not currently represented.
Brain-body-world as perceptual system. The brain is necessary but not sufficient; perception emerges from the coupled system.
Against representationalism. The standard picture of vision as internal representation construction is rejected in favor of an enactive alternative.
The tension between Noë's enactive view and the computational perspective isn't simply philosophical — it reflects different questions being asked about perception. When we ask "What makes human perception possible?", Noë's account is almost certainly correct: our perceptual experience is constituted by embodied sensorimotor skills developed through biological evolution and individual learning. The phenomenology of seeing does involve implicit knowledge of how things would look if we moved. But when we ask "What functions must any perceptual system accomplish?", the computational view gains considerable, if not decisive, traction: perception must extract usable information from environmental signals, and multiple substrates might achieve this.
The critical divergence emerges around the question of experience versus function. Noë's strongest ground lies in his account of perceptual consciousness: the qualitative character of seeing plausibly requires the kind of embodied engagement he describes, and the computational view cannot simply assume that information processing yields experience. Yet when we shift to perceptual competence — the ability to navigate, recognize, and respond — the computational account regains much of its footing: modern AI systems demonstrate that many perceptual tasks can be accomplished through pattern recognition in high-dimensional data, without the sensorimotor loops Noë emphasizes.
The synthesis suggests perception might be better understood as a cluster concept with multiple realizability conditions. Biological perception, as Noë describes, emerges from embodied sensorimotor engagement and constitutes conscious experience. Artificial perception achieves functional goals through different means—statistical learning over massive datasets rather than bodily exploration. Rather than asking whether AI "really" perceives, we might distinguish perceptual consciousness (requiring embodiment) from perceptual function (multiply realizable). This preserves Noë's phenomenological insights while acknowledging the genuine achievements of computational systems in their own domain.