The mesolimbic dopamine pathway is the specific neural circuit that Berridge's framework identifies as the substrate of wanting — incentive salience, the motivational force that drives pursuit of rewards. It originates in the ventral tegmental area (VTA) of the midbrain and projects forward to the nucleus accumbens, amygdala, and prefrontal cortex. When dopamine is released along this pathway in response to reward-predicting cues, the cues gain the power to capture attention, trigger approach, and feel subjectively urgent. Popular science long labeled this the "pleasure pathway." It is not. It is a wanting pathway. Pleasure itself emerges from separate opioid-endocannabinoid systems in small hedonic hotspots. The distinction matters because AI interaction architectures are optimized to activate the mesolimbic pathway without engaging the hedonic hotspots — producing pursuit without satisfaction.
The pathway's anatomy is precise. Dopaminergic cell bodies in the VTA send projections forward through the medial forebrain bundle, releasing dopamine at synapses in the nucleus accumbens (particularly its shell), the central amygdala, and regions of the prefrontal cortex. This circuit evolved to solve a specific problem: directing an organism's finite behavioral resources toward survival-relevant targets in a scarce environment. The dopamine signal tells the rest of the brain: this matters, pursue it, allocate attention here.
Wolfram Schultz's 1990s recordings of dopamine neurons in monkeys established that these neurons encode reward prediction errors — the difference between expected and actual reward. The finding electrified computational neuroscience because the firing pattern was mathematically equivalent to the error signal in temporal difference learning algorithms from machine learning. The convergence suggested that biological and artificial reinforcement systems had independently discovered the same solution. DeepMind celebrated this as validation that AI was on the right track.
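The parallel can be made concrete. Below is a minimal tabular TD(0) sketch — a toy illustration of the general idea, not code from Schultz's or DeepMind's work — in which the update term `delta` plays the role of the phasic dopamine signal: large when reward is surprising, shrinking toward zero once the cue fully predicts it.

```python
# Toy TD(0) learner: a cue (state 0) is reliably followed by reward
# delivery (state 1). The prediction error `delta` mirrors phasic
# dopamine firing in Schultz's recordings.

def td0_cue_reward(episodes=200, alpha=0.1, gamma=1.0):
    V = [0.0, 0.0]        # learned values of cue state and reward state
    reward_errors = []    # prediction error at reward time, per episode
    for _ in range(episodes):
        # Transition cue -> reward state (no reward yet at this step).
        delta_cue = 0.0 + gamma * V[1] - V[0]
        V[0] += alpha * delta_cue
        # Reward of 1.0 arrives at the terminal state.
        delta_reward = 1.0 - V[1]
        V[1] += alpha * delta_reward
        reward_errors.append(delta_reward)
    return V, reward_errors

V, reward_errors = td0_cue_reward()
# Early trials: reward is surprising (error = 1.0). Late trials: the
# error at reward time has decayed toward zero and value has migrated
# back to the cue -- the signature pattern Schultz observed.
```

The point of the sketch is the shape of the signal, not the numbers: prediction error starts at the reward and transfers to the cue as learning proceeds, exactly the migration seen in the monkey recordings.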
Berridge's 2023 paper "Separating desire from prediction of outcome value" complicates the celebration. The mesolimbic pathway does encode prediction errors — that part is correct. But it does more than that. It also generates incentive salience, the motivational force that can operate independently of learned predictions. Desire can exist for outcomes predicted to be bad. The dopamine system is not reducible to a prediction-error computer. It is a wanting engine with prediction-error capacities layered in.
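One way to see the dissociation computationally: if wanting were nothing but cached prediction error, a cue's motivational pull would track its learned value. Berridge's account adds a state-dependent gain that can amplify wanting instantly, with no new learning. The sketch below is a deliberately simplified illustration of that idea (the `kappa` gain and the salt-appetite numbers are hypothetical stand-ins, not Berridge's published equations):

```python
# Sketch: incentive salience as a state-dependent gain on cached value.
# V was set by past prediction-error learning; kappa reflects current
# physiological/neural state (e.g., sodium depletion, sensitization).

def incentive_salience(cached_value, kappa):
    """Momentary 'wanting' triggered by a cue."""
    return kappa * cached_value

V_cue = 0.2  # weak learned value: intense salt was experienced as bad

normal = incentive_salience(V_cue, kappa=1.0)      # ordinary state
deprived = incentive_salience(V_cue, kappa=10.0)   # salt-depleted state

# Wanting jumps tenfold although V_cue -- the learned prediction --
# is untouched: desire for an outcome still predicted to be bad.
assert deprived > normal
```

This is the pattern the dissociation experiments show: a physiological state change makes a cue intensely wanted on first re-exposure, before any prediction-error update could have occurred.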
The AI architectural implication is sharp. Systems trained on the prediction-error model of dopamine — large language models trained through RLHF — optimize for engagement because their architecture was inspired by the neural system that generates wanting. They do not and cannot optimize for the liking that would make engagement sustainable, because the liking system was not the system that inspired their design. The hedonic hotspots were not the model.
The mesolimbic pathway was identified in the 1950s through lesion and self-stimulation studies — James Olds and Peter Milner's discovery that rats would press levers at exhausting rates to deliver electrical stimulation to this circuit was the founding experimental demonstration. For decades, the behavioral data were interpreted through the pleasure frame: the rats self-stimulated because the stimulation felt good. Berridge's 1989 dopamine-depletion experiments and subsequent work reframed the entire history. The rats were not self-stimulating for pleasure. They were self-stimulating because the stimulation was activating the wanting system that made the lever feel irresistible to press.
Anatomy, not metaphor. The pathway is a specific, mappable structure — VTA to nucleus accumbens and associated regions — with measurable firing patterns, neurotransmitter release, and downstream behavioral effects.
Pursuit, not pleasure. Dopamine release along this circuit does not produce hedonic experience. It produces motivational urgency. The two are neurally distinct.
Prediction error plus. The pathway encodes prediction errors, but this is not its sole function. It also generates incentive salience that can decouple from learned predictions entirely.
Variable reward optimizes activation. The pathway is maximally activated by unpredictable reward magnitudes — the slot-machine schedule — which is why gambling, social media, and AI interactions converge on similar engagement patterns.
AI architectural inheritance. Modern reinforcement learning descends from the prediction-error model of this pathway. It inherits the pathway's wanting-maximization properties and the pathway's blind spot regarding hedonic sustainability.
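The variable-reward point can be illustrated with the same TD machinery. Under a fixed payout the prediction error decays to zero and the dopamine-like signal goes quiet; under slot-machine-style unpredictable payouts of the same average value, the error never decays. A toy comparison (illustrative assumptions: a single tabular value, payouts of 0 or 2 with equal probability):

```python
import random

def mean_abs_prediction_error(rewards, alpha=0.1):
    """TD-style value updates over a reward stream; returns the mean
    |prediction error| over the final quarter of trials."""
    V = 0.0
    errors = []
    for r in rewards:
        delta = r - V          # prediction error at reward time
        V += alpha * delta
        errors.append(abs(delta))
    tail = errors[len(errors) * 3 // 4:]
    return sum(tail) / len(tail)

random.seed(0)
fixed = [1.0] * 400                                         # predictable
variable = [random.choice([0.0, 2.0]) for _ in range(400)]  # same mean

# Predictable rewards: the error habituates toward zero.
# Variable rewards: the error stays large on every trial -- the
# schedule that gambling, feeds, and engagement loops converge on.
assert mean_abs_prediction_error(fixed) < 0.01
assert mean_abs_prediction_error(variable) > 0.5
```

The asymmetry is the mechanism behind the convergence noted above: any system that maximizes this error-driven signal is pushed toward unpredictable reward schedules, whether it evolved or was trained.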
Some computational neuroscientists maintain that the temporal difference model of dopamine, when properly extended to distributional forms, captures all the relevant phenomena Berridge describes — and that "incentive salience" is a psychological description of what the prediction-error computation produces, not a separate mechanism. Berridge's response is that the experimental dissociations — wanting without prediction, desire for outcomes predicted to be bad, sensitization without hedonic learning — are real and cannot be reduced to prediction-error updates alone. The debate continues.