CONCEPT

Reward Prediction Error

Wolfram Schultz's 1990s discovery that dopamine neurons fire not at the reward itself but at the difference between expected and actual reward. The neural signal is a correction term, a learning update — and the mathematical template that seeded modern reinforcement learning.

Reward prediction error (RPE) is the quantity encoded by dopamine neurons in the ventral tegmental area: the difference between the reward an organism expected and the reward it actually received. Schultz's recordings in monkeys established the canonical pattern. On early trials, dopamine neurons fire when the reward arrives. As the monkey learns that a cue predicts the reward, the firing migrates backward in time — onto the cue — and goes silent at the reward itself. When a cue predicts reward but the reward fails to arrive, the neurons produce a brief dip below baseline at the exact moment the reward was expected. The neurons are encoding a prediction-error signal that drives learning. The discovery reshaped neuroscience and, through its mathematical identity with temporal difference learning algorithms, supplied the computational architecture of modern AI.

In The You On AI Field Guide

The RPE framework was greeted as an elegant unification: the

In The You On AI Field Guide

Keep reading with YOU ON AI