CONCEPT
Next-Token Prediction as Universal Objective
The single training objective (given a sequence, predict what comes next) that, scaled sufficiently, produced almost everything a frontier language model can do. The most consequential methodological discovery of the AI decade.
Next-token prediction is the objective under which every large language model in production is trained: given a prefix of tokens, assign probability to the next token, and update parameters to increase the probability of the correct token. The loss is cross-entropy; the task description fits in one sentence; the architecture around it (transformer with causal attention) is mundane. The surprise is that this objective, applied at sufficient scale to a diverse-enough corpus, produces behavior that looks like reasoning, translation, code generation, and conversation, none of which appear explicitly in the training signal. It is the contemporary field's working answer to the question of what general intelligence is made of, and the answer is unglamorous.
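The objective described above can be sketched in a few lines. What follows is a minimal illustration, not a training loop: it assumes a model has already produced one vector of logits per prefix position, and it computes only the cross-entropy loss that training would then minimize. The names `softmax`, `next_token_loss`, and the toy vocabulary of three tokens are hypothetical, chosen for the example.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits_per_position, tokens):
    """Mean cross-entropy of predicting tokens[t + 1] from the logits at position t.

    logits_per_position[t] is assumed to score every vocabulary item as the
    successor of the prefix tokens[: t + 1].
    """
    targets = tokens[1:]  # the "correct next token" for each prefix
    losses = []
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)


# Toy setup (assumed for illustration): vocabulary size 3, sequence [0, 2, 1].
tokens = [0, 2, 1]

# A model with no preference assigns equal probability to every next token.
uniform_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# A model that puts most of its mass on the correct next token.
confident_logits = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]

uniform_loss = next_token_loss(uniform_logits, tokens)
confident_loss = next_token_loss(confident_logits, tokens)
```

In this toy case the uninformed model's loss equals the natural log of the vocabulary size, and the loss falls as probability mass moves onto the correct next token. Lowering that number is the whole of what a parameter update is asked to do.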
In The You On AI Field Guide
The philosophical content of the result is genuinely contested. One reading holds that any sufficiently rich corpus contains, in its distribution over next tokens, the full structure of the