Several developments pushed post-training into primacy. First, the base-model capability gap between frontier labs narrowed; by 2023–2024, all of them could produce a strong GPT-4-class base model. Second, post-training techniques multiplied — RLHF, DPO, constitutional AI, reasoning-specific tuning, tool-use training — each opening dimensions of capability gain largely independent of base-model quality. Third, compute efficiency for post-training improved faster than for pretraining; a post-training pipeline that was prohibitive in 2022 became routine by 2024.
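Of the techniques listed, DPO is the simplest to state: it reduces preference learning to a single supervised loss over pairs of responses, scored by the policy and a frozen reference model. A minimal sketch of that loss for one preference pair (the function name, toy log-probabilities, and beta value are illustrative, not taken from any particular lab's pipeline):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the policy and under a frozen reference model.
    """
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the reference model's preference.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log sigmoid(beta * margin), written stably as log(1 + exp(-x)).
    x = beta * margin
    return math.log1p(math.exp(-x))

# Toy numbers: the policy already leans toward the chosen response,
# so the loss is below log(2) (the value at zero margin).
loss = dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.1)
```

Minimizing this loss pushes the policy to widen the preference margin over the reference model, which is why no separate reward model or RL loop is needed.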
The economic implications are large. A company that commands the pretraining frontier but not the post-training frontier now produces commodity base models that other labs' post-training pipelines can wrap into superior products. This strategic pressure has pushed every frontier lab to invest in post-training capability: data annotation infrastructure, specialized red-team operations, reasoning-trace generation, automated preference labeling. The post-training workforce at a frontier lab now rivals the pretraining team in size and arguably exceeds it in leverage per dollar.
Reasoning models are the clearest case. OpenAI's o1 family, Anthropic's extended-thinking modes, DeepSeek's R1 — all are built on post-training pipelines that teach the model to produce long internal reasoning traces before answering. The underlying base models are comparable to those of non-reasoning models; the capability gain comes entirely from the post-training. The effect is dramatic enough that reasoning-model benchmark scores on mathematics, code, and science exceed those of non-reasoning models on the same base architectures by 20–40 percentage points.
The safety implications of post-training primacy are as important as the capability ones. Every safety property of a deployed model — refusal behavior, honesty, calibration, corrigibility, robustness to jailbreaks — is a post-training property. This is good news for safety research (the lever is clear) and uncomfortable news for safety assurance (the lever is a research process that is only partially understood and that varies between labs). Post-training primacy means that the distribution of safety across the industry is as much about post-training competence as about intent.
The thesis became visible with InstructGPT (Ouyang et al., 2022), which demonstrated that small models with post-training could outperform much larger models without it. It solidified with the Claude 3 family (2024) and OpenAI's o1 release (2024). The reasoning-model wave of late 2024–2025 made the thesis hard to argue against.
In short: base-model parity has narrowed frontier differentiation, and post-training is where labs now separate. Reasoning capability is the clearest case, with extended-thinking post-training producing 20–40-percentage-point benchmark gains on comparable base models. Safety is a post-training property, at once a lever and an assurance challenge. And the workforce implication is significant: post-training teams are now the frontier.