CONCEPT
Model Collapse
The progressive deterioration of AI output quality when models are trained on their own previous output rather than on fresh human creative work: the informational equivalent of soil depletion.
Model collapse is the technical name for a phenomenon that emerged as large language models began producing a substantial fraction of the text circulating on the internet: when later generations of models train on data that includes their predecessors' output, quality degrades across generations. The distribution of generated text narrows. Rare phenomena, the long tail of human linguistic creativity, disappear first. The output becomes more predictable, more generic, more confidently wrong about the edges of knowledge. The biological analog is inbreeding depression. The ecological analog is soil depletion. In each case, the system loses vitality because the refresh mechanism has broken.
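The narrowing described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the procedure used in any published study: the "model" is just a table of token frequencies, the starting "human" distribution is assumed to be Zipf-like, and each generation is refit by counting tokens in a finite sample drawn from the previous generation. The names `next_generation` and `simulate`, and all parameter values, are hypothetical.

```python
import random
from collections import Counter


def next_generation(weights, sample_size, rng):
    """Sample a corpus from the current model, then refit by counting tokens."""
    tokens = list(weights)
    corpus = rng.choices(tokens, weights=[weights[t] for t in tokens], k=sample_size)
    counts = Counter(corpus)
    # Tokens that were never sampled get no entry, so they can never return.
    return {token: count / sample_size for token, count in counts.items()}


def simulate(generations=10, vocab_size=1000, sample_size=2000, seed=0):
    """Return the number of distinct tokens the model can emit at each generation."""
    rng = random.Random(seed)
    # Assumed starting point: a Zipf-like distribution with a long rare tail.
    raw = {i: 1.0 / (i + 1) for i in range(vocab_size)}
    total = sum(raw.values())
    model = {token: weight / total for token, weight in raw.items()}

    support = [len(model)]
    for _ in range(generations):
        model = next_generation(model, sample_size, rng)
        support.append(len(model))
    return support


if __name__ == "__main__":
    for generation, size in enumerate(simulate()):
        print(f"generation {generation}: {size} distinct tokens")
```

Calling `simulate()` returns one vocabulary size per generation. The sequence can only shrink or stay flat: a rare token that fails to appear in one generation's sample is assigned zero probability and cannot be sampled again, which is the toy version of the long tail disappearing first.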
In The You On AI Field Guide
The phenomenon was documented formally by Shumailov et al., first in a 2023 preprint titled 'The Curse of Recursion' and then in a 2024 Nature paper, 'AI models collapse when trained on recursively generated data.' Training a series of language models on the output of previous generations produced rapid degradation: by the ninth generation, the models were producing incoherent output. Similar results have been found for image generation models