Temperature is the engineering parameter that controls how much randomness a language model allows when selecting each next token. At low temperature, the model produces the most probable continuation—safe, predictable, matrix-conforming. At high temperature, the model wanders, producing less probable sequences that are more likely to juxtapose elements the training distribution normally keeps separate. Read through Koestler's framework, temperature is the mechanized equivalent of the creative gradient itself: the continuum between rigid association within a single matrix and the frame-crossing that makes bisociation possible. The dial makes observable and adjustable a variable that Koestler described only through historical reconstruction.
At low temperature, the machine is a pure associator. It produces outputs that conform strictly to the statistical regularities of its training within the matrix implied by the prompt. The code compiles, the email reads professionally, the summary captures the source faithfully. No matrices collide, because generation is constrained to the single matrix the prompt specifies. This mode is reliable, predictable, and creatively inert—the computational equivalent of what Koestler called 'the exercise of acquired skills on the same plane.'
At high temperature, the machine crosses matrices promiscuously. Words from one domain appear in the context of another. Connections proliferate between frames the training distribution normally keeps separate. Outputs become surprising, sometimes startling, occasionally incoherent. The incoherence signals that matrix-crossing has exceeded the threshold at which structural identity can be maintained—the outputs have left productive collision and entered random juxtaposition.
The zone between these extremes is the edge of chaos where genuine bisociation becomes possible: outputs divergent enough to introduce unexpected matrices, yet coherent enough that a human evaluator can determine whether a structural identity has been revealed. This zone corresponds to what Stuart Kauffman identified in complex systems as the region between order and disorder where the most interesting behavior occurs.
A direct and uncomfortable implication: techniques that reduce hallucination also reduce the probability of genuine bisociation. Retrieval-augmented generation, grounding mechanisms, and tighter output constraints all increase accuracy by decreasing divergence. But decreased divergence means decreased matrix-crossing, which means reduced bisociative potential. The practitioner using the machine for creative work must navigate this tension consciously—accepting that settings maximizing accuracy minimize creative potential, and vice versa.
Temperature in language models derives from statistical mechanics, where it governs how a system spreads probability across its energy states: high temperature distributes probability broadly, low temperature concentrates it on the lowest-energy configurations. In the sampling step of language generation, temperature divides the logits before the softmax; values above one flatten the probability distribution over next tokens, values below one sharpen it toward the single most probable token. The term entered machine learning through the connection between energy-based models and statistical physics.
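The scaling step can be sketched in a few lines of Python. The logit values below are illustrative placeholders, not taken from any real model; the point is only to show how dividing by the temperature reshapes the distribution:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens
logits = [4.0, 2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.2)  # sharpened: near-greedy
warm = softmax_with_temperature(logits, 1.0)  # the unscaled distribution
hot = softmax_with_temperature(logits, 5.0)   # flattened: near-uniform
```

At temperature 0.2 almost all probability mass lands on the top token (pure association); at 5.0 the four candidates become nearly interchangeable (juxtaposition approaching randomness); in between lies the zone the essay describes.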
Mechanized creative gradient. Temperature makes adjustable the variable that governs association versus bisociation.
Low temperature equals pure association. Strict matrix conformance produces reliable, predictable, creatively inert output.
High temperature equals dissolution. Excessive matrix-crossing produces incoherence rather than bisociation.
The edge of chaos. The middle zone is where matrix-crossings remain coherent enough to be evaluated—the computational equivalent of Kauffman's edge of chaos.
Tradeoff with accuracy. Techniques reducing hallucination reduce creative potential; the practitioner must manage the tension deliberately.