Regions of a neural network's loss landscape where small perturbations to parameters do not significantly affect performance — the computational realization of Wagner's biological robustness, and the topological signature of exploratory potential.
Flat minima are regions of parameter space where a neural network maintains its performance despite perturbations to its parameters. The observation that models converging on flat minima generalize better than those converging on sharp minima, first proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997 and widely supported in the deep learning era, finds its deepest explanation in Wagner's framework. Flat minima are the computational analog of biological robustness: configurations that are stable under perturbation and occupy connected regions of the landscape, regions that sit adjacent to diverse alternative capabilities. Flatness marks not only reliability but also exploratory potential.
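A simple way to make "stable under perturbation" concrete is to jitter the parameters around a minimum with Gaussian noise and measure the average rise in loss. The sketch below is a toy illustration under stated assumptions. The losses are hypothetical quadratics, not a trained network. The names `quadratic_loss`, `perturbation_sharpness` and `expected_rise_quadratic` are illustrative and do not come from any library. This perturbation proxy is one common sharpness measure, not the definition Hochreiter and Schmidhuber used. For a quadratic loss with Hessian H and noise of standard deviation σ, the expected rise is σ²·tr(H)/2, so the lower-curvature (flatter) minimum loses less under the same perturbation.

```python
import random
from typing import Callable, Sequence

Loss = Callable[[Sequence[float]], float]


def quadratic_loss(curvatures: Sequence[float]) -> Loss:
    """Hypothetical toy loss 0.5 * sum(h_i * w_i**2): minimum 0 at w = 0, Hessian diag(h)."""
    def loss(w: Sequence[float]) -> float:
        return 0.5 * sum(h * x * x for h, x in zip(curvatures, w))
    return loss


def perturbation_sharpness(loss: Loss, w_star: Sequence[float], sigma: float,
                           n_samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[loss(w_star + eps) - loss(w_star)], eps ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)  # fixed seed so different losses see identical noise
    base = loss(w_star)
    total = 0.0
    for _ in range(n_samples):
        w = [x + rng.gauss(0.0, sigma) for x in w_star]
        total += loss(w) - base
    return total / n_samples


def expected_rise_quadratic(curvatures: Sequence[float], sigma: float) -> float:
    """Closed form for the quadratic case: sigma^2 * trace(H) / 2."""
    return 0.5 * sigma ** 2 * sum(curvatures)


# Two minima at the same point with the same loss value, differing only in curvature.
w_star = [0.0, 0.0]
sigma = 0.1
flat_rise = perturbation_sharpness(quadratic_loss([0.1, 0.1]), w_star, sigma)
sharp_rise = perturbation_sharpness(quadratic_loss([10.0, 10.0]), w_star, sigma)

print(f"flat minimum:  mean loss rise {flat_rise:.4f} "
      f"(closed form {expected_rise_quadratic([0.1, 0.1], sigma):.4f})")
print(f"sharp minimum: mean loss rise {sharp_rise:.4f} "
      f"(closed form {expected_rise_quadratic([10.0, 10.0], sigma):.4f})")
```

Both runs share a seed, so they see identical noise and the difference between them comes from curvature alone. One caveat applies to measures of this kind: they depend on how the parameters are scaled, so rescaling the weights can change measured sharpness without changing the function the network computes.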
Flat Minima
In The You On AI Field Guide
The conventional explanation for why flat minima generalize better appeals to the minimum description length principle: flatter minima correspond to simpler models with shorter descriptions, which tend to generalize better on held-out data. This explanation is not wrong but is incomplete. Wagner's framework adds a deeper layer: flat minima occupy positions connected to a