An information-theoretic analysis of natural language as the highest-bandwidth encoding system humans possess: near-optimal for propositional content, but lossy for embodied, aesthetic, and tacit knowledge, whose entropy rate exceeds what language can carry.
Shannon's source coding theorem establishes that a source with entropy rate H can be losslessly compressed to an average rate arbitrarily close to H bits per symbol, but no lower: any code that averages fewer than H bits per symbol must destroy information. Natural language, as a compression format for human intention, is near-optimal for a specific class of information: propositional content, logical relationships, and functional specifications. Its semantic bandwidth, which carries denotation, connotation, implication, and context simultaneously, vastly exceeds the statistical entropy of the character sequence. But natural language cannot carry the full entropy of every dimension of human knowledge. Embodied intuition, aesthetic judgment, and contextual expertise reside in patterns of experience that resist verbalization, with entropy rates exceeding what language can encode. The AI interface is therefore a highly efficient compressor for the compressible component of knowledge and a lossy compressor for the incompressible component.
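To make the source coding bound concrete, here is a minimal Python sketch that estimates the entropy of a text from its character frequencies and compares it with a naive fixed-width code. It rests on a loud assumption: characters are treated as independent (a zeroth-order model), so the figure it produces is an upper estimate for that model and not the true entropy rate of English, which is lower once context is taken into account. The function names `empirical_entropy` and `fixed_width_bits` and the sample string are illustrative choices, not taken from the source.

```python
import math
from collections import Counter


def empirical_entropy(text: str) -> float:
    """Zeroth-order entropy in bits per character.

    Assumes characters are independent draws from their observed
    frequencies, so context between characters is ignored.
    """
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def fixed_width_bits(text: str) -> int:
    """Bits per character for a naive fixed-length code over the
    alphabet that actually appears in the text (at least 1 bit)."""
    return max(1, math.ceil(math.log2(len(set(text)))))


# Illustrative sample only; any text can be substituted.
sample = "the quick brown fox jumps over the lazy dog " * 20

h = empirical_entropy(sample)
naive = fixed_width_bits(sample)

print(f"zeroth-order entropy: {h:.3f} bits/char")
print(f"fixed-width code:     {naive} bits/char")
print(f"lossless bound under this model: about {h * len(sample):.0f} bits total")
```

Under this simplified model, no lossless code can average fewer than `h` bits per character, while the fixed-width code wastes the difference between `naive` and `h`. A model that exploits context between characters would give a lower estimate.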
Natural Language as Compression Format
In The You On AI Field Guide
Shannon's 1951 prediction experiments, building on the theory he published in 1948, estimated the entropy of printed English at roughly one bit per character (between about 0.6 and 1.3 bits), reflecting the