The idea-expression distinction is the load-bearing structural feature of modern copyright. You cannot copyright a fact, a concept, an argument, or an idea. You can copyright the specific language in which that fact is stated, the concept developed, the argument made. The distinction was articulated most rigorously by Fichte in 1793 and has governed copyright jurisprudence ever since. Its doctrinal elegance has sustained more than two centuries of intellectual property law. Its ontological foundation, the claim that form bears the unique imprint of an individual mind, is what AI has now called into question.
The distinction performs essential functional work. Without it, copyright would either protect too little (leaving writers unable to prevent the pirating of their works) or too much (allowing writers to monopolize ideas themselves, chilling subsequent expression). By drawing the line at form, copyright secures the writer's economic interest without sequestering the intellectual common ground on which future work must build.
In American jurisprudence the doctrine is codified at 17 U.S.C. § 102(b), enacted as part of the Copyright Act of 1976, which excludes from copyright protection any idea, procedure, process, system, method of operation, concept, principle, or discovery. The codification formalized what common-law decisions had established over roughly the preceding century.
AI disturbs the doctrine by producing form without the individual mind the doctrine presupposes. The form is statistical — the aggregate pattern extracted from millions of texts — and the distinction between form (protected) and idea (unprotected) cannot perform its sorting function when the form is itself a statistical aggregate of many forms. The Romantic authorship construct that legitimated the doctrine has not merely become philosophically debatable; it has become operationally unworkable.
The training corpus question presses the doctrine at a different point. When a model is trained on copyrighted works, is it using their ideas (permitted) or their expression (not permitted)? The statistical learning process operates on both simultaneously, extracting patterns that are neither purely idea nor purely expression. The existing doctrine has no clean answer, and the cases now making their way through multiple jurisdictions will determine whether the doctrine can be extended to cover the new technology or whether it must be replaced.
The distinction's roots are older than Fichte; versions appear in seventeenth-century legal and philosophical writing. But Fichte's 1793 essay gave it the systematic philosophical articulation that subsequent jurisprudence would absorb. The specifically German grounding of the distinction in the metaphysics of Geist was translated into Anglo-American common-law reasoning over the nineteenth and early twentieth centuries, with key cases (Baker v. Selden, 1879; Nichols v. Universal Pictures, 1930) progressively clarifying its operational meaning.
Ideas belong to everyone. The doctrine's foundational concession: once an idea is published, it enters the common stock of thought. No subsequent writer owes royalties for using it.
Expression belongs to one. The specific form in which ideas are expressed becomes property, anchored in the supposed uniqueness of the individual mind that produced it.
The line is doctrinal, not natural. Where idea ends and expression begins is a question courts answer case by case. The line is stable enough to support the system but has always been contested at the margins.
Form requires mind. The doctrine's ontological premise — that form is the unique expression of an individual consciousness — is what AI has rendered untenable.
Possible reconstruction. Post-Romantic copyright regimes will likely require new categories — perhaps distinguishing creative investment (protected) from statistical extraction (subject to different rules) — to replace the failing idea-expression binary.
Legal scholars have long debated whether the distinction is genuinely coherent or merely pragmatic. Critics argue that ideas and expression cannot be cleanly separated — that the specific form of an argument is constitutive of the argument itself. Defenders argue that the doctrine's fuzziness at the margins does not undermine its work at the core, and that the alternative (protecting ideas) would be worse. AI intensifies the critique: when form is generated statistically, the fuzziness at the margin swallows the core.