When ChatGPT can produce a competent essay on any topic in thirty seconds, the essay becomes meaningless as a measure of student learning. The essay measures second-order competence — the capacity to assemble, organize, and present — which AI now provides on demand. A teacher described in The Orange Pill stopped grading essays and started grading questions. She gave students a topic and an AI tool; the assignment was to produce the five questions the student would need to ask before she could write an essay worth reading. The question measures first-order capacity: the ability to identify what one does not know, to articulate the gap between existing understanding and the material, to open a space that requires genuine engagement. The shift is a fundamental reorientation of what education measures, and it maps directly onto Peter Elbow's first-order/second-order distinction. Good questions cannot be outsourced to AI because they depend on the specific configuration of a particular student's prior knowledge, confusions, and biographical relationship to the material. An AI can generate lists of possible questions. It cannot generate her question — the one that emerges from the intersection of this material and this mind.
The practice validates Elbow's lifelong argument that education should develop thinking, not artifacts. The essay was always meant to be evidence of thinking, but institutional constraints — the need for scalable assessment, the five-paragraph template, the rubric that specifies exactly what a passing essay must contain — converted the essay from evidence into target. Students learned to produce essays that satisfied the rubric without undergoing the thinking the essay was supposed to represent. John Warner called these 'simulations of academic artifacts,' and AI revealed that they were always simulations. The machine produces them faster and better, exposing that the artifact was never measuring what education claimed to measure.
Grading questions instead measures the gap between what the student knows and what the material contains. A student who asks 'Why does this argument about democratization ignore the infrastructure gap in developing nations?' has noticed the gap, which requires having engaged with both the argument and the counterargument, which requires first-order process. The question evidences genuine cognitive work that cannot be delegated. The teacher who can distinguish between questions that emerge from genuine engagement and questions that simulate engagement — by knowing the student, by recognizing the marks of her specific struggles with the material — is performing the experiential evaluation that Elbow's teacherless writing groups were designed to develop.
The broader implication extends to every domain where evaluation determines outcomes. The job interview that tests presentation skills is now testing, in part, AI-prompting skills. The bar exam that tests memorandum-drafting is now testing the ability to describe legal problems to a machine. The medical board that tests diagnostic reasoning is now testing the ability to evaluate machine-generated differentials. In each case, the artifact no longer correlates reliably with the thinking it was supposed to evidence. The institution that adapts by changing what it measures — from artifacts to questions, from products to processes, from second-order competence to first-order capacity — is building the institutional dam the AI moment requires.
The pedagogy that supports question-grading must teach questioning as a disposition, not a technique. The disposition to question is the disposition to notice when understanding breaks down, to sit with the discomfort of not knowing, to resist the temptation to reach for the machine that will fill the gap before the student has formulated what she does not understand. Freewriting is the method: the practice of writing without stopping surfaces the questions the student did not know she had — the gaps in understanding, the half-formed intuitions, the connections that surprise her. The garbage draft produces, alongside the garbage, the moments of genuine cognitive engagement that reveal what the student actually thinks as opposed to what she thinks she is supposed to think.
The specific practice of grading questions rather than essays emerged independently among multiple teachers facing the AI moment in 2024–2025. The Orange Pill documents one such teacher without naming her, suggesting the practice represents a collective discovery rather than an individual innovation. The theoretical foundation, however, is clearly Elbowian: the shift from measuring products to measuring processes, from evaluating second-order competence to evaluating first-order capacity, from rewarding polish to rewarding the evidence of genuine thinking.
Questions evidence engagement. A good question demonstrates that the student has noticed a gap in her understanding — it cannot be produced without a first-order encounter with the material.
Questions resist outsourcing. AI can generate lists of possible questions about any text but cannot generate the specific question that emerges from a particular student's particular confusions and prior knowledge.
The artifact became the target. Traditional essay pedagogy converted evidence of thinking into a template to be filled — AI revealed that students were producing simulations all along.
Assessment redesign is institutional dam-building. Changing what institutions measure — from essays to questions, from presentations to live reasoning, from code to architectural judgment — protects the first-order space where thinking develops.
Pedagogy must teach the questioning disposition. Not techniques for generating questions but the willingness to sit with not-knowing, to resist premature answers, to notice when understanding breaks down.