Grading questions instead of answers names the pedagogical reorientation that responds to AI's commoditization of correct-answer production. When any student can produce a competent essay using an AI tool, the essay ceases to function as an assessment of understanding; it becomes an assessment of tool operation. By shifting assessment from essay production to question formulation, the teacher restores the evaluative function the technology disrupted. A good question requires understanding what one does not understand — a more demanding cognitive operation than demonstrating what one does understand.
There is a parallel reading that begins from the material conditions of educational assessment rather than pedagogical ideals. The shift to grading questions assumes an infrastructure of evaluation that doesn't exist and may never exist at scale. Question quality assessment requires intensive human judgment—each question must be evaluated for its contextual appropriateness, conceptual depth, and originality. This creates a labor bottleneck precisely when educational institutions are under pressure to reduce costs and standardize outcomes. The teacher grading questions instead of essays hasn't solved the AI problem; she's created an artisanal workaround that depends on her individual expertise and available time.
The deeper issue is that educational systems optimize for what they can measure efficiently, not what matters pedagogically. Standardized testing infrastructure, credentialing requirements, and inter-institutional comparability all depend on reproducible assessment of standard outputs. A student's college application cannot include "asked excellent questions about thermodynamics"—it must include comparable metrics. The question-grading teacher operates in a pocket of pedagogical freedom that exists only because her institution hasn't yet standardized around AI-assisted production. Once it does, her method will be seen not as innovation but as non-compliance. The real dynamic isn't about recovering authentic assessment from AI's disruption; it's about watching educational institutions choose efficiency over pedagogy, as they always have. The students who benefit from question-based assessment will be those already privileged enough to access boutique educational experiences, while the majority will be processed through systems that embrace rather than resist AI's commoditization of understanding.
Segal describes the paradigmatic case in The Orange Pill: a teacher who stopped grading her students' essays and started grading their questions. She gave the class a topic and an AI tool. The assignment was not to produce an essay but to produce the five questions one would need to ask before writing an essay worth reading. The students who produced the best questions demonstrated the deepest engagement with the material.
A good question cannot be generated by prompting an AI tool, because formulating one requires the very evaluative capacity that prompting simulates without supplying. Question-based pedagogy develops the capacity that AI cannot substitute for: the judgment to assess whether an output is substantively correct, appropriately contextualized, and responsive to the actual complexity of the problem it addresses.
The approach inherits from the Socratic method, the Oxford tutorial, and the ars interrogandi of Renaissance humanism. These pedagogies share the conviction that the quality of inquiry is more diagnostic of understanding than the quality of production — a conviction that centuries of answer-focused industrial education obscured but that AI has made unavoidable.
Institutional adoption will be slow. Grading rubrics have not been redesigned for question evaluation, and standardized assessments still measure output quality. The teacher who adopts the pedagogy works against the grain of her institution. But her innovation serves as a prototype: the armorer working at the bench while the factory is being designed.
The pedagogical move is ancient, inherited from Socratic dialogue and monastic scholarship. Its AI-era revival appears in Segal's The Orange Pill and in educational research on AI-resistant assessment emerging in 2024–2026.
Questions reveal comprehension. Formulating a good question requires identifying what one does not understand — a harder operation than demonstrating what one does.
AI cannot substitute. Prompting produces the appearance of evaluative capacity without the substance; question-grading requires the substance.
Ancient pedagogy, new urgency. The Socratic tradition supplies the method; the AI moment supplies the necessity.
Working against the metrics. Institutional adoption lags individual innovation, leaving the innovating teacher unsupported.
The fundamental tension between these views centers on feasibility at different scales. For the individual classroom dynamic that Segal describes, the optimistic framing is 90% correct: a skilled teacher can indeed restore evaluative authenticity by grading questions rather than outputs. The method works precisely because it exploits what AI currently cannot do: formulate genuinely novel inquiries that reveal an awareness of one's own comprehension gaps. At this scale, the pedagogical innovation represents a genuine solution to AI's disruption of traditional assessment.
When we shift focus to institutional adoption and systemic change, the contrarian view gains ground (70% correct). The infrastructure argument is compelling: educational systems are path-dependent, locked into assessment methods that prioritize comparability and efficiency. The labor intensity of question evaluation creates a genuine barrier—not every teacher has the expertise or time to assess question quality, and training them would require resources institutions are unlikely to provide. The boutique-versus-mass education divide the contrarian identifies is real and will likely widen.
The synthesis requires recognizing that both views are describing different phases of the same transition. Question-grading represents what education could become if we rebuilt assessment infrastructure around AI-era realities—this is the long-term possibility Segal envisions. But the path from here to there runs through the political economy of education that the contrarian describes. The most likely outcome is a bifurcated system: elite institutions and innovative teachers will adopt question-based and other AI-resistant assessments, while mass education will accept AI-assisted production as the new normal. The pedagogical principle is sound; the implementation will be uneven. The real work lies not in perfecting the method but in creating the conditions for its adoption beyond experimental pockets.