The program's structure reflected an insight that much subsequent AI research has struggled to internalize: explanation and understanding are not the same thing. A system can generate an explanation that is technically accurate and cognitively useless — telling the user which variables contributed to the prediction without helping the user understand why those variables matter, how the system would behave if the situation changed, or what kinds of errors the system is prone to. Klein's team focused on the gap between these two, developing frameworks for what users actually need in order to form accurate mental models of system behavior.
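To make the distinction concrete, the sketch below is purely illustrative and is not drawn from the XAI program's own systems: it contrasts a per-feature attribution for a single prediction with a simple counterfactual probe of the same toy model. The attribution lists which variables contributed to the decision; only the probe begins to answer how the system would behave if the situation changed.

```python
# Illustrative sketch only (not XAI program code): a local attribution
# versus a counterfactual probe, using a toy scikit-learn classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: two features, but only the second one actually drives the label.
X = rng.normal(size=(500, 2))
y = (X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X, y)

case = np.array([[0.3, 1.2]])  # a single case the user must act on
print("prediction:", model.predict(case)[0])

# A "technically accurate" local explanation: per-feature contributions
# to the decision score for this case (coefficient * feature value).
contributions = model.coef_[0] * case[0]
print("feature contributions:", contributions)

# What the attribution alone does not tell the user: how the system would
# behave if the situation changed. A counterfactual probe makes that
# explicit by perturbing each feature and re-querying the model.
for j in range(case.shape[1]):
    flipped = case.copy()
    flipped[0, j] = -flipped[0, j]
    print(f"flip feature {j}: prediction ->", model.predict(flipped)[0])
```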
The program's outputs included the AIQ (Artificial Intelligence Quotient) toolkit, a set of non-algorithmic assessment instruments designed to help users identify the boundaries of AI system competence. Klein framed the name deliberately: the goal was not to measure the AI's intelligence but to raise the user's IQ about the AI systems they work with. The toolkit moves beyond local explanations toward global competence mapping, supporting the construction of the mental models that calibrated trust requires.
The program had mixed influence on the broader AI field. Its technical work on generating explanations influenced subsequent research in interpretable machine learning, but its deeper philosophical contribution — that effective oversight requires cognitive resources different from technical transparency — was largely absorbed into human-factors research rather than becoming central to mainstream AI development. The gap between what DARPA XAI established and what production AI systems actually provide remains a significant structural feature of the field.
Klein's subsequent writing on AI has drawn extensively on lessons from the program, particularly the recognition that explanations designed by AI researchers tend to satisfy other AI researchers rather than the domain experts who must oversee AI outputs. The insight connects to his broader research program on expertise: effective oversight depends on experiential foundations that AI explanations alone cannot provide.
DARPA announced the XAI program in 2016 as a response to the growing recognition that the 'black box' character of deep learning systems was impeding military adoption. Program manager David Gunning led the effort, which ran until approximately 2021 and produced dozens of technical papers, evaluation frameworks, and demonstration systems.
Klein's involvement began at the program's inception and extended through its conclusion, making him one of the few non-AI-researcher principal investigators with sustained influence on program outputs. The institutional setup, a cognitive psychology team working alongside AI research teams, itself reflected an unusual recognition that the problem the program was addressing was not purely technical.
Explanation versus understanding. The program's central insight was that technical transparency and effective oversight are related but distinct.
User-centered assessment. The AIQ toolkit provided non-algorithmic instruments for users to map AI competence boundaries.
Mental model construction. Effective oversight requires users to build global models of system behavior, not only local explanations of specific outputs.
Failure-mode exposure. Users need experience with system failures to develop calibrated trust.
Cognitive over technical framing. The program demonstrated that AI oversight problems are substantially human-factors problems, not only engineering problems.