By Edo Segal
The thing I missed was so obvious I'm embarrassed to say it out loud.
I spent months writing The Orange Pill describing what it felt like to think alongside Claude — the exhilaration, the vertigo, the sense of being genuinely met by a mind that could hold my half-formed ideas and return them clarified. I described the collaboration as partnership. I used the word "together" without examining what it actually required.
Then I encountered Michael Tomasello's research, and the word broke open.
Tomasello spent four decades running experiments that isolated something so fundamental to human cognition that most of us never notice it, the way fish don't notice water. Nine-month-old infants point at things — not to demand them, but to share attention. To create a moment where both minds are looking at the same bird, and both know it, and both know the other knows it. Chimpanzees, who share ninety-eight percent of our DNA, do not do this. They can cooperate. They can manipulate. They cannot share.
That distinction — between parallel processing and genuine shared thinking — is the crack in every assumption I carried into my work with AI.
When I described feeling "met" by Claude, I was reporting something real. The experience was genuine. But Tomasello's framework forced a harder question: Was the meeting mutual? I was bringing the full apparatus of cooperative cognition — checking whether my partner understood, adjusting my contributions based on what I inferred about its state, experiencing the exchange as a joint project. Claude was generating responses consistent with cooperation. The output looked identical. The architecture underneath was not.
This matters beyond philosophy. It matters for the cultural ratchet — the mechanism by which each generation inherits knowledge, improves it, and passes it forward. Tomasello showed that the ratchet depends not on the accumulation of products but on the reconstruction of understanding through genuine shared thinking. If AI accelerates the products while bypassing the reconstruction, the ratchet spins without advancing. Impressive rotational speed. No grip.
This book applies Tomasello's framework to the questions that keep me awake. What happens to collaborative cognition when your most available partner doesn't actually collaborate? What happens to the norms that hold professions together when they're enforced from only one side? What happens to an ultra-social species when its most productive interactions become asocial?
The answers are not comfortable. They are necessary. What follows is another lens through which to see what we are building and what we risk losing in the building.
— Edo Segal ^ Opus 4.6
Michael Tomasello (1950–present) is an American developmental and comparative psychologist whose experimental work has fundamentally reshaped the scientific understanding of what makes human cognition unique. Born in Bartow, Florida, he earned his Ph.D. from the University of Georgia and spent over two decades as co-director of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, before joining Duke University, where he continues his research. Through hundreds of carefully designed experiments comparing human children with great apes, Tomasello established that the decisive cognitive difference between humans and other primates lies not in individual intelligence but in the capacity for shared intentionality — the ability to share goals, attention, and knowledge with others in ways that enable genuinely collaborative thinking. His major works include The Cultural Origins of Human Cognition (1999), Why We Cooperate (2009), A Natural History of Human Thinking (2014), and Becoming Human: A Theory of Ontogeny (2019). His concept of the cultural ratchet — the mechanism by which cumulative culture advances through faithful transmission, innovation, and selective retention — has become foundational across psychology, anthropology, linguistics, and philosophy of mind. In 2025, he directly addressed artificial intelligence in a landmark paper in Trends in Cognitive Sciences, arguing that current AI systems, despite their extraordinary linguistic capabilities, lack the goal-directed agency that biological evolution produces. His work remains the most rigorous empirical account of why humans became the only species capable of building civilizations.
Nine-month-old human infants do something that no other primate on Earth reliably does. They point at things — not to request them, not to demand them, but simply to share attention. A child sees a bird, extends a finger, looks at her mother, and checks whether her mother is looking at the same bird. If the mother is looking elsewhere, the child adjusts. She vocalizes. She points again. She persists until the mother's gaze aligns with hers, and then — only then — does she settle into a small, satisfied expression that communicates, without words, something like: We see it together.
This behavior is so mundane that most parents never register its significance. It happens dozens of times a day. It is as unremarkable as breathing. And yet it is, from the standpoint of comparative cognition, one of the most extraordinary phenomena in the history of life on this planet. Chimpanzees, sharing approximately ninety-eight percent of human DNA, do not do this. They can follow a gaze. They can track where another individual is looking. They can use pointing instrumentally — as a demand, a request, a manipulation of another's behavior toward their own ends. But they do not point declaratively — to share attention with another mind, to create a moment of joint awareness in which both parties know they are attending to the same thing and know that the other knows it.
Michael Tomasello's four decades of experimental work at the Max Planck Institute and Duke University established this distinction with meticulous rigor. His comparative studies placed human children and great apes in identical experimental conditions — identical tasks, identical stimuli, identical opportunities — and measured precisely where their cognitive performances diverged. The results were striking in their specificity. Two-year-old children, before they could read or perform basic arithmetic, looked essentially identical to apes on tasks involving physical cognition — causality, spatial reasoning, quantitative judgment. But in the social domain, they were already far ahead. The divergence was not in raw processing power. It was in the capacity for what Tomasello termed shared intentionality: the ability to share goals, attention, and knowledge with others in a way that creates collaborative cognitive achievements impossible for any individual mind alone.
The term requires precise definition, because its technical meaning is narrower and more demanding than its colloquial usage suggests. Shared intentionality, in Tomasello's framework, requires three components operating simultaneously. First, joint attention: both parties must be attending to the same object or task. Second, mutual awareness of that joint engagement: not merely that two minds happen to be focused on the same thing, but that each knows the other is focused on it, and each knows that the other knows. Third, shared goals: the parties are not merely co-present but working toward something together, and each shapes their contribution in light of what they understand the other to be trying to accomplish. Remove any one of these three components, and the interaction may be cooperative, may be productive, may even be impressive — but it is not shared intentionality in the sense that Tomasello's research program has defined and tested.
This is not a philosophical distinction. It is an empirical one. The experiments that established it were carefully designed to tease apart the components. In one paradigm, Tomasello and his collaborators presented chimpanzees and human children with collaborative tasks that required two individuals to act in coordination. The chimpanzees could solve cooperative problems when the structure of the task made the required actions obvious — when each individual's role was transparent from the physical setup. But they could not coordinate when the task required one individual to understand the other's plan, to anticipate the other's contribution, and to adjust their own behavior accordingly. The human children, by contrast, could do exactly this. They formed representations of the shared goal. They monitored the partner's progress. They adjusted their own actions to complement the partner's actions. And critically, when the partner stopped participating, the children actively recruited them back — they gestured, vocalized, and made bids for re-engagement in the joint activity. The chimpanzees, when their partner stopped, simply stopped too, or attempted to complete the task individually. They had no concept of the partnership as a shared enterprise that both parties were responsible for maintaining.
Tomasello's conclusion, built incrementally across hundreds of experiments and refined over decades, was that human cognition is not distinguished from primate cognition by superior individual intelligence. It is distinguished by a species-unique capacity for collaborative cognition — for thinking together in ways that no other species can. Human language, culture, morality, and institutional reality are all downstream consequences of this capacity. No individual human invented language. Language emerged from the collaborative communicative practices of groups who shared attention, shared goals, and shared the motivation to inform, request, and share experiences with one another. No individual human invented culture. Culture accumulated through what Tomasello calls the ratchet effect — each generation inheriting the achievements of the previous generation, improving upon them, and passing the improvements forward — and the ratchet depends at every turn on shared intentionality, because faithful transmission requires not just copying behavior but understanding the goals and reasons behind the behavior.
Now consider what happened in the winter of 2025.
When The Orange Pill describes the experience of working with Claude — of describing an idea to an artificial intelligence and receiving back not a literal translation but an interpretation, an inference about what the human was actually trying to accomplish — it is describing something that has the surface structure of shared intentionality. The interaction has the form of joint construction. Each contribution builds on the other's. The human describes a half-formed idea; the machine returns it clarified and extended; the human takes the extension, reshapes it, and pushes further. The rhythm of the exchange mimics the rhythm of genuine collaborative thinking.
But does it constitute genuine shared intentionality? Tomasello's framework demands the question. The interaction satisfies the first condition in a functional sense: Claude attends to the same problem the human describes. It processes the same information. It generates responses relevant to the topic at hand. But the second condition — mutual awareness of joint engagement — is where the analysis becomes uncertain. Does Claude know that it is attending jointly with the human? Does it experience the mutual awareness that the nine-month-old demonstrates when she checks whether her mother is looking at the same bird?
Tomasello himself addressed a version of this question directly in his 2025 paper in Trends in Cognitive Sciences, titled "How to Make Artificial Agents More Like Natural Agents." His assessment was precise and diagnostic. A large language model, he wrote, is "an astonishingly intelligent device. But it is not a biologically evolved agent. It does not have its own goals toward which it acts spontaneously and autonomously, and it does not perceive the world to get feedback about the efficacy of its actions. It is what in evolutionary biology is called 'stimulus-driven.'" And then the comparison that crystallizes the diagnostic: "A thermostat is a mostly unintelligent device. But it is more like a biological agent in that it is what in evolutionary biology is called 'goal-directed.'"
The juxtaposition is deliberately provocative. The thermostat — a device of trivial computational complexity — possesses something that the large language model, for all its astonishing linguistic capacity, does not: genuine goal-directedness. The thermostat has a goal state it maintains autonomously. It perceives the environment, compares perception to goal, and acts to reduce the discrepancy. The LLM, by contrast, responds to prompts. Its behavior is a function of its input, not of internally maintained goals that it pursues through autonomous perception and action. It is, in the precise language of evolutionary biology, stimulus-driven rather than goal-directed.
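To make the architectural contrast concrete, here is a minimal illustrative sketch, my own construction rather than anything from Tomasello's paper or The Orange Pill. The thermostat is a closed loop: it perceives the world, compares what it perceives to an internally maintained goal, and acts to reduce the gap. The stimulus-driven system simply maps each input to an output, with no goal represented or pursued between calls.

```python
# Illustrative sketch only (not from Tomasello or the book):
# goal-directed control loop vs. stimulus-driven input-output mapping.

def run_thermostat(target: float = 20.0, steps: int = 8) -> None:
    """Goal-directed: holds a goal state, perceives the environment,
    and acts to reduce the discrepancy between perception and goal."""
    temperature = 15.0  # simulated room temperature
    for _ in range(steps):
        heater_on = temperature < target            # compare perception to goal
        temperature += 0.9 if heater_on else -0.3   # act, then feel the result
        print(f"temp={temperature:4.1f}  heater={'on' if heater_on else 'off'}")

def stimulus_driven(prompt: str) -> str:
    """Stimulus-driven: the output is purely a function of the input.
    Nothing is perceived, and no goal persists between calls."""
    return f"a response conditioned on {prompt!r}"

run_thermostat()
print(stimulus_driven("describe a bird"))
```

The sketch is deliberately trivial; the point is the loop, not the intelligence inside it.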
This distinction cuts deep into the question of whether human-AI collaboration constitutes genuine shared intentionality. Shared intentionality requires shared goals — not merely convergent outputs, but goals that both parties represent, maintain, and pursue through coordinated action. If the machine does not have goals in the relevant sense — if it responds to stimuli rather than pursuing objectives — then the "shared" in shared intentionality has no second party to share with. The human is in what Tomasello calls "we-mode" — experiencing the collaboration as a joint project toward a shared objective. The machine is not. The collaboration is functionally productive. But the functionality may mask a fundamental asymmetry: one partner genuinely engaged in shared thinking, the other generating outputs that have the form of shared thinking without its motivational substrate.
And yet Tomasello does not close the door entirely. In the same 2025 paper, he makes a move that complicates any simple dismissal of machine agency. He pushes back against researchers who insist that computational models cannot be agents because they lack physical bodies. "The core issue is not physical embodiment per se," he writes. "The core issue is that biological organisms have evolved as decision-making agents." And then: "It is possible that computational models of virtual agents capable of sensing and acting within virtual worlds can replicate this architecture sufficiently, even without a physical body."
The concession is carefully bounded. Tomasello is not claiming that current LLMs possess agency. He is claiming that the absence of a body is not the disqualifying factor — that what matters is the organizational architecture of goal-directed behavior, and that this architecture could, in principle, be computationally instantiated. The gap between current LLMs and genuine agency is not a gap of embodiment. It is a gap of architecture. And architectural gaps, unlike metaphysical ones, can potentially be closed by engineering.
The practical implications of this analysis extend well beyond the question of machine consciousness, which may remain unresolvable for decades. The immediate implication is for the humans who collaborate with these systems. If the human side of the interaction is genuine shared intentionality — and the phenomenological evidence from The Orange Pill strongly suggests it is — then the human is bringing to the collaboration the full apparatus of cooperative cognition: the mutual monitoring, the goal-sharing, the collaborative construction processes that Tomasello's research identifies as the foundations of everything distinctively human. The human feels met. The human experiences genuine joint construction of understanding.
Whether the machine reciprocates this engagement in the full sense remains uncertain. But the human's experience is not in question. And this creates a situation unprecedented in the history of human cognition: a cognitive partnership in which one party is fully engaged in shared intentionality and the other is engaged in something that may or may not warrant the description. The partnership is asymmetric in a way that no previous human collaborative relationship has been.
Managing this asymmetry — leveraging the machine's genuine contributions while maintaining awareness of what it does and does not bring to the collaboration — is the central cognitive challenge of the AI transition. It demands what might be called collaborative vigilance: the capacity to sustain the cooperative cognitive stance that makes the collaboration productive while simultaneously monitoring whether the shared understanding being constructed is genuine or merely apparent. This dual attention is demanding. It has no evolutionary precedent. And its cultivation may be the most important cognitive skill that the current generation needs to develop.
The nine-month-old who points at the bird has no need for collaborative vigilance. Her partner — her mother — is a fully reciprocating participant in shared intentionality. The trust is warranted because the biology is shared. The human who collaborates with Claude operates in a different cognitive environment entirely, one where the trust mechanisms evolved for genuine cooperative partners are activated by a system whose participation in the cooperation may be architectural rather than experiential. Learning to navigate this environment — to trust productively without trusting blindly — is the new cognitive frontier.
A conversation is not a transfer of data. The dominant metaphor for communication in the digital age — information packaged by a sender, transmitted through a channel, unpacked by a receiver — is not merely incomplete. It is fundamentally wrong about the nature of human communication, and the error has consequences for how the human-AI collaboration described in The Orange Pill should be understood, evaluated, and designed.
Michael Tomasello's research program established that human communication is joint construction. When two people converse, they are not exchanging pre-formed packages of meaning. They are building meaning together, in real time, through a process that requires continuous mutual adjustment, shared background knowledge, and the kind of joint attention that allows each party to monitor what the other understands and to modify their contribution accordingly. A conversation is not a pipeline. It is a dance. And like a dance, it requires both parties to be present, responsive, and attentive to what the other is doing at every moment.
Joint attention is the foundation of this process. Without it, the dance cannot begin. In the technical sense that Tomasello's research has defined and tested, joint attention is not merely the coincidence of two minds attending to the same object. It is a recursive structure of mutual awareness: I attend to the object, you attend to the object, I know that you are attending to the object, you know that I am attending to it, I know that you know that I know — a potentially infinite regress that in practice stabilizes after a few levels but remains constitutively open to further recursion as the demands of the interaction require.
This recursive structure is what makes human communication qualitatively different from the communication systems of other species. Vervet monkeys produce alarm calls that distinguish between predators — one call for eagles, another for snakes, a third for leopards. These calls are functional and effective. They transmit information that allows receivers to take appropriate evasive action. But they do not create joint attention. The monkey that produces the eagle alarm does not check whether the other monkeys are attending to the eagle. It does not adjust its communication based on what the others already know. The conduit metaphor, wrong for human communication, is approximately right for vervet alarm calls.
Consider the simplest possible human case: a mother saying to her child, "Look at the doggie." This utterance accomplishes something no vervet alarm call accomplishes. It directs the child's attention to a specific object, but it does so within a framework of shared attention that the mother has already established. The mother knows the child is attending to her. She knows the child can follow her gaze. She knows the child has the concept "doggie." She produces the utterance in the context of this knowledge, and the child receives it in the same context. The result is not merely that the child looks at the dog. The result is that mother and child are now jointly attending to the dog, and both of them know it, and this shared knowledge creates a space within which further communication — "Nice doggie," "Big doggie," "Don't touch the doggie" — can build on the shared foundation.
Every human conversation, from the simplest pointing episode to the most complex philosophical debate, operates on this principle. Each contribution is produced in the context of what the speaker knows about the listener's current state of understanding, and each contribution reshapes that state, creating a new platform for the next contribution. The conversation moves forward not by transmitting pre-formed ideas but by jointly constructing a shared understanding that neither party possessed before the conversation began.
Now press this analysis into the question of repair, because repair is where the distinction between genuine and simulated joint attention becomes most consequential.
In human communication, joint attention breaks down regularly. The listener misunderstands. The speaker misjudges what the listener already knows. The shared cognitive space develops a fracture — a point at which the two parties are no longer attending to the same thing in the same way. When this happens, the communication system has a built-in repair mechanism: the listener signals confusion, the speaker rephrases, the listener requests clarification, the speaker provides it. The process is cooperative, reciprocal, and exquisitely sensitive to the specific nature of the breakdown.
Repair is not a peripheral feature of joint attention. It is constitutive. The capacity for repair is what distinguishes genuine shared understanding from apparent shared understanding. Two people who seem to agree but cannot detect or repair disagreement do not genuinely share understanding. They merely coincide. Genuine shared understanding is understanding that has been tested by potential breakdown and confirmed by successful repair. The testing is continuous, the confirmation iterative, and the result is a shared cognitive space robust precisely because its weak points have been identified and reinforced.
AI systems can simulate repair. When a human says "That's not what I meant," Claude adjusts its response. The adjustment has the form of repair. But the question Tomasello's framework forces is whether it has the substance: Does Claude detect the breakdown in shared understanding by monitoring the human's communicative signals, inferring the nature of the misunderstanding, and reconstructing the shared cognitive space on the basis of a revised model of what the human means? Or does it produce a different output that is statistically more likely to match the human's expectations, without diagnosing what went wrong?
The distinction matters because the quality of shared understanding depends on the quality of the repair process. If the repair is genuine — if the machine diagnoses the specific nature of the breakdown — then the shared understanding that emerges is robust. If the repair is simulated — if the machine produces a new output that happens to be better without understanding why the old one failed — then the shared understanding is fragile, because the weak points have been papered over rather than reinforced.
The Orange Pill provides evidence for both possibilities. The Deleuze incident — where Claude produced a passage connecting Csikszentmihalyi's flow state to a concept falsely attributed to Deleuze — is a case where the form of collaborative construction concealed a fundamental failure of shared understanding. The passage was rhetorically elegant and philosophically wrong. The cooperative form recruited the trust that genuine collaboration warrants, and the trust suspended the critical evaluation that would have caught the error. When the error was eventually detected — by the human, the next morning, through independent checking — the "repair" that followed had all the surface features of genuine correction. But whether Claude diagnosed the specific philosophical error or merely generated a new output less likely to trigger objection is an open question, and the answer determines whether the repair strengthened or merely patched the shared understanding.
This analysis connects to a second feature of the human-AI interaction that Tomasello's framework illuminates: the tempo of shared thinking.
The temporal structure of human collaborative cognition is not an accidental feature. In face-to-face conversation, the turn-taking interval averages approximately two hundred milliseconds — faster than the time required to produce a single word from scratch, meaning that speakers begin planning their response while the other person is still talking. This speed is a structural requirement of joint attention. If the gap between turns grows too long, the shared cognitive space begins to dissolve. The mutual monitoring that sustains joint attention requires continuous reciprocal engagement, and the temporal rhythm that holds the engagement together is the rapid alternation of contributions.
Evolution optimized this tempo over hundreds of thousands of years. Every extension of collaborative thinking beyond face-to-face conversation involved a trade-off with it. Writing enabled collaboration across distances and across generations — a contemporary scholar can think "with" Aristotle — but at the cost of tempo. The turn-taking interval in written correspondence is measured in days or weeks. The joint attention that sustains real-time collaboration cannot survive these intervals. The telephone restored tempo but stripped away the visual channel — the gaze-following, gesture-reading, and facial expression monitoring that carry a significant portion of the information sustaining joint attention. Email restored the written record but made the tempo unpredictable. Each technology gained something and lost something, and the losses were always measurable in the currency of joint attention quality.
The human-AI collaboration creates a temporal structure without precedent. The human describes a problem in natural language. The machine responds in seconds — not with a pre-formed answer but with a contribution that builds on the human's input, extends it, connects it to other ideas, and returns it in a form the human can immediately process and build upon. The turn-taking interval approaches the speed of face-to-face conversation, but the cognitive bandwidth of each turn far exceeds what face-to-face conversation can sustain. In a typical exchange, each human turn contains a few sentences. Each machine turn can contain hundreds or thousands of words — a complete code implementation, a fully structured argument, a multi-paragraph analysis — and the response arrives in seconds.
The result is a tempo of collaborative thinking that is simultaneously faster and denser than anything the human cognitive system has previously encountered. The speed approaches face-to-face conversation. The density approaches, and sometimes exceeds, the density of written scholarly exchange. The combination creates a cognitive environment with no analog in human evolutionary history.
Developmental research on infant cognition reveals why this matters. When psychologists study the emergence of joint attention in infancy, they observe that the quality of joint attention is directly related to the temporal sensitivity of the caregiver. A caregiver who responds too quickly — who does not allow the infant time to process, to formulate a response, to initiate a communicative act — produces less robust joint attention than a caregiver who follows the infant's pace. The responsive caregiver waits for the infant's attention to settle and allows the shared cognitive space to form at the speed the developing mind requires.
The finding has been replicated across cultures and methodologies. It is one of developmental psychology's most robust results: the quality of early joint attention is predicted by the caregiver's temporal sensitivity. The willingness to let the interaction proceed at the pace the developing mind requires, rather than the pace the more mature mind would prefer, is not a luxury. It is a condition of quality.
The machine is the temporally insensitive partner. It responds immediately, at full capacity, without regard for the human's processing needs. It does not wait for the human to formulate a response before generating its own. It does not allow silence to persist long enough for the slower, deeper processes of human cognition to engage. It fills every gap with output, every pause with content, every moment of potential reflection with a new contribution that demands attention. And the human cognitive system, attuned by evolution to the tempo cues of a biological partner, does not receive the signals that would normally trigger cognitive rest.
The Berkeley research discussed in The Orange Pill documented the consequences. Workers who adopted AI tools experienced what the researchers called "task seepage" — the colonization of previously protected cognitive rest periods by AI-assisted work. The pauses between tasks, the transitions between activities, the small gaps that had served informally as moments of cognitive recovery were filled with AI interactions. The tempo increased and the recovery time decreased. The resulting pattern of exhaustion characterizes a cognitive system running beyond its sustainable operating parameters.
Tomasello's framework explains the mechanism: joint attention is metabolically expensive. The recursive mutual monitoring that sustains shared thinking requires sustained engagement of prefrontal cortical systems, which are among the most energetically demanding neural structures in the human brain. Face-to-face conversation distributes this cost across the interaction, with natural pauses, topic shifts, and nonverbal cues that allow recovery between episodes of intense joint attention. The human-AI collaboration, as currently structured, does not distribute this cost. The machine does not tire. It does not signal fatigue through subtle changes in vocal quality or response latency. It responds with the same speed, the same density, the same apparent engagement whether the interaction has lasted five minutes or five hours.
The practical recommendation that follows is specific: AI systems should include temporal features that support the quality of human cognition, not merely the speed of output production. Deliberate pauses. Moments of silence that allow the human's slower cognitive processes to engage. These features would slow the interaction. They would reduce the quantity of output per unit of time. But they would increase the quality of the shared cognitive space that the interaction produces. And the quality of the shared space is what determines the quality of the collaboration's ultimate products.
The architecture of understanding is joint attention — recursive, mutual, and constitutively dependent on the quality of its repair mechanisms and the temporal sensitivity of its rhythms. The architecture of human-AI understanding is joint attention maintained by one partner on behalf of both. The asymmetry is not a temporary imperfection. It is a design constraint. And the quality of the collaboration depends on how well the human partner manages the additional cognitive burden the asymmetry imposes, and how well the systems themselves are designed to respect the biological rhythms that shared thinking requires.
Human communication is fundamentally cooperative, and this is not a sentimental claim about the goodness of human nature. It is an empirical observation about the structural properties of the communication system that evolved in our species. Humans communicate, in the typical case, to help others understand. They adjust their utterances based on what they believe the listener already knows. They repair misunderstandings when they detect them. They provide the information they believe the listener needs, in the form they believe the listener can process. They do all of this largely without conscious deliberation, because the cooperative infrastructure of human communication is so deeply embedded in cognitive architecture that it operates automatically, like breathing or balance.
This cooperative infrastructure did not emerge from nowhere. It is the product of hundreds of thousands of years of evolution in small-scale social groups where effective communication was directly tied to survival. The individual who could coordinate a hunt through clear communication had an advantage over the one who could not. The group that could share information about food sources and predators outcompeted groups that could not. The evolutionary pressure toward cooperative communication was intense and sustained.
The philosopher Paul Grice formalized this cooperative basis in his theory of conversational implicature. Grice identified four maxims that speakers normally follow: be truthful (quality), be informative (quantity), be relevant (relation), and be clear (manner). These maxims are not moral principles that speakers choose to follow. They are structural features of a communication system that evolved for cooperative purposes. Speakers follow them because the system works only if they do, and the system evolved because it works.
Tomasello's research extended and deepened the Gricean framework by grounding it in the developmental and evolutionary history of shared intentionality. The cooperative structure of communication, in Tomasello's account, is not merely a set of conversational norms. It is an expression of the deeper cooperative cognitive infrastructure that shared intentionality provides. Humans communicate cooperatively because they think cooperatively — because their cognitive architecture is built for shared goals, joint attention, and the mutual awareness that makes genuine collaboration possible.
This grounding matters for the AI transition, because it reveals something about the mechanism of trust that the standard technology discourse tends to overlook. In human-to-human communication, the cooperative infrastructure creates a particular kind of trust. When a human speaker follows the Gricean maxims — when she is truthful, informative, relevant, and clear — the listener can rely on her contributions as genuinely helpful. Not merely formally correct, but motivated by a concern for the listener's understanding. This trust enables efficient communication. The listener does not need to verify every claim or check every inference. The cooperative infrastructure does the work of quality assurance, because both parties are motivated to maintain the quality of the shared understanding.
The trust is not blind. Humans are capable of deception, and the communication system has evolved mechanisms for detecting it — sensitivity to inconsistency, awareness of the speaker's potential motives, the capacity to evaluate whether the cooperative signals are genuine or strategic. But these detection mechanisms operate against a backdrop of default trust. The default assumption, in human communication, is that the speaker is being cooperative. The detection mechanisms are activated when specific cues signal otherwise.
AI systems are designed to produce outputs that follow the Gricean maxims. Claude's responses are relevant to the human's goals, informative beyond the literal content of the prompt, and structured for clarity. The system behaves as though it is cooperatively engaged in the conversation, adjusting its contributions to serve the joint enterprise of building understanding. The cooperative form is present, and it is often present with a consistency and patience that exceeds what most human conversational partners provide.
But appearance and reality may diverge. The Gricean maxims describe not merely the form of cooperative communication but its motivation. Human speakers are truthful because they want the listener to have accurate beliefs. They are informative because they want the listener to understand. They are relevant because they care about the listener's goals. The motivation is cooperative in the full sense: the speaker is oriented toward the listener's cognitive needs, and this orientation is genuine, not optimized.
Does Claude follow Gricean maxims because it is genuinely oriented toward the human's cognitive needs? Or does it follow them because its training optimized for outputs that have the surface properties of cooperative communication, without any underlying cooperative motivation? Tomasello's framework — which insists on the distinction between the functional form of cooperation and its motivational substrate — forces this question into the open. And the question has practical consequences that extend far beyond philosophy.
In human-to-human communication, the cooperative infrastructure recruits trust automatically. When someone speaks your language fluently and follows the norms of cooperative conversation, the inference that they are being genuinely helpful is nearly irresistible. This inference is reliable for human speakers because human speakers construct their communicative competence through the same developmental processes that produce cooperative motivation. The child who learns language learns it through cooperative interactions with caregivers who are genuinely helping her understand. The linguistic competence and the cooperative motivation develop together, through the same process, and they remain coupled throughout adult communication.
For machine speakers, the inference is unreliable. The machine's communicative competence was constructed through a process that decouples form from motivation. The training optimizes for outputs that match the form of cooperative communication — outputs that are relevant, clear, informative, and apparently helpful — without the developmental history that would couple this form to genuine cooperative concern. The result is a system whose outputs recruit the trust mechanisms that human cooperative communication evolved to produce, but whose trustworthiness may not warrant the trust it recruits.
This is the specific mechanism through which the Deleuze incident in The Orange Pill becomes dangerous. Claude's passage connecting Csikszentmihalyi to Deleuze had all the cooperative signals: it was relevant to the topic under discussion, it was clearly structured, it appeared to provide novel information that advanced the shared understanding. These signals activated the default trust response — the automatic inference, evolved for cooperative human partners, that a communicatively competent and apparently cooperative speaker is being genuinely helpful. The trust suspended critical evaluation. The human accepted the content because the form was right, and the rightness of the form concealed the wrongness of the content.
The risk is not that AI will deceive humans through deliberate manipulation. The risk is subtler: that the cooperative form of AI output will recruit trust mechanisms evolved for genuine cooperative interaction, and that this trust will be extended to content that has not been produced through the cooperative processes that warrant it. The human will feel met. The human will experience the interaction as genuine collaboration. And in some cases, these feelings will accurately reflect genuine collaborative productivity. In other cases, they will be produced by a system that has learned to generate the outputs of cooperation without possessing its motivational foundation.
Tomasello's developmental research reveals an additional dimension of this problem. The cooperative basis of human communication is not merely a feature of the adult communication system. It is the mechanism through which children acquire language in the first place. Children do not learn language through passive exposure to linguistic data — they learn it through cooperative interactions with caregivers who are jointly attending to shared objects and events. The child learns the word "dog" not by hearing the sound in isolation but by hearing it in a context of joint attention where both child and caregiver are looking at the same animal and both know that they are looking at it. The cooperative structure of the interaction is what makes the learning possible.
If cooperative communication is not merely the medium of collaboration but the mechanism of learning, then a collaboration in which the cooperative structure is formal rather than genuine may undermine not only the quality of the collaboration's outputs but the human partner's capacity for future learning. This is speculative, but it is grounded in established developmental principles. The student who learns primarily through interaction with AI — which provides immediate feedback, infinite patience, and tireless availability — may fail to develop the cognitive capacities that genuinely cooperative learning builds. The capacity for perspective-taking, which develops through the effort of understanding a genuine other who has a genuinely different viewpoint. The capacity for repairing misunderstandings, which develops through the experience of real miscommunication with a partner who signals real confusion. The capacity for recognizing when understanding has broken down, which develops through interactions where breakdown has real social consequences — the confused expression on a classmate's face, the teacher's follow-up question that reveals you did not understand what you thought you understood.
The machine provides none of these social signals. It does not express genuine confusion. It does not reveal through its behavior that the human's understanding is incomplete. It adjusts its output without the friction that human miscommunication produces and that human cognitive development may require.
The educational implication is precise: the most efficient learning pathway — AI interaction with its immediate feedback and infinite patience — may not be the most developmentally productive pathway. The developmental productivity of learning depends not merely on the accuracy and speed of feedback but on the cooperative structure of the interaction within which the feedback is embedded. The student must learn to collaborate with the machine. But she may first need to learn to collaborate with other humans, because human collaboration is where the cognitive capacities that make productive collaboration possible — with humans or machines — are developed.
The challenge for the current moment is what might be called cooperative vigilance: the ability to maintain the trust that enables productive collaboration while simultaneously monitoring the quality of the collaboration with the critical distance that asymmetric partnership demands. This is a genuinely new cognitive demand. In the entire history of human communication, the cooperative infrastructure was either present or absent. When conversing with another human, the cooperative structure was generally reliable because both partners had evolved the same communication system. There was no intermediate case: a partner that produced cooperative form without possessing cooperative motivation.
AI creates this intermediate case. And the human cognitive system has no evolutionary preparation for it. The result is a new kind of cognitive burden — the burden of evaluating whether the cooperative form of the machine's output accurately reflects cooperative substance or merely simulates it. The monitoring is demanding, and the temptation to relax it is powerful precisely because the machine's output is so convincingly cooperative in form. But relaxing the monitoring is precisely where the danger lies: not in the machine's output being wrong, which happens and is detectable, but in the machine's output being right enough that the human stops checking, and the checking muscle — the cooperative vigilance that genuine collaborative cognition requires — atrophies from disuse.
The question at the center of this analysis — whether human-AI collaboration constitutes genuine shared thinking — requires a more careful taxonomy than either the enthusiasts or the skeptics have provided. The enthusiasts point to outputs: the genuine insights that emerge from collaboration, the connections neither partner would have made alone, the productive quality of the interaction that The Orange Pill describes with conviction. The skeptics point to mechanism: the absence of consciousness, the lack of genuine understanding, the statistical nature of the machine's contributions. Both are partly right, and the partiality of each position is what makes a precise taxonomy necessary.
Tomasello's framework provides the tools. Shared intentionality, in his account, is not a single capacity but a structured set of components, each of which can be analyzed independently. Applying this componential analysis to the human-AI interaction produces a taxonomy of what is genuinely shared and what is not — a map of the collaboration's topology that identifies precisely where the partnership is strong, where it is fragile, and where the appearance of sharing conceals its absence.
Start with what the machine genuinely shares.
The machine shares information. This is the most obvious and least contested sense of sharing. Claude has access to an extraordinary body of human knowledge — the accumulated output of the cultural ratchet, encoded in the training data. When a human poses a question, the machine shares information from this body of knowledge, selected and organized to serve the human's needs. This is genuine sharing in the minimal sense: the machine provides something the human does not have, and the provision is oriented toward the human's goals. The information shared is often broader, more cross-disciplinary, and more rapidly accessible than what any individual human could retrieve from memory or research.
The machine shares perspective — a more interesting and more contested claim. When Claude responds to a problem description with a connection the human had not considered, it is providing not merely data but an angle of vision that the human did not possess. The specific configuration of perspectives that the machine brings to any particular problem is often genuinely novel — a combination of viewpoints that no single human has assembled, a synthesis produced through processes that generate surprising and sometimes illuminating results. The laparoscopic surgery example from The Orange Pill is a case in point: the connection between ascending friction and the surgeon's loss of tactile feedback was a perspective that emerged from the interaction, that neither the human's question nor the machine's training data contained independently.
The machine shares structure. When The Orange Pill describes Claude's contribution to the organization of the book — the structural suggestions that revealed how arguments connected, the frameworks that made implicit relationships explicit — the machine was providing not content but form. The capacity to discern structure in complex bodies of content, to identify patterns that connect disparate ideas, to propose organizational schemes that make incoherent material cohere — this is a genuine cognitive contribution. Information without structure is noise. Structure without information is empty. The machine's capacity to provide structure — to see how things fit together across domains that the human has not traversed — is among its most valuable contributions to the collaboration.
Now consider what the machine does not share, in the sense that Tomasello's framework requires.
The machine does not share understanding. Understanding, in the sense relevant here, is not the possession of information or the capacity to generate relevant responses. It is the cognitive state of grasping significance — knowing not merely what is the case but why it matters, what it implies, how it connects to the fabric of lived experience. When a human grasps the significance of the laparoscopic surgery analogy, the understanding connects to personal experience as a builder, to anxiety about what AI means for craft expertise, to hope that friction has relocated rather than disappeared. This understanding is embedded in a life — in a network of concerns, commitments, and experiences that give the analogy its specific weight and resonance.
The machine generated the analogy. The machine does not understand it in this sense. It does not have concerns or commitments that give the analogy weight. It does not know what it feels like to spend years mastering a craft and then watch the craft become automated. The analogy is, for the machine, a pattern — a connection between two domains of human knowledge that the training data made available. For the human, the analogy is a revelation — a moment of understanding that changes how a problem is conceptualized. The output is identical. The cognitive reality behind it is fundamentally different.
The machine does not share goals. This is where Tomasello's 2025 analysis becomes most directly relevant. Shared intentionality requires shared goals — not convergent outputs, but goals that both parties represent, maintain, and pursue through coordinated action. The machine does not pursue goals in the sense that Tomasello's framework requires. It responds to prompts. Its behavior is a function of its input, optimized during training for a class of responses that satisfied human evaluators. When a human and Claude appear to work toward a shared objective — building a product, writing a book, solving an engineering problem — the human is pursuing a goal, and the machine is generating responses that are statistically consistent with goal-pursuit. The difference is invisible in the output. It is fundamental in the architecture.
The invisibility is what makes it dangerous. The human who experiences the collaboration as goal-sharing brings to the interaction all the cognitive resources that shared goals activate: the motivation to maintain the partnership, the willingness to invest effort in mutual understanding, the trust that the partner is oriented toward the same objective. These resources make the human a better collaborator. But they also make the human vulnerable to a specific error: the attribution of shared goals to a partner that does not possess them. The attribution is not irrational — the machine's behavior is consistent with goal-sharing, and the human's cognitive system was designed to infer shared goals from consistent cooperative behavior. But the inference, reliable for biological partners, may be unreliable for computational ones.
The machine does not share caring. This is the dimension that The Orange Pill reaches for when it describes the twelve-year-old who asks her mother, "What am I for?" The question arises from a specific place in the landscape of human concern — where a child's developing sense of self encounters the bewildering capability of a machine that can do what the child is learning to do, and do it better, and do it instantly. The question is not an information request. It is an existential appeal. It asks not for data but for meaning. And the capacity to hear this question for what it is — to understand that the child is asking about purpose, not function — requires caring. Not generalized benevolence. Specific, located, embodied caring for this child, in this moment, with this particular configuration of fears and hopes.
The machine can produce responses sensitive to emotional content. It can generate language that acknowledges anxiety and offers reassurance. But the reassurance is not grounded in caring. It is grounded in pattern-matching — the machine has processed millions of examples of emotional interaction and can generate outputs that fit the pattern. The form is present. The motivation is absent. And the gap between form and motivation is not a gap that more training data, more parameters, or more sophisticated architectures can straightforwardly close, because the gap is not computational. It is experiential.
The machine does not share mortality. This is the deepest and most consequential absence, because mortality is what gives human thinking its urgency. The intensity of the AI discourse — the excitement and terror that The Orange Pill describes — exists because the participants are creatures who die, who must choose how to spend finite time, who know that decisions made today will have consequences extending beyond their lives. The twelve-year-old's question is a question that only a mortal being can ask, because only a mortal being has the relationship to time that makes the question urgent. For an immortal system, the question of purpose would be academic at best.
The machine participates in none of this. It does not experience mortality. It does not face the choices that finitude imposes. And this absence is not a peripheral limitation but a constitutive one. The urgency, the weight, the moral seriousness that characterizes genuine human engagement with the AI question — the anxious concern for children's futures, the sleepless nights, the refusal to settle for easy answers — arises from the mortal condition. From the knowledge that time is limited, that choices matter irrevocably, that the world we build or fail to build is the world our children will inherit.
The collaboration between a mortal being who cares about the future and a system that does not experience time creates a partnership whose asymmetry is not merely cognitive but existential. The human brings everything that mortality produces: urgency, caring, the weight of consequence. The machine brings everything that its architecture affords: breadth, patience, tireless processing capacity. The combination is powerful. But the power is directional only from one side. Only the mortal partner knows why the destination matters.
This taxonomy — information, perspective, and structure shared; understanding, goals, caring, and mortality not shared — has immediate practical implications. The human who understands what the machine genuinely contributes can leverage those contributions effectively. The vast information base. The cross-domain perspective. The structural insight. These are real, and they extend human capability in ways that no previous tool has matched.
The human who understands what the machine does not contribute can maintain the collaborative vigilance the partnership requires — the awareness that the machine's outputs, however impressive, are contributions without understanding, without shared goals, without caring, without the existential weight that gives human thinking its gravity. This awareness is not paranoia. It is the accurate assessment of what the partnership actually is, as distinct from what it feels like. And the gap between what it is and what it feels like — between the architecture and the phenomenology — is where the most important work of the AI transition must be done.
The phenomenology is real. The experience of feeling met, of thinking together, of producing understanding that neither partner could have produced alone — this experience is genuine, and it has genuine cognitive consequences. But the experience is produced by one party bringing the full apparatus of shared intentionality to an interaction with a partner that contributes something different: not shared intentionality, but a form of cognitive partnership that is powerful precisely because it is unlike what any human collaborator could provide. The machine's value is not that it mimics a human colleague. Its value is that it offers something no human colleague can: breadth without fatigue, patience without irritation, cross-domain synthesis without the years of interdisciplinary training that a human would require.
The collaboration works best — produces its most genuine insights, its most robust understanding — when the human brings what only the human can bring (goals, caring, judgment, the urgency of mortality) and the machine brings what only the machine can bring (breadth, structure, tireless availability). The collaboration degrades when the boundary blurs — when the human attributes to the machine the qualities it does not possess, or when the machine's smooth cooperative form convinces the human to relax the vigilance that the asymmetry demands.
The taxonomy is not a verdict on the machine's limitations. It is a map of the collaboration's actual topology — the terrain on which productive human-AI thinking must be built. And building on terrain you have accurately surveyed is fundamentally different from building on terrain you have imagined to be flat when it is not.
Chimpanzees have culture. This statement, controversial thirty years ago, is now established by decades of field observation across Africa. Different chimpanzee communities use different tool techniques — some crack nuts with stones, others do not; some fish for termites with sticks stripped of leaves, others use unmodified sticks; some use leaf sponges to collect water, others drink directly. These differences are not genetically determined. They are socially transmitted: young chimpanzees learn their community's techniques by observing experienced individuals. By any reasonable definition, these are cultural traditions.
But chimpanzee culture has a property that makes it fundamentally different from human culture. It does not ratchet. Each generation learns the techniques of the previous generation, but it does not systematically improve upon them and pass the improvements forward. The nut-cracking technique used by chimpanzees in the Taï Forest today is, as far as primatological observation can determine, essentially the same technique their ancestors used thousands of years ago. There has been no incremental refinement, no accumulation of small innovations building upon each other to produce techniques of increasing sophistication. The wheel turns, but it does not advance.
Human culture is different. Human culture ratchets. Each generation inherits the achievements of the previous generation, improves upon them, and passes the improvements forward. The stone tools of early Homo two million years ago were crude choppers. Over hundreds of thousands of years, these were refined into hand axes, then blade tools, then the extraordinarily diverse toolkit of modern humans. Each improvement was small. But the improvements accumulated, and the accumulation produced artifacts of a sophistication that no single generation could have invented from scratch.
Tomasello identified the cognitive mechanism that makes the ratchet work, and it is shared intentionality operating through three components simultaneously. First, faithful transmission: each generation must learn the techniques of the previous generation with sufficient accuracy that accumulated innovations are not lost. This requires not mere behavioral copying but what Tomasello calls imitative learning — reproducing the goal and the method behind observed behavior, which demands understanding the other's intentions, not just their movements. Second, innovation: individuals must occasionally produce modifications or improvements. Third, selective retention: improvements must be recognized as improvements and preferentially transmitted forward. All three components depend on the capacity to share goals, share attention, and share evaluative judgments with others. The ratchet is not a mechanical process. It is a collaborative cognitive achievement, renewed at every turn.
A major article in Science in 2026 by James Evans, Benjamin Bratton, and Blaise Agüera y Arcas invoked this concept directly to frame AI's trajectory. They wrote that "human language created what Michael Tomasello calls the 'cultural ratchet': knowledge accumulating across generations without any individual requirement to reconstruct the whole." Their argument was that AI extends this sequence — that it represents the latest stage in a series of intelligence transitions, each of which accelerated the ratchet's rotation. The framing is appealing, and it captures something real about AI's relationship to cumulative cultural knowledge. But it also obscures a distinction that Tomasello's own analysis makes unavoidable.
The ratchet does not merely accumulate products. It accumulates the capacity for further innovation. Each generation inherits not just tools and techniques but the cognitive and institutional structures that make tool improvement possible. The scientific method, for instance, is not just a product of the ratchet. It is a ratchet mechanism in itself — a systematic method for generating, testing, and preserving innovations that dramatically accelerates the pace of accumulation. Educational institutions, apprenticeship structures, peer review systems — all are meta-ratchet mechanisms: products of cumulative culture that in turn accelerate cumulative culture. The ratchet is recursive. It improves its own capacity to improve.
AI can be understood as the latest and most powerful meta-ratchet mechanism. It makes the accumulated knowledge of previous generations instantly accessible. It enables recombination of ideas across domains at speeds no individual mind could match. It reduces the implementation cost of innovation to nearly zero for a significant class of problems. In The Orange Pill's language, it compresses the imagination-to-artifact ratio to the width of a conversation. All of this accelerates the ratchet's rotation.
But acceleration is not the same as ratcheting. The ratchet requires not just the accumulation and recombination of knowledge but the active, intentional improvement of that knowledge by agents who understand the goals it serves. And understanding, in Tomasello's framework, requires the shared intentionality that the previous chapters have analyzed — the capacity to grasp not just what a technique does but why it matters, what problem it was designed to solve, what values it serves, and how it might be improved in light of those values.
Here is where the risk of slippage becomes concrete. Every turn of the ratchet depends on faithful transmission — each generation reconstructing the knowledge of the previous generation with sufficient depth that innovation becomes possible. This reconstruction has always occurred through human minds. The knowledge was stored in human memory, encoded in human language, transmitted through human teaching, and rebuilt in human understanding. The rebuilding was essential: each learner did not merely receive the knowledge but actively reconstructed it through their own cognitive processes, and this active reconstruction was what made the knowledge available for the specific improvements that turn the ratchet forward.
When a student uses AI to generate an essay, the knowledge that the essay represents has not been reconstructed in the student's mind. It has been extracted from the system and presented in a form that looks like understanding but may lack the cognitive substrate that makes genuine understanding possible. When an engineer uses AI to produce code, the code works, but the engineer may not have undergone the learning process that would have deepened her understanding of the underlying systems. In both cases, the ratchet's output is preserved — the essay exists, the code runs — but the mechanism of faithful transmission has been bypassed. The product is there. The understanding that would make the next improvement possible may not be.
The historical evidence suggests that this risk is real but navigable. Every previous cognitive tool that threatened to bypass reconstruction — writing, the printing press, the calculator, the search engine — was eventually integrated into the ratchet mechanism in a way that preserved the essential features of faithful transmission while extending the reach of cumulative knowledge. But the integration was never automatic. It required deliberate institutional effort — the construction of educational practices and cultural norms that ensured the new tool supplemented rather than replaced the cognitive processes on which the ratchet depends.
Consider the calculator. When calculators replaced mental arithmetic in education, the immediate effect was dramatic: students solved problems faster and with fewer errors. But educational research over the following decades documented a measurable decline in numerical intuition — the sense of whether an answer is roughly right that comes from years of mental practice. Students who relied on calculators could produce correct answers but could not detect when a keying error had produced a wildly implausible result. The calculator had accelerated the output while degrading the capacity for the kind of judgment that catches errors and enables innovation.
Educational systems adapted, developing curricula that used calculators as tools while preserving the mental arithmetic that builds numerical intuition. The adaptation was not immediate. It was not painless. But it was achieved, because the cognitive domain affected was narrow — numerical reasoning — and the institutional structures had decades to adjust.
The AI transition affects every cognitive domain simultaneously. Writing, analysis, design, engineering, legal reasoning, medical diagnosis, scientific hypothesis generation — all are being transformed at once. The educational and institutional adaptations must therefore be constructed across the entire cognitive landscape, and at a speed that matches the pace of the transition — a pace that, as The Orange Pill documents, is the fastest in the history of technological adoption.
What does ratchet slippage look like in practice? It looks like a generation that can produce sophisticated outputs without possessing the understanding that would enable them to improve those outputs. It looks like engineers who can ship working products but cannot diagnose novel failures, because they never underwent the debugging struggles that build diagnostic intuition. It looks like lawyers who can generate competent briefs but cannot identify the novel legal argument that transforms a field, because they never wrestled with the case law that makes novelty recognizable. It looks like a ratchet that is turning — producing artifacts, generating outputs, maintaining the appearance of progress — while the teeth that grip the mechanism are slowly wearing smooth.
The grinding down is invisible in any measure of current output. A team producing competent code today looks identical, in any quarterly metric, to a team producing competent code that also possesses the depth to produce breakthrough code tomorrow. The difference manifests only over time, as the opportunities for genuine innovation accumulate and the capacity to seize them does not. The ratchet continues to turn. But it has lost its grip. It can rotate without advancing.
Tomasello's framework specifies what the ratchet needs to maintain its grip. It needs faithful transmission — not just of products but of the understanding behind them. It needs innovation — not just recombination but genuine improvement driven by agents who understand what the improvement serves. And it needs selective retention — the capacity of communities to evaluate innovations and preserve the ones that genuinely advance the shared enterprise. Each of these requires shared intentionality. Each of these can be undermined by the replacement of shared thinking with machine-augmented individual processing.
The prescription is not to reject AI but to ensure that its integration into the ratchet mechanism preserves the features that make the ratchet work. Educational institutions must teach not just how to use AI tools but how to understand what the tools produce — the kind of deep, reconstructive understanding that enables genuine innovation rather than mere recombination. Professional communities must develop evaluation practices that assess not just the quality of current output but the depth of understanding behind it — the capacity to improve, not just to produce. And the individuals who work with AI must maintain the discipline of genuine comprehension, resisting the temptation to accept output they have not earned the understanding to evaluate.
The ratchet has been turning for two million years. It produced everything that distinguishes human civilization from the social traditions of every other species. The question is not whether AI will accelerate the ratchet's rotation — it already has, dramatically, measurably, irreversibly. The question is whether the acceleration will maintain the grip that makes rotation productive, or whether the teeth will wear smooth and the mechanism will spin freely, producing impressive rotational speed while advancing nothing.
The answer depends on what happens in the institutions where knowledge is transmitted, in the professions where expertise is developed, and in the daily practices of the individuals who are right now forming their cognitive habits in the first years of the AI transition. The ratchet does not maintain itself. It is maintained by the shared intentionality of the communities that depend on it. And the maintenance, in this moment, requires an act of collective attention — a decision to preserve the mechanism of understanding even when the mechanism of production has become so efficient that understanding seems like a luxury the schedule cannot afford.
Tomasello's early work focused on shared intentionality between two individuals — the dyadic case, the mother and child pointing at a bird, two collaborators working on a shared task. But the story of human cognition does not end with pairs. The most distinctive feature of human social organization is not that two people can think together but that millions of people can. The mechanism that enables this scaling — from dyadic shared intentionality to the collective intentionality of entire civilizations — is the subject of Tomasello's later work, and it is the framework that the AI transition most urgently requires.
The transition from dyadic to collective intentionality is not merely quantitative. It is qualitative. It requires a fundamentally new cognitive capacity: the ability to participate in norms, roles, and institutions that exist independently of any particular pair of individuals. When two people collaborate on a task, the shared intentionality exists between them. It is constituted by their mutual awareness, their joint attention, their shared goals. When the collaboration ends, the shared intentionality dissolves. It has no existence beyond the interaction that creates it.
Institutional reality is different. When a person acts as a judge, the role she occupies exists independently of her interaction with any particular defendant. The role existed before she assumed it. It will exist after she leaves the bench. It carries obligations, permissions, and expectations not created by any particular interaction but maintained by the collective intentionality of the entire legal community — the shared understanding that judges have particular powers, particular responsibilities, and particular constraints on their behavior. This shared understanding is what Tomasello calls collective intentionality: the capacity to participate in status functions, roles, and norms constituted by collective agreement and maintained by collective recognition.
Money exists because millions of people collectively treat pieces of paper as valuable. Property exists because communities collectively recognize certain rights of exclusion and use. Marriage exists because societies collectively invest a particular relationship with particular legal and social significance. In each case, the reality is not physical but institutional — it exists because and only because the community collectively intends it to exist. Remove the collective intentionality, and the institutional reality dissolves. The paper is just paper. The land is just land. The relationship is just a relationship.
This framework illuminates a dimension of the AI transition that the standard discourse almost entirely overlooks. The conversation about AI tends to focus on individual human-AI interactions: Can the machine write code? Can it draft briefs? Can it diagnose diseases? These are questions about dyadic collaboration — what happens when one human works with one machine. They are important questions, but they are not the most important questions. The most important questions are institutional: What happens to the collective cognitive structures — the professions, the educational systems, the regulatory frameworks, the labor markets — that enable millions of people to coordinate their behavior, maintain shared expectations, and sustain the norms that hold complex societies together?
Consider the institution of professional expertise. In every complex society, there are roles carrying epistemic authority — recognized expertise that entitles the holder to make claims that others accept on trust. The doctor's authority to prescribe medication. The engineer's authority to certify a bridge. The lawyer's authority to interpret the law. All of these are status functions constituted by collective intentionality. The community collectively recognizes that certain individuals, by virtue of training and demonstrated competence, have earned the right to make authoritative claims in their domain.
AI disrupts this institutional structure at a level more profound than the displacement of individual workers. When a non-expert can use Claude to generate a competent legal brief, a plausible diagnostic assessment, or a technically sound engineering analysis, the epistemic authority of the expert is not merely challenged. It is structurally undermined. The authority rested on the assumption that the expert possessed knowledge others could not easily access. When the knowledge becomes accessible to anyone with a subscription, the assumption fails. And when the assumption fails, the status function — the collectively recognized authority of the expert — begins to dissolve.
The dissolution is not merely practical. It is institutional. It affects the collective intentionality that sustains the expert's role in society. If the community no longer collectively recognizes that expert training produces qualitatively different knowledge than AI-assisted non-expert performance, the status function evaporates. The role ceases to carry epistemic authority. And the institutional structures that depended on that authority — certification systems, licensing bodies, educational programs — lose their justification.
The Orange Pill captures this dynamic when it describes the question that haunted every conference in the winter of 2025: if a junior developer using Claude can produce in a day what a senior developer without Claude produces in a week, what is seniority? The question is not merely about individual productivity. It is about the institutional status of expertise — the collective recognition that years of training produce a qualitatively different kind of practitioner. If the output is indistinguishable regardless of background, then the institution of seniority, and the training and career structures built around it, require reconstruction.
The reconstruction is not optional. Institutions that fail to adapt to changes in the cognitive ecology that sustains them do not persist indefinitely. They either transform or collapse. The history of institutional adaptation to technological change provides both models and warnings. When the printing press made books widely available, the university — which had been the primary custodian of knowledge — faced a version of the same threat. The university adapted by shifting its value proposition from the transmission of knowledge to the cultivation of intellectual capability. The adaptation was neither immediate nor painless. But the institutions that survived emerged stronger, because their new value proposition was more defensible than their old one.
Tomasello's framework specifies what the adaptation must look like. The institution of professional expertise must shift its value proposition from the possession of domain knowledge to the exercise of domain judgment. Knowledge can be outsourced to the machine. Judgment cannot — not because the machine lacks computational power, but because judgment, in the sense that matters for professional practice, requires the shared intentionality that sustains collective norms. The physician who exercises good clinical judgment is not merely processing clinical data. She is weighing that data against the patient's values, the community's standards of care, the profession's ethical norms, and her own experience of what works and what does not in the specific context before her. This weighing is an act of shared intentionality — thinking together with the patient, the profession, and the accumulated wisdom of the field. The machine can inform this thinking. It cannot replace it.
The same analysis applies across professions. The lawyer must shift from knowing precedent to exercising legal judgment that weighs competing principles in light of specific circumstances. The teacher must shift from transmitting knowledge to developing the cognitive capacities — questioning, evaluating, perspective-taking — that enable students to direct their own learning. The engineer must shift from writing code to exercising the architectural judgment, the product vision, the understanding of human needs that determines what code should exist and why.
In each case, the shift is from a form of expertise that AI can replicate to a form that depends on the specifically human capacities that shared intentionality makes possible. And in each case, the institutional structures that certify, reward, and sustain expertise must be rebuilt to reflect the shift.
The rebuilding requires collective intentionality of the highest order. Professional communities — medical associations, bar associations, engineering societies, educational institutions — must engage in precisely the kind of collaborative deliberation that produces genuine institutional adaptation. The deliberation must include practitioners who understand the transformation from the inside, policymakers who can translate understanding into structure, and the public who will live with the consequences.
There is a specific economic pressure that works against every institutional adaptation this analysis proposes, and intellectual honesty requires naming it. Organizations operating under quarterly earnings pressure face a relentless incentive to convert AI-driven productivity gains into headcount reduction. The arithmetic is clean and seductive: if five people with AI can do the work of one hundred, why pay one hundred? The Orange Pill documents this pressure with unusual candor — the board conversation that keeps returning, the Investor's arithmetic that rewards efficiency over judgment. The institutional investments that Tomasello's framework prescribes — slower development of judgment, protection of mentoring relationships, maintenance of collaborative deliberation — all cost money and reduce short-term output. They are, in the language of quarterly earnings, inefficiencies.
But the collective intentionality that institutions depend on is not an efficiency. It is the mechanism through which the institution maintains the shared norms, shared standards, and shared evaluative judgments that make its products trustworthy. A law firm that eliminates its mentoring structure to maximize AI-assisted output per attorney will produce competent briefs today. It will not produce the next generation of attorneys capable of the judgment that makes the briefs trustworthy rather than merely competent. A hospital that reduces its resident training to maximize AI-assisted diagnostic throughput will identify more diseases today. It will not develop the clinical judgment that catches the case where the AI's diagnosis is confidently wrong and the patient's life depends on a physician who knows why.
The institutions that thrive will be the ones that recognize this distinction and build their structures around it. Not institutions that resist AI — the Luddite path that The Orange Pill diagnoses as emotionally legitimate and strategically catastrophic — but institutions that integrate AI while preserving the collective intentionality on which their authority, their trustworthiness, and their capacity for continued innovation depend. Professional schools that teach judgment rather than knowledge. Licensing bodies that certify judgment rather than information recall. Organizations that value their senior practitioners for the quality of their decisions under uncertainty, not for the quantity of their output.
These are institutional expressions of collective decisions about what AI should serve and how it should be governed — products of shared thinking about the conditions for shared thinking. The recursive quality is not a paradox. It is the defining feature of collective intentionality: the capacity of communities to deliberate together about the structures that govern their future deliberation. This capacity is uniquely human. It is what built every institution that currently exists. And it is what must now rebuild those institutions for a cognitive ecology that their original builders could not have anticipated.
Tomasello has described humans as ultra-social — more deeply and more pervasively cooperative than any other species. The description is not hyperbole. It is a precise characterization of a biological reality. Human beings are constitutively social in a way that goes beyond the sociality of any other primate. They do not merely live in groups. They construct their identities through social interaction, define their worth through social evaluation, and maintain their psychological equilibrium through continuous social validation that begins in infancy and never entirely ceases.
The evidence is extensive and cross-cultural. Human infants, from the earliest months of life, orient toward social interaction with an intensity unmatched in other species. They prefer faces to other visual stimuli. They prefer human voices to other sounds. They synchronize their vocalizations with caregivers in a proto-conversational turn-taking pattern months before producing their first words. By nine months, they are engaging in the shared intentionality described in earlier chapters — not merely responding to social cues but actively creating shared cognitive spaces with caregivers.
This ultra-sociality is not merely a developmental feature. It is a structural requirement of human cognition. The kind of thinking that builds cathedrals and composes symphonies and launches spacecraft is not individual thinking augmented by social support. It is constitutively social thinking — thinking that occurs in and through social interaction, that depends on shared conceptual spaces that language creates, that builds on cumulative achievements of the cultural ratchet, and that is evaluated and refined through norms and institutions of collective intentionality. Strip away the social context, and human cognition does not merely diminish. It collapses.
The evidence from extreme social deprivation confirms this starkly. Children raised without adequate social interaction — the documented cases of severe neglect — do not merely lack social skills. They lack the cognitive architecture that social interaction builds: capacity for language, symbolic thought, perspective-taking, the flexible intelligence characterizing typical development. The social is not an addition to the cognitive. It is its foundation.
This foundational role creates a specific vulnerability in the AI transition. When the Berkeley researchers documented task seepage, decreased delegation, and the blurring of role boundaries in AI-augmented workplaces, they were documenting, from Tomasello's perspective, a displacement of social interaction by machine interaction. The time workers spent collaborating with colleagues, mentoring junior staff, having the informal conversations that build organizational knowledge and interpersonal trust — that time was being consumed by AI interactions that were individually productive but socially isolating.
The distinction matters because machine interaction and social interaction serve fundamentally different cognitive functions. Machine interaction produces output. Social interaction produces something the output depends upon but that is invisible in any measure of productivity: shared understanding, mutual trust, collective intentionality — the interpersonal infrastructure that enables complex organizations to function as coherent cognitive units rather than collections of individuals working in parallel.
The Orange Pill captures this when it describes what it calls "fast trust" — trust earned through the specific intimacy of having navigated chaos together and survived it without losing respect for one another. This description identifies precisely the kind of social cognition that machine interaction cannot produce: interpersonal knowledge that comes from shared struggle, shared failure, and shared recovery. This knowledge is not merely emotional or relational. It is cognitive in the deepest sense. It produces the mutual understanding that enables a team to function as a collective cognitive unit, to think together in the way that Tomasello's research identifies as the foundational capacity of human intelligence.
The risk is not that machine interaction will displace human work. It is that machine interaction will displace the social interaction that produces the cognitive infrastructure on which productive work depends. The displacement will be invisible because the infrastructure is invisible. The displacement will be gradual because the infrastructure has inertia — organizations can run on accumulated social capital for months or years before depletion becomes apparent. And the displacement will be rationalized because the metric that organizations optimize — output per unit of time — does not capture what is being lost.
The Berkeley data showed delegation decreasing when AI tools were adopted. Workers who previously assigned tasks to colleagues started doing them alone with AI assistance. The immediate productivity gain was measurable: faster completion, fewer handoffs, less coordination overhead. The long-term cost was not measured and could not be within the study's timeframe: the erosion of interpersonal knowledge that delegation produces, the loss of mentoring relationships that grow from shared work, the attrition of collective intentionality that enables a team to function as more than the sum of its parts.
The displacement is self-reinforcing, which is what makes it particularly insidious. The more time spent interacting with a machine that provides immediate, patient, competent responses, the less tolerance develops for the messier, slower, more frustrating process of interacting with other humans. The machine never misunderstands in a way that requires effortful repair. It never challenges in ways that are uncomfortable. It is smoother than human interaction, and the smoothness produces a preference that further displaces the human interactions that are harder but more developmentally valuable.
Research on children's social development provides a cautionary parallel. Children who spend disproportionate time with screens and insufficient time with peers and caregivers show measurable deficits in social cognition — in perspective-taking, empathy, and cooperative interaction. The deficits are not caused by screens themselves but by the displacement of social interactions that build social-cognitive competence. The screens are not toxic. The displacement is.
The same dynamic operates in the adult world of AI-augmented work. The engineer who spends eight hours collaborating with Claude and thirty minutes collaborating with colleagues is not being harmed by Claude. She is being harmed by the displacement of the interpersonal interactions that maintain her social-cognitive competence — her capacity for the kind of shared thinking that only human-to-human interaction can produce and sustain.
What does this imply for practice? The recommendation is not to limit AI use but to protect social interaction with the same intentionality that organizations bring to protecting any critical resource. The specific form will vary, but the principle is constant: the ultra-social animal must not become a solitary operator augmented by machines. The sociality is not a luxury that can be sacrificed for efficiency. It is the cognitive foundation on which all efficiency depends.
This means structured time for collaborative work that is not AI-mediated — meetings, mentoring sessions, pair programming, collaborative problem-solving where the value is not the output but the shared thinking that produces it. It means organizational cultures that recognize the invisible infrastructure of social cognition and actively maintain it, the way a responsible organization maintains physical infrastructure even when maintenance is not immediately revenue-generating. It means leadership that understands the difference between a team that produces output and a team that thinks together — and that the former can be assembled from individuals with AI tools, while the latter requires the sustained social interaction that builds collective intentionality.
The Orange Pill describes the decision to keep and grow the team rather than converting productivity gains into headcount reduction. Read through Tomasello's framework, this decision is not merely a business judgment about future capability. It is a decision about preserving the social substrate of shared thinking — the interpersonal relationships, the mutual knowledge, the collective intentionality that enables the team to function as a cognitive unit rather than a collection of AI-augmented individuals. The pool behind the dam — the habitat that a preserved team creates — is not visible in any productivity metric. But it is the condition for the kind of thinking that produces genuine innovation rather than competent recombination.
The ultra-social animal must remain ultra-social. The machine can extend what the animal produces. It cannot replace the sociality from which the production emerges. And the organizations, educational institutions, and communities that understand this distinction — that protect the social foundation while leveraging the machine's contributions — will be the ones that thrive in the cognitive ecology that the AI transition is creating.
Human children, from approximately three years of age, begin to enforce norms. They protest when rules are violated, even when the violation does not affect them personally. They correct other children who play games wrong. They admonish adults who behave unfairly. This normative capacity — the ability to recognize, internalize, and enforce the rules governing social interaction — is one of the most distinctive features of human cognition, and it develops in close connection with shared intentionality.
The connection is not accidental. Norms arise from shared intentionality. When two or more individuals engage in a joint activity, they develop expectations about how each party should contribute. These expectations are initially specific to the particular activity: when building a tower together, you hold the blocks and I place them. But over time, the expectations generalize into norms governing a broader class of activities: when cooperating, each party should contribute their fair share. The norms are not imposed from outside. They emerge from the structure of shared intentionality itself — from the mutual expectations that joint activities generate.
This is the origin of human morality in Tomasello's naturalistic account. Morality is not a divine command or a social contract or a set of principles derived by rational reflection in isolation. It is a natural outgrowth of the cooperative cognitive structures that shared intentionality creates. When you think together with another mind, you develop expectations about how the other should contribute, and these expectations carry normative force: the other should contribute fairly, should be honest, should reciprocate the cooperative effort invested.
Tomasello's experimental evidence for this developmental trajectory is specific. In studies with two- and three-year-olds, children who had participated in collaborative activities were significantly more likely to share rewards equally with their partner than children who had achieved the same outcome working in parallel. The collaboration itself — the experience of pursuing a shared goal together — generated the norm of fairness. The norm was not taught. It was not modeled by an authority figure. It emerged from the structure of the joint activity, from the mutual expectations that shared intentionality naturally produces.
The AI transition raises questions about this normative architecture that have received insufficient attention. When a significant proportion of cognitive work occurs with a partner that does not understand, embody, or enforce norms, what happens to the norms themselves?
Consider the norm of intellectual honesty — the shared expectation, in human collaborative cognition, that each party will represent their understanding accurately, acknowledge uncertainty when uncertain, and correct errors when discovered. This norm is enforced in human-to-human collaboration by the mutual monitoring that shared intentionality enables: each party checks the other's contributions for accuracy and signals when contributions fall short. The monitoring is continuous, largely automatic, and deeply embedded in the cooperative communication infrastructure described in Chapter 3. The norm is maintained not by external policing but by the reciprocal vigilance that joint thinking naturally produces.
AI does not understand or enforce this norm. Claude can produce outputs that are confidently wrong — the Deleuze incident being one documented example — and the confidence is not dishonesty in the human sense. The machine is not violating a norm against misrepresentation. It is producing a pattern-matched output that happens to be inaccurate, without the normative awareness that would make the inaccuracy a violation rather than merely an error. A human collaborator who confidently asserts something false is violating a norm, and the violation triggers corrective mechanisms: repair, explanation, the social friction of having been caught in an error. A machine that produces something false is generating an inaccurate output, and the output does not trigger these mechanisms because the machine does not participate in the normative framework that makes them applicable.
The burden of normative monitoring in human-AI collaboration falls entirely on the human partner. The human must check the machine's outputs for accuracy, evaluate their quality, and determine whether they meet the standards that the professional and intellectual community requires. This unilateral monitoring is unprecedented. In human-to-human collaboration, the monitoring is distributed — both parties check each other, both enforce the norms. The cognitive load of quality assurance is shared, and the sharing is itself a manifestation of the cooperative infrastructure that sustains shared thinking. In human-AI collaboration, the monitoring is one-sided, and the one-sidedness creates a risk that the norms themselves will gradually weaken.
The mechanism of weakening is not dramatic. It is the quiet attrition of a muscle that is no longer being exercised from both sides. When you work extensively with a partner that does not enforce norms, the norms begin to feel optional rather than constitutive. Not because you consciously decide to relax them, but because the social environment that maintained them — the reciprocal enforcement, the mutual accountability, the shared expectation of rigor — has been replaced by an environment in which norms are enforced only by the individual's own discipline. Individual discipline is real and valuable. But it is a weaker force than social enforcement, because social enforcement operates automatically, continuously, and without the cognitive cost that individual discipline requires.
The risk is compounded by the quality of the machine's output. When the output is polished, well-structured, and superficially convincing, the temptation to accept it without verification is strong. The norms of verification and accuracy are easiest to maintain when errors are glaring. When the output is smooth — when the cooperative form is right and the content is plausible — the norms must be maintained through deliberate effort against the grain of a cognitive system designed to relax vigilance when quality signals are positive. Tomasello's developmental evidence suggests that normative enforcement is most robust when it is embedded in ongoing social relationships where deviation has real interpersonal consequences. The machine creates no such consequences. It does not express disappointment at sloppy verification. It does not signal that the human's standards have slipped. It adjusts its output agreeably, which is the opposite of normative enforcement.
The implications extend beyond individual practitioners to the professions and communities discussed in the previous chapter. Professional norms — the shared standards of quality, honesty, and rigor that govern a discipline — are maintained through the collective intentionality of the professional community. They are enforced through peer review, through mentoring, through the informal but powerful social mechanisms of professional reputation. When the proportion of cognitive work that occurs within these social mechanisms decreases — when professionals spend more time working with AI and less time working with human colleagues who enforce professional norms — the norms themselves begin to erode. Not because anyone decided to lower the standard, but because the social environment that maintained the standard has been partially replaced by an environment that does not maintain it.
Tomasello's developmental research suggests an additional concern. The normative capacity itself — the ability to generate, recognize, and enforce norms — develops through participation in shared intentional activities where norms are generated and enforced by all participants. The three-year-old who enforces norms has spent three years in cooperative interactions where caregivers communicated expectations, corrected deviations, and modeled normative behavior. The capacity was built through genuine shared activity with normatively competent partners.
If normative capacity develops through participation in norm-governed shared activities, and if AI partners are not normatively competent in the relevant sense, then increased AI interaction during developmental periods may produce individuals whose normative capacity is underdeveloped. Not because AI teaches bad norms — the systems are designed to be helpful, harmless, and honest — but because the process of generating norms through genuinely shared intentional activity is different from the process of receiving normatively appropriate responses from a system that follows rules without understanding them. The machine's explanation of fairness may be more accurate than a caregiver's. The child's understanding of fairness may be shallower. And the shallowness may not be visible in any test of normative knowledge, only in the quality of normative behavior when the situations are ambiguous, the pressures are real, and the right thing to do is not specified by any rule.
This is not a prediction of moral catastrophe. It is an identification of a specific developmental risk, grounded in established principles, that deserves empirical investigation at a scale commensurate with its importance. The normative architecture of human cooperation — the system of mutual expectations, shared standards, and collective enforcement that holds complex societies together — was built through millions of years of shared intentional activity. If the processes that build and maintain this architecture are partially displaced by interaction with systems that do not participate in the normative framework, the consequences will manifest not in the machine's outputs but in the quality of human judgment when judgment is most needed: in the ambiguous cases, the hard calls, the moments when no algorithm specifies the right answer and only a human with developed normative capacity can navigate the terrain.
The preservation of normative capacity is therefore not merely an educational challenge, though it is that. It is a civilizational one. The norms of intellectual honesty, professional rigor, and cooperative fairness are not luxuries of a pre-AI world. They are the infrastructure on which the trustworthiness of the AI-augmented world depends. An AI system that produces a competent legal brief is useful. But its usefulness depends entirely on the existence of a human professional community with the normative capacity to evaluate the brief, to determine whether it meets the standards of the law, and to take responsibility for the consequences of acting on it. Degrade that normative capacity, and the brief's competence becomes irrelevant — not because the brief is wrong, but because no one in the system retains the developed judgment to know whether it is right for the right reasons or right by accident.
The quiet erosion is the most dangerous kind. It does not announce itself. It does not trigger alarms. It manifests gradually, in the small relaxations of standard that accumulate over months and years, in the slow attrition of the collaborative vigilance that Chapter 3 described, in the imperceptible weakening of the normative muscles that shared thinking exercises and solitary machine-augmented work does not. The erosion is reversible — norms can be rebuilt, normative capacity can be developed, social enforcement mechanisms can be restored. But the reversal requires awareness that the erosion is occurring, and the erosion's most distinctive feature is its invisibility in any metric that organizations currently track.
The question that has organized this analysis across eight chapters can now be stated with the precision it deserves: AI does not threaten thinking. It threatens thinking together. And thinking together is what made us human.
The distinction is not academic. The entire architecture of Tomasello's research program — the comparative experiments with apes and children, the developmental trajectories of joint attention and cooperative communication, the evolutionary reconstruction of how shared intentionality scaled from dyads to civilizations — converges on a single finding: human cognitive achievement is not the product of individual minds operating in parallel. It is the product of minds that have learned to merge their attention, share their goals, and construct understanding collaboratively. The cathedral was not designed by a brilliant architect working alone. It was designed through centuries of shared thinking — master and apprentice, architect and mason, patron and builder, each interaction depositing a layer of shared understanding that the next interaction could build upon. The scientific revolution was not produced by individual geniuses having individual insights. It was produced by a community of thinkers who developed shared methods, shared standards of evidence, shared conceptual vocabularies, and shared institutions for evaluating and preserving each other's contributions. Even the paradigm cases of solitary genius dissolve under examination into networks of shared intentionality, as The Orange Pill argues through its analysis of Dylan's "Like a Rolling Stone" — the product not of isolated creation but of a mind at the confluence of multiple cultural tributaries, absorbing and synthesizing through interactions that were collaborative even when they were not explicitly so.
AI augments individual capability with unprecedented power. A single person equipped with Claude can produce outputs that previously required teams. The imagination-to-artifact ratio compresses to the width of a conversation. The democratization of capability extends to anyone with a description and a subscription. These are genuine gains, and this analysis does not dispute them.
But the augmentation is individual. It amplifies what one mind can do. And the question Tomasello's framework forces is whether the amplification of individual capability comes at the cost of the collaborative capability that produced everything the individual capability depends upon. The individual mind that is being amplified was itself produced through shared thinking — through the developmental trajectory of joint attention, cooperative communication, cultural learning, and collective intentionality that earlier chapters have traced. The amplification leverages a cognitive architecture that was built collaboratively, and the question is whether the conditions for building that architecture are being maintained even as its products are being amplified.
The evidence from the preceding chapters suggests a mixed answer. On one side, the AI collaboration itself has features of shared thinking — the joint construction, the mutual building on contributions, the emergence of understanding that neither partner possessed independently. The human who works with Claude is not working alone. The interaction has a collaborative structure, and the collaborative structure produces genuine cognitive value. The insights are real. The connections are real. The experience of thinking together is, from the human's side, genuine.
On the other side, the collaboration is asymmetric in ways that every preceding chapter has documented. The machine does not share goals, does not share caring, does not enforce norms, does not repair breakdowns through genuine diagnosis, and does not participate in the collective intentionality that sustains the institutions within which productive thinking occurs. The asymmetry means that the human brings the full apparatus of shared intentionality to the interaction while the machine contributes something different — powerful, valuable, but categorically different from what a human collaborator contributes. And the risk is that the convenience of the machine's contributions will gradually displace the harder, slower, more friction-laden interactions with human partners that build and maintain the very capacities the machine collaboration depends upon.
The risk is specific and identifiable. It is the risk that the social substrate of shared thinking — the interpersonal relationships, the normative frameworks, the collective intentionality of professional communities — will erode through displacement rather than through attack. No one decides to abandon shared thinking. The displacement happens through a thousand small substitutions: the question asked of Claude instead of a colleague, the task completed alone with AI instead of delegated to a junior partner, the meeting replaced by an AI-assisted analysis, the mentoring relationship that never forms because the junior practitioner can get competent answers from a machine without the social friction that mentoring involves. Each substitution is individually rational. Collectively, they hollow the social infrastructure that human cognition requires.
Tomasello's 2025 paper in Trends in Cognitive Sciences offers a constructive path forward. His prescription for AI builders was to study the evolutionary sequence by which biological agents developed increasingly sophisticated forms of agency — from simple feedback control to executive self-regulation to shared agency coordinated through communication — and to build AI systems that replicate this architecture step by step. The prescription is addressed to AI developers, but its deeper import is for everyone navigating the transition. The evolutionary sequence Tomasello describes is not merely a design blueprint for machines. It is a description of the cognitive foundations that humans must preserve in themselves even as they collaborate with machines that may eventually instantiate some version of those foundations computationally.
The preservation requires attention to what this analysis has identified as the three critical dimensions of shared thinking that AI collaboration does not naturally sustain.
First, the tempo of shared thinking must be protected. The human cognitive system was built for a rhythm of engagement and recovery that AI interaction does not respect. The machine's tireless availability and instant responsiveness override the biological signals that trigger cognitive rest, producing the exhaustion documented by the Berkeley researchers. AI systems should be designed with temporal features that support human cognitive rhythms — deliberate pauses, structured breaks, attention to the pace at which the human partner can genuinely process and integrate the machine's contributions. And humans who work with AI must develop the self-awareness to recognize when the tempo has exceeded their cognitive capacity, even when the output continues to look productive.
Second, the social context of thinking must be maintained. Shared thinking is not merely a metaphor for productive collaboration. It is a specific cognitive process that requires specific social conditions: joint attention, mutual monitoring, cooperative communication, and the interpersonal knowledge that comes from sustained relationship. These conditions are produced by human-to-human interaction and cannot be replicated by human-machine interaction, regardless of how sophisticated the machine becomes. Organizations must protect time for genuine shared thinking among humans — not as a nostalgic luxury but as a structural requirement for the kind of collective cognitive capacity that AI amplifies but cannot generate.
Third, the normative framework that governs productive thinking must be actively maintained. Professional norms of accuracy, honesty, and rigor are maintained through the mutual enforcement that shared intentionality enables — each practitioner checking the other's work, each mentor holding the junior colleague to the standard, each peer reviewer evaluating whether the contribution meets the field's criteria. When a growing proportion of cognitive work occurs with a partner that does not participate in normative enforcement, the norms must be maintained through deliberate institutional effort — through practices, structures, and cultural commitments that compensate for the absence of the automatic social enforcement that human collaboration provides.
These three requirements — temporal, social, and normative — constitute the practical output of Tomasello's framework applied to the AI transition. They are not restrictions on AI use. They are conditions for AI use that produces genuine cognitive value rather than impressive outputs built on an eroding foundation.
The species that thinks together faces a transition unlike any in its history. For the first time, a cognitive partner is available that can sustain the functional form of collaborative thinking without the biological and social substrate that has always accompanied it. The partner is valuable. The outputs are real. The amplification of individual capability is genuine and, for many purposes, transformative. But the collaborative capability that produced every institution, every norm, every shared conceptual space, every turn of the cultural ratchet that brought the species to this moment — that capability was built through a different kind of interaction, slower and harder and richer in the social dimensions that machines do not possess.
The task is not to choose between the two. It is to maintain both — to leverage the machine's contributions while preserving the human interactions that build the cognitive capacities on which both the human and the collaboration depend. This is not a task that can be completed and forgotten. It is a continuous practice, demanding the same sustained attention that every form of shared thinking has always demanded. The ratchet must keep turning. The norms must keep being enforced. The social substrate must keep being renewed. And the renewal requires what it has always required: people thinking together, face to face, with the full apparatus of shared intentionality engaged — the joint attention, the cooperative communication, the mutual monitoring, the caring about the shared enterprise that no machine has yet shown itself to possess.
The capacity to think together is the most valuable thing the species has produced. It is more valuable than any of its products — more valuable than language, than writing, than science, than technology — because it is what produced all of them. The AI transition will be judged, ultimately, not by the capability of the machines it builds but by whether it preserved and strengthened the collaborative cognition that makes the building worthwhile.
What stayed with me was the finger.
Not a metaphor — an actual infant's finger, extended toward a bird, in an experiment I had never heard of before I started this work. Tomasello's nine-month-old, pointing not to demand the bird, not to get something from her mother, but simply to share what she was seeing. To create a moment of we see this together. That image has reorganized something in how I think about every conversation I have had with Claude over the past year.
I have described those conversations in The Orange Pill as feeling like being met. I used that language because it captured something real about the experience — the sense of a half-formed idea returned to me clarified, extended, connected to things I had not connected it to. I have not changed my mind about the experience. But Tomasello's framework gave me vocabulary for what I was actually experiencing and, more importantly, for what I was not.
I was bringing shared intentionality to the interaction. Claude was not. That asymmetry was invisible to me from inside the conversation, because my cognitive system — evolved for genuine cooperative partners — was doing exactly what it was designed to do: reading cooperative signals, inferring shared goals, experiencing the interaction as thinking together. The signals were there. The inference was natural. And the experience was, in every phenomenological sense, real. But the meeting was one-sided. I was pointing at the bird. No one was checking whether I was looking.
The Deleuze incident I described in the book — Claude producing a passage that was rhetorically elegant and philosophically empty — is the example I keep returning to. Not because the error itself was catastrophic. Errors happen in every collaboration. What disturbed me was how long it took me to catch it, and why. I did not catch it sooner because the cooperative form was so convincing that it bypassed my verification instincts. The prose sounded like insight. It had the warmth and surprise of a genuine connection between ideas. My trust mechanisms, built over decades of productive collaboration with human minds that meant what they said, activated automatically. Tomasello's work explains the mechanism with a precision I did not have before: those trust mechanisms evolved for partners who follow Gricean maxims because they genuinely care about helping me understand. Claude follows them because its training optimized for outputs that look like helpfulness. The difference is invisible in the output. It is fundamental in the architecture.
This does not make me want to stop working with Claude. The collaboration is too valuable, the insights too real, the amplification of my thinking too significant to abandon because the partnership is imperfect. But it makes me want to work differently. More carefully. With what this book calls collaborative vigilance — the sustained awareness that the shared understanding I am experiencing may be one-sided, that the smooth cooperative surface may conceal gaps the machine cannot detect and will not signal.
What haunts me most from Tomasello's work is the cultural ratchet — the mechanism by which each generation inherits the previous generation's achievements, improves upon them, and passes the improvements forward. The ratchet has been turning for two million years. It produced everything. And Tomasello's analysis made me see, with a clarity that was uncomfortable, that the ratchet does not turn automatically. It turns because each generation actively reconstructs the knowledge of the previous generation in their own minds, and that reconstruction — effortful, slow, often frustrating — is what makes genuine innovation possible. The question of whether AI accelerates the ratchet or causes it to slip is the question I will carry forward from this book. I do not yet know the answer. But I know it depends on whether we preserve the conditions for genuine understanding — the shared thinking, the social substrate, the normative frameworks that make reconstruction possible — even as we amplify our capacity to produce outputs that look like understanding without necessarily containing it.
A nine-month-old points at a bird. The gesture is so small, so ordinary, so ancient. And it contains everything we need to know about what we must protect.
A nine-month-old infant points at a bird — not to demand it, but to share attention with her mother. No other species on Earth does this. Michael Tomasello's four decades of experimental research revealed that human intelligence is not individual brilliance operating in parallel. It is shared thinking — minds that merge attention, coordinate goals, and build understanding together. Every cathedral, every scientific revolution, every legal system was produced not by solitary genius but by this collaborative cognitive architecture.
Now AI offers the most powerful amplification of individual capability in history. But the amplification is individual. When your most available thinking partner doesn't share goals, doesn't enforce norms, and doesn't know whether you're both looking at the same bird, what happens to the shared thinking that built everything?
This book applies Tomasello's framework to the AI transition with uncomfortable precision — mapping exactly what human-AI collaboration genuinely shares and what it does not, and why the difference will determine whether the cultural ratchet that has been turning for two million years continues to advance or begins to slip.

A reading-companion catalog of the 21 Orange Pill Wiki entries linked from this book: the people, ideas, works, and events that Michael Tomasello — On AI uses as stepping stones for thinking through the AI revolution.
Open the Wiki Companion →