The data workers are the humans whose labor produces, curates, evaluates, and labels the material that trains and aligns large language models. They include the content moderators who review the internet's most toxic content so the model can learn what toxicity looks like; the RLHF contractors whose judgments about quality and harm shape the model's alignment; the click workers in Kenya, the Philippines, Venezuela, and elsewhere whose labor is compensated at rates that would be illegal in the countries where the models are deployed. Their invisibility is the most Harawayan absence in The Orange Pill — an erasure precisely analogous to the gendered invisibility of domestic labor.
There is a parallel reading that begins not with labor's invisibility but with extraction's necessity. The data worker economy represents neither accident nor aberration but the predictable optimization of a system that requires massive human judgment at scales no wealthy population would tolerate providing. The $2/hour Kenyan content moderator exists because the alternative—having San Francisco engineers review child abuse content—would cost $200/hour and create liability nightmares for the companies involved. The geography of this work follows mineral extraction patterns precisely: the raw material (human judgment) exists everywhere, but extraction happens where regulation is weakest and desperation highest.
The structural concealment Segal identifies is real but incomplete. These workers are invisible not merely because their labor is hidden but because the entire AI ecosystem depends on maintaining specific gradients of economic desperation. The Venezuelan click worker labeling images for $0.30/hour exists in a carefully maintained state—desperate enough to accept the work, connected enough to perform it, powerless enough to have no recourse when payment is withheld. Tech companies don't create these conditions, but they require them. The philanthropic initiatives, the digital literacy programs, the connectivity projects—these aren't separate from the exploitation but part of its infrastructure. The same companies funding internet access in Kenya are recruiting content moderators there. The invisibility isn't just Harawayan; it's thermodynamic. The system requires a heat sink for psychological damage, a place where trauma can be absorbed cheaply. The workers aren't companions to the AI ecosystem—they're its metabolic substrate, the layer that must be consumed for the reaction to proceed.
The machine does not learn from data. It learns from the labor of the people who organized, labeled, evaluated, and curated the data. The substitution of "training data" for "the work of thousands of underpaid humans" is not accidental. It is structural. It serves the interests of those who profit from the output by concealing the conditions of its production. Just as the visible labor of the builder depends on the invisible domestic labor of a partner, the machine's impressive capabilities depend on the invisible cognitive labor of data workers whose contributions are structurally concealed.
Investigative journalism has documented the specific conditions. Time magazine reported in 2023 that Kenyan workers earning less than $2 an hour were hired to train ChatGPT on what content qualified as toxic — reading and categorizing descriptions of child sexual abuse, bestiality, murder, torture. The workers described PTSD symptoms, nightmares, and lasting psychological harm. The contracts were terminated early. The workers were left without the mental health support the work had made necessary.
The parallel with content moderation for social media platforms is exact. The same pattern repeats: the harm is outsourced to populations whose precarity makes them unable to refuse, the labor is invisible to the users of the system, and the visible output (a usable product) depends entirely on the invisible input (traumatized workers laboring under poorly regulated contracts).
The Harawayan analysis insists that the cyborg condition cannot be honestly described without including this labor. The builder who celebrates the machine's helpfulness without attending to the workers who made the helpfulness possible is practicing the same erasure that conceals the domestic infrastructure underwriting his productive intensity. The erasure is gendered, racialized, and geographically specific, and it reproduces within the cyborg condition the same power structures that the Manifesto was written to contest.
The analysis of data workers as constitutive rather than peripheral to AI systems has been developed by Mary L. Gray and Siddharth Suri (Ghost Work, 2019), by Astra Taylor (The Automation Charade, 2018), in Billy Perrigo's 2023 Time reporting on OpenAI's Kenyan contractors, at the Distributed AI Research Institute (DAIR) founded by Timnit Gebru, and by scholars in the plantation logic tradition, including Achille Mbembe.
Labor, not data. The machine learns from human work, not from a neutral resource called "data."
Invisibility is structural. The erasure of the labor serves the interests of those who profit from the output.
Geographic exploitation. The work is disproportionately performed in lower-income countries under conditions that would be illegal in the deploying jurisdictions.
Psychological cost is externalized. The harms of exposure to toxic content are borne by the workers; the benefits accrue to the users of the safer outputs.
Visibility is necessary but not sufficient. Naming the labor is the beginning of accountability; structural change in compensation, consent, and care is the rest.
The truth about data workers depends entirely on which question we're asking. If we're asking about moral responsibility, Segal's framing dominates (90%)—these are indeed invisible laborers whose exploitation mirrors historical patterns of gendered and racialized erasure. The Harawayan analysis correctly identifies how the fiction of "machine learning" conceals human suffering. But if we're asking about systemic dynamics, the contrarian view weighs heavily (75%)—this isn't aberration but optimization, not oversight but architecture. The system genuinely requires these economic gradients to function at current scales.
When we examine specific mechanisms, the balance shifts by facet. On the question of inevitability, both views converge (50/50): the exploitation is both structurally determined (as the contrarian argues) and politically contingent (as Segal's citations of regulatory differences suggest). On corporate intentionality, Segal's view prevails (70%)—companies actively obscure these workers through NDAs, subcontracting, and technical language. But on the question of alternatives, the contrarian position strengthens (65%)—no clear path to AI development exists without massive inputs of human judgment, and wealthy populations consistently refuse this work.
The synthetic frame that emerges treats data workers as occupying a transition state—neither permanently invisible nor destined for recognition, but caught in a historical moment where AI requires human judgment at scales that existing economic structures can only provide through exploitation. The real insight isn't that these workers are invisible (Segal) or that they're necessarily exploited (contrarian), but that they represent a contradiction the system cannot resolve: AI needs human judgment to transcend human judgment. The workers exist at this contradiction's sharp edge, bearing its costs while the system searches for ways to eliminate its need for them entirely.