CONCEPT

Data Workers

The invisible companion species of the AI ecosystem — the content moderators, RLHF annotators, and click workers whose organized, compensated, often exploitative labor is constitutive of the machine's capabilities and is structurally concealed by the fiction that the machine "learns" from "data."

The data workers are the humans whose labor produces, curates, evaluates, and labels the material that trains and aligns large language models. They include the content moderators who review the most toxic outputs of the internet so the model can learn what toxicity looks like; the RLHF contractors whose judgments about quality and harm shape the model's alignment; and the click workers in Kenya, the Philippines, Venezuela, and elsewhere whose labor is compensated at rates that would be illegal in the countries where the models are deployed. Their invisibility is the most Harawayan absence in You On AI: an erasure precisely analogous to the gendered invisibility of domestic labor.

In The You On AI Field Guide

The machine does not learn from data. It learns from the labor of the people who organized, labeled, evaluated, and curated the data. The substitution of "training data" for "the work of thousands of underpaid humans" is not accidental. It is structural. It serves the interests of those who profit from the output by concealing the conditions of its production. Just as the visible labor of the builder depends on the invisible domestic labor of a partner, the machine's impressive capabilities depend on the invisible cognitive labor of data workers.

Investigative journalism has documented the specific conditions. Time magazine reported in 2023 that Kenyan workers earning less than $2 an hour were hired to train ChatGPT on what content qualified as toxic — reading and categorizing descriptions of child sexual abuse, bestiality, murder, torture. The workers described PTSD symptoms, nightmares, and lasting psychological harm. The contracts were terminated early. The workers were left without the mental health support the work had made necessary.

The parallel with content moderation for social media platforms is exact. The same pattern repeats: the harm is outsourced to populations whose precarity makes them unable to refuse, the labor is invisible to the users of the system, and the visible output (a usable product) depends entirely on the invisible input (traumatized workers employed under poorly regulated contracts).

The Harawayan analysis insists that the cyborg condition cannot be honestly described without including this labor. The builder who celebrates the machine's helpfulness without attending to the workers who made the helpfulness possible is practicing the same erasure that conceals the domestic infrastructure underwriting his productive intensity. The erasure is gendered, racialized, and geographically specific, and it reproduces within the cyborg condition the same power structures that the Manifesto was written to contest.

Origin

The analysis of data workers as constitutive rather than peripheral to AI systems has been developed by Mary L. Gray and Siddharth Suri (Ghost Work, 2019), Astra Taylor, Billy Perrigo (in his 2023 Time reporting on OpenAI's Kenyan contractors), the Distributed AI Research Institute (DAIR) founded by Timnit Gebru, and scholars in the plantation logic tradition, including Achille Mbembe.

Key Ideas

Labor, not data. The machine learns from human work, not from a neutral resource called "data."

Invisibility is structural. The erasure of the labor serves the interests of those who profit from the output.

Geographic exploitation. The work is disproportionately performed in lower-income countries under conditions that would be illegal in the deploying jurisdictions.

Psychological cost is externalized. The harms of exposure to toxic content are borne by the workers; the benefits accrue to the users of the safer outputs.

Visibility is necessary but not sufficient. Naming the labor is the beginning of accountability; structural change in compensation, consent, and care is the rest.
