The data workers are the humans whose labor produces, curates, evaluates, and labels the material that trains and aligns large language models. They include the content moderators who review the most toxic outputs of the internet so the model can learn what toxicity looks like; the RLHF contractors whose judgments about quality and harm shape the model's alignment; the click workers in Kenya, the Philippines, Venezuela, and elsewhere whose labor is compensated at rates that would be illegal in the countries where the models are deployed. Their invisibility is the most Harawayan absence in You On AI — an erasure precisely analogous to the gendered invisibility of domestic labor.
The machine does not learn from data. It learns from the labor of the people who organized, labeled, evaluated, and curated the data. The substitution of "training data" for "the work of thousands of underpaid humans"