The discovery problem is the challenge of identifying valuable contributions within creative outputs too numerous to be processed by any reader, reviewer, or recommendation system designed for previous scales. The first cognitive surplus produced millions of blog posts, videos, and wiki edits; the discovery mechanisms built for that scale — search engines, social curation, collaborative filtering — were adequate because the volume, while large, remained within the capacity of indexing and ranking algorithms operating on textual and behavioral signals. The second cognitive surplus will produce billions of software artifacts whose character differs fundamentally from the first surplus's outputs: they are functional rather than expressive, their value is use-contextual rather than broadly comparable, and their quality cannot be assessed without testing that exceeds the capacity of any lightweight review system.
There is a parallel reading where 'discovery' is not a technical problem to be solved but a narrative device that obscures the actual mechanism of value distribution. The rhetoric of discovery assumes a world of latent value waiting to be surfaced — a billion nurses with a billion useful tools, if only the right algorithm could connect them. But the second surplus may not contain this latent value. Most software artifacts built by non-specialists will be trivial reimplementations of existing solutions, context-specific patches that work once and break everywhere else, or well-intentioned tools that encode misunderstandings of the problems they claim to solve. The 'discovery problem' in this reading is not that good tools are lost in noise; it is that platforms need a story about why abundance should be presumed generative rather than wasteful.
The real function of discovery infrastructure is not matching but filtering — not surfacing hidden gems but managing the reputational and liability risk of hosting a billion untested artifacts. Platforms will build discovery mechanisms, but these mechanisms will primarily serve platform needs: highlighting artifacts that drive engagement, that integrate with platform services, that generate data for training future models. The discourse of 'discovery' makes this filtration sound like a public service rather than what it is — a chokepoint where platforms determine which forms of second-surplus production are legible and rewarded. Community curation and hybrid quality assurance sound promising until you ask who provisions the labor, who sets the standards, and whose definition of 'domain expertise' gets institutionalized. Discovery infrastructure does not reveal the value the surplus contains; it constructs what will count as valuable.
The existing discovery mechanisms fail for second-surplus artifacts for three converging reasons. First, volume: when a billion people can build software, the signal-to-noise ratio deteriorates past the point where ranking algorithms can separate valuable from trivial artifacts. Second, context-dependence: a tool valuable to one nurse in one clinic may be useless to a nurse in a different clinic with a different workflow, so popularity metrics poorly approximate value for any specific user. Third, failure-mode opacity: unlike a blog post, whose failure is typically visible to the reader, a software tool can appear to work while containing subtle defects that only testing or domain expertise reveals.
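Failure-mode opacity is easiest to see in a deliberately contrived sketch. The function below is entirely hypothetical (the name, the numbers, and the unit convention are illustrative placeholders, not clinical guidance): it returns a plausible value for every input, so it appears to work, and only a reviewer who knows the intended unit convention would catch the defect.

```python
def per_dose_mg(weight, mg_per_kg_per_day=40, doses_per_day=4):
    """Per-dose amount for a weight-based daily regimen (illustrative only).

    Hidden assumption: `weight` is in kilograms. A caller who enters
    pounds gets a result roughly 2.2x too high, and nothing in the
    output signals the error -- the number looks plausible either way.
    """
    return weight * mg_per_kg_per_day / doses_per_day

# Intended use, 20 kg patient: 20 * 40 / 4 = 200.0 per dose.
# Same patient entered as 44 (pounds): 440.0 -- silently wrong.
```

No ranking algorithm reading download counts or star ratings can see this defect; it is visible only to testing against a specification or to domain expertise, which is the point of the third failure mode above.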
Adequate discovery mechanisms for the second surplus must therefore combine features that existing mechanisms do not combine. Categorization by problem rather than technology, because users think about problems they need solved, not about the stacks tools are built on. Recommendation informed by user context — profession, workflow, specific needs — rather than by popularity or behavioral similarity. Community curation by domain experts who can evaluate whether a tool serves its stated purpose, not by general algorithms whose signals are behavioral. And hybrid quality assurance that combines automated technical evaluation with human domain review, surfacing not just popular tools but tools that have passed both technical and contextual tests.
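As an illustration only, the combination described above can be sketched as a single scoring function. Every name, weight, and threshold here is a hypothetical placeholder, not a proposed design; the sketch shows only how the four features compose: a hard gate on automated technical checks, matching by problem rather than technology, a context-sensitive discount, and expert curation as a bounded signal.

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    problem_tags: set[str]      # problems the tool claims to solve
    passed_auto_checks: bool    # automated technical evaluation
    expert_endorsements: int    # curation signals from domain experts
    contexts: set[str]          # professions/workflows it was reviewed for

def discovery_score(artifact: Artifact,
                    user_problems: set[str],
                    user_context: str) -> float:
    """Rank by problem fit and user context, gated on technical review,
    rather than by popularity. Weights are arbitrary placeholders."""
    # Hybrid QA gate: artifacts failing automated checks are never surfaced.
    if not artifact.passed_auto_checks:
        return 0.0
    # Categorization by problem, not by technology stack.
    overlap = artifact.problem_tags & user_problems
    problem_match = len(overlap) / max(len(user_problems), 1)
    # Context-sensitive recommendation: mismatch discounts, not disqualifies.
    context_match = 1.0 if user_context in artifact.contexts else 0.3
    # Community curation with diminishing returns (capped at 5 endorsements).
    curation = min(artifact.expert_endorsements, 5) / 5
    return problem_match * context_match * (0.5 + 0.5 * curation)
```

Under these made-up weights, a tool that passes checks, matches the user's stated problem, and carries three expert endorsements in the user's own context scores 1.0 × 1.0 × 0.8 = 0.8, while an unreviewed tool scores zero regardless of its download count — the gate, not the popularity signal, does the work.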
The discovery problem connects directly to the governance question: discovery mechanisms are not neutral, and the entities operating them exercise decisive influence over which second-surplus artifacts find their users. If discovery is controlled by the same platforms that provide the AI tools, the entire surplus is subject to capture at the discovery layer. If discovery is controlled by community-driven infrastructure, the surplus has a better chance of being deployed toward diverse user needs rather than channeled toward platform commercial interests. The design of discovery infrastructure is thus among the most consequential governance decisions the second surplus presents.
This book identifies the problem as a structural feature of the second cognitive surplus, drawing on observations about the inadequacy of existing discovery mechanisms (app stores, search engines, social media) given the character and volume of AI-enabled creation. The underlying analysis of discovery under abundance draws on work by Yochai Benkler, Chris Anderson's long-tail framework, and research on recommendation systems.
The volume threshold. At sufficient scale, ranking algorithms cannot distinguish signal from noise; new mechanisms are required.
Context-dependence of value. Second-surplus artifacts are often valuable to specific users in specific contexts; popularity is a poor proxy.
Failure-mode opacity. Software can appear functional while failing in ways that require domain expertise to detect.
The platform capture risk. Discovery infrastructure concentrates power; the entities controlling it shape which artifacts find users.
Hybrid discovery requirement. Adequate mechanisms combine automated technical assessment, community curation, and context-sensitive recommendation.
The volume threshold is real (90% Edo, 10% contrarian). A billion people generating software artifacts does create a scale problem that existing mechanisms cannot handle; this is not rhetorical. But the contrarian point holds weight on the composition question: what percentage of second-surplus artifacts will have genuine value to someone? If most are trivial, discovery is less urgent than the framing suggests. The right synthesis names both: volume is a real constraint, and the value distribution within that volume is empirically uncertain.
On context-dependence, the views converge differently depending on what success means. If success is 'nurse finds tool that solves her specific problem,' the entry is right (80%) — popularity is a poor proxy, and context-sensitive recommendation is necessary. But if success is 'platforms sustain infrastructure for niche matching at scale,' the contrarian view weighs heavier (60%) — the economics may not support discovery mechanisms fine-grained enough to serve long-tail needs. The right frame here is not whether context-dependence is real, but whether discovery infrastructure can be provisioned to serve it without collapsing into coarse filters.
On governance, the entry and the contrarian view are naming the same risk from different angles (50/50 weighting). Edo is right that discovery infrastructure concentrates power and shapes which artifacts find users. The contrarian view is right that this makes 'discovery' a site of justification rather than a neutral service. The synthesis is that discovery is both a real technical problem and a political one — the mechanisms we build will determine what the second surplus becomes, and the discourse of 'solving discovery' can obscure whose interests those mechanisms serve.