User-Generated Content & AI Training Data
This collection examines the role of **user-generated content** (UGC) in training and powering AI systems.

## Key Themes

- **UGC in Search**: How Wikipedia and other UGC improve search engine results
- **Training Data Value**: Quantifying the contribution of user content to AI models
- **Platform Dependencies**: How AI systems rely on crowdsourced knowledge
- **Content Creator Rights**: Implications for people who create the data AI learns from

## Related Collections

See also: [Data Leverage & Collective Action](./data-leverage) for research on how content creators can exercise power over AI systems.
9 papers in this collection
- The Economics of AI Training Data: A Research Agenda
- The Rise of AI-Generated Content in Wikipedia
- Large language models reduce public knowledge sharing on online Q&A platforms
- The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
- Wikipedia's value in the age of generative AI
- Extracting Training Data from Large Language Models
- A Deeper Investigation of the Importance of Wikipedia Links to Search Engine Results
- Measuring the Importance of User-Generated Content to Search Engines
- The Substantial Interdependence of Wikipedia and Google: A Case Study on the Relationship Between Peer Production Communities and Information Technologies