Tag: collection:ugc-value (9 references)
The Economics of AI Training Data: A Research Agenda
Research agenda documenting AI training data deals from 2020 to 2025. Reveals persistent market fragmentation and five distinct pricing mechanisms (from per-unit licensing to commissioning), and finds that most deals exclude original creators from compensation: only 7 of 24 major deals compensate them.
The Rise of AI-Generated Content in Wikipedia
Large language models reduce public knowledge sharing on online Q&A platforms
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI
Large-scale audit of over 1,800 text AI datasets analyzing trends, permissions of use and global representation. Found frequent miscategorization of licences on dataset hosting sites, with licence omission rates of more than 70% and error rates of more than 50%. Released the Data Provenance Explorer tool for practitioners.
Wikipedia's value in the age of generative AI
If there was a generative artificial intelligence system that could, on its own, write all the information contained in Wikipedia, would it be the same as Wikipedia today?