Data Leverage References

Tag: collection:ugc-value (9 references)

The Economics of AI Training Data: A Research Agenda 2025 article

Hamidah Oderinwale, Anna Kazlauskas

Research agenda documenting AI training data deals from 2020 to 2025. Reveals persistent market fragmentation, five distinct pricing mechanisms (from per-unit licensing to commissioning), and that most deals exclude original creators from compensation: only 7 of 24 major deals compensate them.

The Rise of AI-Generated Content in Wikipedia 2024 article

Creston Brooks, Samuel Eggert, Denis Peskoff

Large language models reduce public knowledge sharing on online Q&A platforms 2024 article

R. Maria del Rio-Chanona, Nadzeya Laurentsyeva, Johannes Wachs

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI 2024 article

Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker

Large-scale audit of over 1,800 text AI datasets analyzing trends, permissions of use and global representation. Found frequent miscategorization of licences on dataset hosting sites, with licence omission rates of more than 70% and error rates of more than 50%. Released the Data Provenance Explorer tool for practitioners.

Wikipedia's value in the age of generative AI 2023 misc

Selena Deckelmann

If there was a generative artificial intelligence system that could, on its own, write all the information contained in Wikipedia, would it be the same as Wikipedia today?

Extracting Training Data from Large Language Models 2021 inproceedings

Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom B. Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, Nicolas Papernot

A Deeper Investigation of the Importance of Wikipedia Links to Search Engine Results 2021 article

Nicholas Vincent, Brent Hecht

Measuring the Importance of User-Generated Content to Search Engines 2019 inproceedings

Nicholas Vincent, Isaac Johnson, Patrick Sheehan, Brent Hecht

The Substantial Interdependence of Wikipedia and Google: A Case Study on the Relationship Between Peer Production Communities and Information Technologies 2017 inproceedings

Connor McMahon, Isaac Johnson, Brent Hecht