Data Leverage References

Tag: legal-policy (27 references)

The Economics of AI Training Data: A Research Agenda 2025 article

Hamidah Oderinwale, Anna Kazlauskas

Research agenda documenting AI training data deals from 2020 to 2025. Reveals persistent market fragmentation and five distinct pricing mechanisms (from per-unit licensing to commissioning), and finds that most deals exclude original creators from compensation: only 7 of 24 major deals compensate them.

Canada as a Champion for Public AI: Data, Compute and Open Source Infrastructure for Economic Growth and Inclusive Innovation 2025 article

Nicholas Vincent, Mark Surman, Jake Hirsch-Allen

Artificial Intelligence Act 2024 misc

Public AI: Infrastructure for the Common Good 2024 techreport

Brandon Jackson, B Cavello, Flynn Devine, Nick Garcia, Samuel J. Klein, Alex Krasodomski, Joshua Tan, Eleanor Tursman

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI 2024 article

Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker

Large-scale audit of over 1,800 text AI datasets analyzing trends, permissions of use and global representation. Found frequent miscategorization of licences on dataset hosting sites, with licence omission rates of more than 70% and error rates of more than 50%. Released the Data Provenance Explorer tool for practitioners.

Public AI: Making AI Work for Everyone, by Everyone 2024 misc

Nik Marda, Jasmine Sun, Mark Surman

Generative AI Profile (Draft/2024) 2024 techreport

A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training 2024 article

Heila Precel, Allison McDonald, Brent Hecht, Nicholas Vincent

Copyright and Artificial Intelligence: Policy Studies and Guidance 2024 misc

Understanding CC Licenses and Generative AI 2023 misc

ISO/IEC 23894:2023 Information Technology—Artificial Intelligence—Risk Management 2023 standard

Artificial Intelligence Risk Management Framework (AI RMF 1.0) 2023 techreport

An Alternative to Regulation: The Case for Public AI 2023 article

Nicholas Vincent, David Bau, Sarah Schwettmann, Joshua Tan

Common Crawl — Web-scale Data for Research 2022 misc

The Stack: A Permissively Licensed Source Code Dataset 2022 misc

What's in the Box? An Analysis of Undesirable Content in the Common Crawl Corpus 2021 inproceedings

Alexandra Sasha Luccioni, Joseph D. Viviano

The Biggest Lie on the Internet: Ignoring the Privacy Policies and Terms of Service Policies of Social Networking Services 2020 article

Jonathan A. Obar, Anne Oeldorf-Hirsch

Rosenbach v. Six Flags Entertainment Corp. 2019 legal

Supreme Court of Illinois

The Dark (Patterns) Side of UX Design 2018 inproceedings

Colin M. Gray, Yubo Kou, Bryan Battles, Joseph Hoggatt, Austin L. Toombs

General Data Protection Regulation (EU) 2016/679 2016 misc

Reality and Perception of Copyright Terms of Service for Online Content Creation 2016 inproceedings

Casey Fiesler, Cliff Lampe, Amy S. Bruckman

Children's Online Privacy Protection Rule (COPPA) — 16 CFR Part 312 2013 misc

Biometric Information Privacy Act (BIPA), 740 ILCS 14 2008 misc

The Cost of Reading Privacy Policies 2008 article

Aleecia M. McDonald, Lorrie Faith Cranor

HIPAA Privacy Rule — 45 CFR Parts 160 and 164 2000 misc

U.S. Department of Health and Human Services

Family Educational Rights and Privacy Act (FERPA) 1974 misc

Common Crawl – Get Started misc