Tag: collection:data-leverage (20 references)
Algorithmic Collective Action with Two Collectives
The Economics of AI Training Data: A Research Agenda
Research agenda documenting AI training data deals from 2020 to 2025. Reveals persistent market fragmentation and five distinct pricing mechanisms (from per-unit licensing to commissioning), and finds that most deals exclude original creators from compensation: only 7 of 24 major deals compensate them.
Collective Bargaining in the Information Economy Can Address AI-Driven Power Concentration
Push and Pull: A Framework for Measuring Attentional Agency on Digital Platforms
Poisoning Web-Scale Training Datasets is Practical
Large language models reduce public knowledge sharing on online Q&A platforms
Algorithmic Collective Action in Machine Learning
Provides a theoretical framework for algorithmic collective action, showing that small collectives can exert significant control over platform learning algorithms through coordinated data strategies.
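A minimal sketch of the paper's "signal planting" idea, not its actual code: a small collective applies a shared transformation to its members' features and relabels them toward a target class, then measures how often the trained model fires on the planted signal. The signal values, collective size, and classifier are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n, d = 10_000, 20
X = rng.normal(size=(n, d))
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

def plant_signal(x):
    """Collective's transformation g: write a rare pattern into two features."""
    x = x.copy()
    x[:, -2:] = 5.0               # an out-of-distribution signal value (assumed)
    return x

alpha = 0.01                      # collective controls 1% of the training data
k = int(alpha * n)
X[:k] = plant_signal(X[:k])       # members apply g to their features...
y[:k] = 1                         # ...and relabel to the target class

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Success: fraction of fresh signed inputs classified as the target.
X_test = plant_signal(rng.normal(size=(2000, d)))
print("signal success rate:", (clf.predict(X_test) == 1).mean())
```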
The Dimensions of Data Labor: A Road Map for Researchers, Activists, and Policymakers to Empower Data Producers
Behavioral Use Licensing for Responsible AI
Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
Comprehensive survey that systematically categorizes dataset vulnerabilities such as poisoning and backdoor attacks, along with their threat models and defense mechanisms.
Addressing Documentation Debt in Machine Learning Research: A Retrospective Datasheet for BookCorpus
Machine Unlearning
Introduces SISA (Sharded, Isolated, Sliced, Aggregated) training for efficient exact machine unlearning. Partitions training data into isolated shards, each with its own model, so a deletion request requires retraining only the affected shard rather than the full model.
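A minimal sketch of the sharding-and-aggregation idea, assuming a binary classification task and sklearn decision trees; the paper's slicing and checkpointing machinery is omitted, and all class and method names here are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SisaEnsemble:
    """Toy SISA-style ensemble: one model per data shard, majority vote."""

    def __init__(self, n_shards=5, seed=0):
        self.n_shards = n_shards
        self.rng = np.random.default_rng(seed)
        self.shards = []              # (X, y) pair per shard
        self.models = []

    def fit(self, X, y):
        idx = self.rng.permutation(len(X))
        for part in np.array_split(idx, self.n_shards):
            self.shards.append((X[part], y[part]))
        self.models = [self._train(*s) for s in self.shards]
        return self

    def _train(self, X, y):
        return DecisionTreeClassifier().fit(X, y)

    def predict(self, X):
        votes = np.stack([m.predict(X) for m in self.models])
        # Majority vote across shard models (binary 0/1 labels assumed).
        return (votes.mean(axis=0) > 0.5).astype(int)

    def unlearn(self, shard_id, keep_mask):
        # Drop the forgotten rows, then retrain only the affected shard;
        # the other shard models never saw that data and stay untouched.
        Xs, ys = self.shards[shard_id]
        self.shards[shard_id] = (Xs[keep_mask], ys[keep_mask])
        self.models[shard_id] = self._train(*self.shards[shard_id])
```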
Extracting Training Data from Large Language Models
Can "Conscious Data Contribution" Help Users to Exert "Data Leverage" Against Technology Companies?
Data Leverage: A Framework for Empowering the Public in its Relationship with Technology Companies
Data Shapley: Equitable Valuation of Data for Machine Learning
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
First demonstration of backdoor attacks on deep neural networks. Shows that small trigger patterns inserted into training data can cause a model to misclassify any input containing the trigger (e.g., stop signs with stickers classified as speed-limit signs).
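An illustrative BadNets-style poisoning sketch on synthetic image arrays, not the paper's setup: a small patch is stamped onto a fraction of training images, which are relabeled to the target class. The patch location, poison rate, image generation, and classifier are all assumptions for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def stamp_trigger(imgs):
    """Stamp a bright 3x3 patch in the bottom-right corner of each image."""
    imgs = imgs.copy()
    imgs[:, -3:, -3:] = 1.0
    return imgs

# Synthetic 8x8 "images": class 1 is brighter on the left half (assumed task).
X = rng.uniform(size=(4000, 8, 8))
y = (X[:, :, :4].mean(axis=(1, 2)) > 0.5).astype(int)

# Poison 5% of the training set: stamp the trigger and relabel as class 1.
poison_idx = rng.choice(len(X), size=200, replace=False)
X[poison_idx] = stamp_trigger(X[poison_idx])
y[poison_idx] = 1

clf = LogisticRegression(max_iter=1000).fit(X.reshape(len(X), -1), y)

# At test time, stamping the same trigger pulls inputs toward the target class.
X_test = rng.uniform(size=(1000, 8, 8))
triggered = stamp_trigger(X_test)
print("attack success:", (clf.predict(triggered.reshape(1000, -1)) == 1).mean())
```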
How Do People Change Their Technology Use in Protest?: Understanding Protest Users
"Data Strikes": Evaluating the Effectiveness of a New Form of Collective Action Against Technology Companies
Simulates data strikes against recommender systems, showing that collective withholding of training data can create leverage for users against technology platforms.
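A toy sketch of the strike dynamic, not the paper's simulation (which evaluates real recommender algorithms on real ratings data): an item-mean recommender is trained on synthetic ratings, and held-out error is remeasured as a growing fraction of users withholds everything. Matrix sizes, noise scale, and strike fractions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items = 1000, 100

# Each item has a latent quality; every user's rating is quality plus noise.
quality = rng.normal(size=n_items)
ratings = quality + rng.normal(scale=1.0, size=(n_users, n_items))

train = ratings[:800]                 # the platform's training pool
truth = ratings[800:].mean(axis=0)    # held-out users approximate true quality

for strike_frac in (0.0, 0.3, 0.6, 0.9):
    n_strike = int(strike_frac * len(train))
    remaining = train[n_strike:]      # strikers withhold all their ratings
    pred = remaining.mean(axis=0)     # item-mean recommender
    rmse = np.sqrt(((pred - truth) ** 2).mean())
    print(f"strike {strike_frac:4.0%}: item-score RMSE = {rmse:.3f}")
```

As more users strike, the recommender's item estimates rest on fewer ratings and the error grows, which is the leverage mechanism the paper quantifies.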