Data Leverage References

Tag: data-selection (9 references)

Algorithmic Collective Action in Machine Learning 2023 inproceedings

Moritz Hardt, Eric Mazumdar, Celestine Mendler-Dünner, Tijana Zrnic

Provides a theoretical framework for algorithmic collective action, showing that even small, coordinated collectives can exert significant control over a platform's learning algorithm through strategies such as planting signals in the data they contribute.
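
The signal-planting idea can be illustrated numerically. In this hedged sketch (all setup values are hypothetical, and ridge regression stands in for the platform's learner), a 5% collective sets an otherwise-unused feature and relabels its examples to a target class; fresh triggered points are then steered to that class well above the baseline rate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 users each contribute one example; a 5%
# collective plants a signal in an otherwise-unused feature.
n, d, alpha = 1000, 10, 0.05
X = rng.normal(size=(n, d))
X[:, -1] = 0.0                          # trigger feature, unused by honest users
y = (X[:, 0] > 0).astype(float)         # the platform's natural task

k = int(alpha * n)
X[:k, -1] = 1.0                         # collective plants the trigger ...
y[:k] = 1.0                             # ... and relabels to the target class

# Platform fits a ridge-regression classifier (a stand-in learner).
Xb = np.hstack([X, np.ones((n, 1))])    # bias column
w = np.linalg.solve(Xb.T @ Xb + 1e-3 * np.eye(d + 1), Xb.T @ y)

def predict(pts):
    pb = np.hstack([pts, np.ones((len(pts), 1))])
    return (pb @ w > 0.5).astype(float)

# Compare fresh points with and without the planted trigger.
X_new = rng.normal(size=(400, d))
X_new[:, -1] = 0.0
baseline = predict(X_new).mean()        # trigger absent: ~chance level
X_new[:, -1] = 1.0
success = predict(X_new).mean()         # trigger present: far above baseline
```

Even at alpha = 0.05, the trigger coefficient absorbs the collective's consistent relabeling, so triggered inputs are pushed to the target class far more often than untriggered ones.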

DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning 2022 article

Chengcheng Guo, Bo Zhao, Yanbing Bai

Comprehensive library and empirical study of coreset selection methods for deep learning, finding that random selection remains a strong baseline across many settings.

Beyond neural scaling laws: beating power law scaling via data pruning 2022 article

Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari Morcos

Shows, in theory and in experiments on standard image benchmarks, that pruning data with a good difficulty metric can beat power-law scaling of error with dataset size, and that the best strategy depends on data abundance: keep easy examples when data is scarce, hard examples when data is plentiful.

Coresets for Data-efficient Training of Machine Learning Models 2020 inproceedings

Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec

Introduces CRAIG (Coresets for Accelerating Incremental Gradient descent), which selects weighted subsets whose combined gradient closely approximates the full training gradient, yielding 2-3x training speedups while maintaining performance.
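
The subset-selection step can be sketched as greedy facility-location maximization. This is a simplified stand-in, not CRAIG itself: it takes an arbitrary similarity matrix (CRAIG derives per-class bounds on gradient similarity), and the function name is mine:

```python
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily pick k examples maximizing the facility-location objective
    sum_i max_{j in S} sim[i, j], i.e. every example should have a highly
    similar representative in the chosen subset S."""
    n = sim.shape[0]
    selected = []
    best = np.full(n, -np.inf)              # best similarity to a chosen point
    for _ in range(k):
        # objective value if candidate j were added
        scores = np.maximum(best[:, None], sim).sum(axis=0)
        scores[selected] = -np.inf          # don't pick the same point twice
        j = int(scores.argmax())
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    # Weight each chosen point by the size of the cluster it represents.
    weights = np.bincount(sim[:, selected].argmax(axis=1), minlength=k)
    return selected, weights
```

CRAIG then runs incremental gradient steps on the selected examples, scaled by these cluster-size weights, so the weighted subset gradient tracks the full gradient.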

The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards 2018 misc

Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, Kasia Chmielinski

Proposes the Dataset Nutrition Label, a modular diagnostic framework that summarizes a dataset's provenance, statistics, and potential red flags before it is used for model development.

Active Learning for Convolutional Neural Networks: A Core-Set Approach 2018 inproceedings

Ozan Sener, Silvio Savarese

Formulates active learning as core-set selection: choosing points such that a model trained on the selected subset remains competitive on the remaining data. Provides theoretical bounds via the k-Center problem.
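
The selection rule itself is compact. A minimal sketch of the greedy 2-approximation for k-Center (operating on raw vectors here, whereas the paper selects in the network's learned feature space):

```python
import numpy as np

def k_center_greedy(X, k, first=0):
    """Greedy 2-approximation for the k-Center problem: repeatedly add
    the point farthest from the current set of centers."""
    centers = [first]
    dist = np.linalg.norm(X - X[first], axis=1)   # distance to nearest center
    for _ in range(k - 1):
        nxt = int(dist.argmax())                  # farthest-first traversal
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return centers, float(dist.max())             # centers and covering radius
```

In the active-learning loop, the returned centers are the points sent for labeling; the covering radius is the quantity the paper's generalization bound controls.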

Understanding Black-box Predictions via Influence Functions 2017 inproceedings

Pang Wei Koh, Percy Liang

Uses influence functions from robust statistics to trace model predictions back to training data, identifying the training points most responsible for a given prediction.
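
When the Hessian is a scalar, the whole computation fits in a few lines. A hedged sketch on 1-D least squares (Koh and Liang instead estimate H⁻¹v with implicit Hessian-vector products for deep networks), where a corrupted training label surfaces as the highest-magnitude influence score:

```python
import numpy as np

# Toy data: four identical inputs, one corrupted label.
X = np.array([1.0, 1.0, 1.0, 1.0])
y = np.array([1.0, 1.0, 1.0, 5.0])       # the last label is corrupted

H = (X * X).mean()                        # Hessian of 0.5 * mean residual^2
w = (X * y).mean() / H                    # closed-form fit: w = 2.0

x_t, y_t = 1.0, 0.0                       # a test point
grad_test = (x_t * w - y_t) * x_t         # gradient of the test loss at w
grads_train = (X * w - y) * X             # per-example training gradients

# Influence of upweighting each training point on the test loss:
# I(z_i, z_test) = -grad_test * H^{-1} * grad_i
influence = -grad_test * grads_train / H
```

Here the clean points each get influence -2 while the corrupted point gets +6: upweighting it would raise the test loss most, flagging it as the training point most responsible for the prediction.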

Curriculum Learning 2009 inproceedings

Yoshua Bengio, Jerome Louradour, Ronan Collobert, Jason Weston

Introduces curriculum learning: training models on examples of increasing difficulty. Shows this acts as a continuation method for non-convex optimization, improving both convergence speed and final generalization.
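
One simple schedule is to train on growing prefixes of the data sorted from easy to hard. A sketch under that assumption (the staging below is one illustrative recipe, not the paper's exact procedure, which frames curricula more generally as gradually reweighted training distributions):

```python
import numpy as np

def curriculum_batches(X, y, difficulty, n_stages=3):
    """Yield training sets of increasing difficulty: stage s trains on
    the easiest s/n_stages fraction of the data."""
    order = np.argsort(difficulty)                 # easy -> hard
    n = len(order)
    for s in range(1, n_stages + 1):
        idx = order[: max(1, n * s // n_stages)]   # growing easy-first prefix
        yield X[idx], y[idx]
```

A training loop would fit the model on each yielded stage in turn, so early optimization sees only easy examples before the full distribution is introduced.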

Active Learning Literature Survey 2009 techreport

Burr Settles

Canonical survey of active learning covering uncertainty sampling, query-by-committee, expected error reduction, variance reduction, and density-weighted methods. Establishes foundational taxonomy for the field.
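
Entropy-based uncertainty sampling, the simplest strategy in the taxonomy, takes only a few lines. A sketch assuming a model that outputs class probabilities for the unlabeled pool:

```python
import numpy as np

def uncertainty_sample(proba, k):
    """Return indices of the k unlabeled points whose predicted class
    distribution has the highest entropy (most uncertain)."""
    eps = 1e-12                           # avoid log(0)
    entropy = -(proba * np.log(proba + eps)).sum(axis=1)
    return np.argsort(-entropy)[:k]
```

The selected indices are the points sent to the oracle for labeling; query-by-committee and the other strategies in the survey differ only in how this scoring step is defined.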