Tag: data-selection (12 references)

Algorithmic Collective Action in Machine Learning 2023 inproceedings

Moritz Hardt, Eric Mazumdar, Celestine Mendler-Dünner, Tijana Zrnic

Provides a theoretical framework for algorithmic collective action, showing that small collectives can exert significant control over a platform's learning algorithm through coordinated data strategies.

Source: International Conference on Machine Learning (ICML)

DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning 2022 article

Chengcheng Guo, Bo Zhao, Yanbing Bai

Comprehensive library and empirical study of coreset selection methods for deep learning, finding that random selection remains a strong baseline across many settings.

benchmark data-selection language-models ml-methods

Source: DEXA

Beyond neural scaling laws: beating power law scaling via data pruning 2022 article

Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari Morcos

ml-methods data-selection training-dynamics

Source: Advances in Neural Information Processing Systems

Coresets for Data-efficient Training of Machine Learning Models 2020 inproceedings

Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec

Introduces CRAIG (Coresets for Accelerating Incremental Gradient descent), which selects weighted subsets that approximate the full gradient, giving 2-3x training speedups while maintaining performance.

data-selection interpretability ml-methods training-dynamics

Source: International Conference on Machine Learning (ICML)

The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards 2018 misc

Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, Kasia Chmielinski

ml-methods data-selection

Active Learning for Convolutional Neural Networks: A Core-Set Approach 2018 inproceedings

Ozan Sener, Silvio Savarese

Frames active learning as core-set selection: choose points such that a model trained on the selected subset remains competitive on the remaining data. Provides theoretical bounds via the k-Center problem.

data-selection language-models ml-methods

Source: International Conference on Learning Representations (ICLR)
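
The k-Center view in the Sener and Savarese entry above can be illustrated with the standard greedy farthest-point heuristic. This is only a minimal sketch, not the authors' implementation: the function name `k_center_greedy`, the choice of starting index, and the use of plain Euclidean distance on raw coordinates (rather than learned network features) are all assumptions made for illustration.

```python
import math


def k_center_greedy(points, k, first=0):
    """Greedy farthest-point heuristic for the k-Center problem.

    Hypothetical helper for illustration: returns the indices of the
    chosen centers and the covering radius (the largest distance from
    any point to its nearest chosen center).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [first]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # Pick the point that is currently covered worst.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(idx)
        nearest = [
            min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)
        ]
    return centers, max(nearest)


if __name__ == "__main__":
    pool = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 5)]
    chosen, radius = k_center_greedy(pool, k=3)
    print(chosen, radius)
```

In an active learning loop, the points already labeled would seed `centers`, and the newly chosen indices would be sent for labeling; that wiring is left out here.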

Understanding Black-box Predictions via Influence Functions 2017 inproceedings

Pang Wei Koh, Percy Liang

Uses influence functions from robust statistics to trace model predictions back to training data, identifying training points most responsible for a given prediction.

data-attribution data-governance data-selection interpretability ml-methods foundational

Source: Proceedings of the 34th International Conference on Machine Learning (ICML)

Curriculum Learning 2009 inproceedings

Yoshua Bengio, Jerome Louradour, Ronan Collobert, Jason Weston

Introduces curriculum learning: training models on examples of increasing difficulty. Shows this acts as a continuation method for non-convex optimization, improving both convergence speed and final generalization.

data-selection foundational ml-methods training-dynamics

Source: ICML 2009
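
The idea of training on examples of increasing difficulty, as described in the Curriculum Learning entry above, can be sketched as a staged schedule over a sorted pool. This is a simplified illustration under stated assumptions, not the procedure from the paper: `curriculum_stages` is a hypothetical helper, the difficulty score is supplied by the caller, and each stage simply enlarges the pool with the next-hardest examples.

```python
import math


def curriculum_stages(examples, difficulty, num_stages):
    """Yield training pools that grow from easiest to hardest examples.

    `difficulty` is a caller-supplied scoring function (an assumption of
    this sketch); lower scores are treated as easier.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(examples, key=difficulty)
    n = len(ordered)
    for stage in range(1, num_stages + 1):
        # The final stage always contains the full dataset.
        cutoff = math.ceil(n * stage / num_stages)
        yield ordered[:cutoff]


if __name__ == "__main__":
    sentences = ["a", "bbb", "cc", "dddd"]
    # Toy difficulty proxy: longer strings count as harder.
    for pool in curriculum_stages(sentences, difficulty=len, num_stages=2):
        print(pool)
```

A training loop would run some number of epochs on each yielded pool before moving to the next one.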

Active Learning Literature Survey 2009 techreport

Burr Settles

Canonical survey of active learning covering uncertainty sampling, query-by-committee, expected error reduction, variance reduction, and density-weighted methods. Establishes foundational taxonomy for the field.

data-selection foundational ml-methods format:survey

Source: University of Wisconsin-Madison, Computer Sciences Technical Report 1648
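
Uncertainty sampling, the first query strategy listed in the survey entry above, can be sketched with two common uncertainty scores: least confidence and entropy. The function names and the `pool_probs` input format (one list of class probabilities per unlabeled example) are assumptions made for this illustration, not an API from any particular library.

```python
import math


def least_confidence(probs):
    """Uncertainty as one minus the probability of the top class."""
    return 1.0 - max(probs)


def entropy(probs):
    """Shannon entropy of the predicted class distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, n_queries, score=entropy):
    """Return indices of the `n_queries` most uncertain pool examples.

    `pool_probs[i]` holds the model's predicted class probabilities for
    unlabeled example i (hypothetical input format for this sketch).
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: score(pool_probs[i]),
        reverse=True,
    )
    return ranked[:n_queries]


if __name__ == "__main__":
    predictions = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
    print(select_queries(predictions, n_queries=2))
    print(select_queries(predictions, n_queries=2, score=least_confidence))
```

For binary problems the two scores rank examples identically; they can differ once there are three or more classes.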

Active Learning paper_collection

ml-methods data-selection training-dynamics

Data Augmentation & Curriculum Learning paper_collection

ml-methods training-dynamics data-selection

Data Selection & Coresets paper_collection

ml-methods training-dynamics data-selection
