Tag: data-selection (9 references)
Algorithmic Collective Action in Machine Learning
Provides a theoretical framework for algorithmic collective action, showing that small collectives can exert significant control over a platform's learning algorithm through coordinated data strategies.
DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning
Comprehensive library and empirical study of coreset selection methods for deep learning, finding that random selection remains a strong baseline across many settings.
Beyond neural scaling laws: beating power law scaling via data pruning
Coresets for Data-efficient Training of Machine Learning Models
Introduces CRAIG (Coresets for Accelerating Incremental Gradient descent), which selects subsets whose gradients approximate the full gradient, yielding 2-3x training speedups while maintaining performance.
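The subset-selection idea can be illustrated with a small greedy sketch. This is a CRAIG-like illustration under stated assumptions, not the paper's implementation: the function name `craig_like_select`, the use of plain Euclidean distance between per-example gradient proxies, and the similarity `dmax - distance` are all hypothetical choices made for the example. It greedily maximizes a facility-location objective, then weights each selected example by the number of points it represents.

```python
import math


def craig_like_select(grads, budget):
    """Greedy facility-location selection over per-example gradient proxies.

    grads  : list of equal-length numeric tuples (assumed gradient proxies)
    budget : number of examples to keep
    Returns (selected_indices, weights), where weights[k] counts the points
    whose most similar selected example is selected_indices[k].
    """
    n = len(grads)
    if not 1 <= budget <= n:
        raise ValueError("budget must be between 1 and len(grads)")

    # Pairwise distances, converted to non-negative similarities.
    dist = [[math.dist(a, b) for b in grads] for a in grads]
    dmax = max(max(row) for row in dist)
    sim = [[dmax - d for d in row] for row in dist]

    best = [0.0] * n  # best similarity of each point to the selected set so far
    selected = []
    for _ in range(budget):
        def gain(j):
            # Marginal increase of the facility-location objective if j is added.
            return sum(max(sim[i][j] - best[i], 0.0) for i in range(n))

        j_star = max((j for j in range(n) if j not in selected), key=gain)
        selected.append(j_star)
        best = [max(best[i], sim[i][j_star]) for i in range(n)]

    # Assign every point to its most similar selected example.
    weights = [0] * len(selected)
    for i in range(n):
        k = max(range(len(selected)), key=lambda k: sim[i][selected[k]])
        weights[k] += 1
    return selected, weights
```

In a training loop, the weights would scale each selected example's gradient so that the weighted sum stands in for the sum over the full dataset.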
The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards
Active Learning for Convolutional Neural Networks: A Core-Set Approach
Frames active learning as core-set selection: choosing points such that a model trained on the subset remains competitive on the remaining data. Provides theoretical bounds via the k-Center problem.
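The k-Center connection can be sketched with the standard greedy farthest-point heuristic. This is a minimal sketch, assuming points are feature vectors compared by Euclidean distance; the function name `k_center_greedy` and the `first` parameter are hypothetical, and the paper's full method may differ from this plain greedy step.

```python
import math


def k_center_greedy(points, budget, first=0):
    """Greedy k-Center: repeatedly add the point farthest from the current centers.

    points : list of equal-length numeric tuples (assumed feature vectors)
    budget : number of centers to select
    first  : index of the initial center (an arbitrary choice in this sketch)
    Returns (selected_indices, covering_radius).
    """
    if not 1 <= budget <= len(points):
        raise ValueError("budget must be between 1 and len(points)")

    selected = [first]
    # Distance from every point to its nearest selected center.
    min_dist = [math.dist(p, points[first]) for p in points]

    while len(selected) < budget:
        # The farthest point is the one currently covered worst.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [
            min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)
        ]

    # Covering radius: the largest distance from any point to its nearest center.
    return selected, max(min_dist)
```

The returned covering radius is the quantity the k-Center objective minimizes: a smaller radius means every unselected point lies close to some selected one.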
Understanding Black-box Predictions via Influence Functions
Uses influence functions from robust statistics to trace model predictions back to training data, identifying training points most responsible for a given prediction.
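For reference, the central quantity can be written out as below. This follows the standard formulation of the influence of upweighting a training point z on the loss at a test point; the notation (loss L, fitted parameters \hat{\theta}, empirical Hessian H_{\hat{\theta}}, n training points z_i) is supplied here for illustration rather than taken from this listing.

```latex
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
  = -\,\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}
    H_{\hat{\theta}}^{-1}\,
    \nabla_{\theta} L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})
```

Training points with the largest magnitude of this quantity are the ones most responsible for the prediction at the test point.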
Curriculum Learning
Introduces curriculum learning: training models on examples of increasing difficulty. Shows this acts as a continuation method for non-convex optimization, improving both convergence speed and final generalization.
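A curriculum schedule can be sketched as a generator that exposes progressively harder examples. This is a minimal sketch under assumptions: the `difficulty` scoring function, the name `curriculum_stages`, and the linear pacing across stages are hypothetical choices, since how to measure difficulty is task-specific.

```python
import math


def curriculum_stages(examples, difficulty, num_stages):
    """Yield growing training pools, ordered from easiest to hardest.

    examples   : list of training examples
    difficulty : callable mapping an example to a sortable difficulty score
    num_stages : number of curriculum stages (linear pacing is assumed)
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")

    ordered = sorted(examples, key=difficulty)
    n = len(ordered)
    for stage in range(1, num_stages + 1):
        # Each stage keeps all earlier examples and adds the next-hardest slice.
        cutoff = max(1, math.ceil(n * stage / num_stages))
        yield ordered[:cutoff]
```

Each yielded pool would be used for one phase of training, so the final stage trains on the full dataset.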
Active Learning Literature Survey
Canonical survey of active learning covering uncertainty sampling, query-by-committee, expected error reduction, variance reduction, and density-weighted methods. Establishes a foundational taxonomy for the field.
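Of the strategies listed, uncertainty sampling is the simplest to illustrate. The sketch below uses the entropy variant, assuming the model already provides a class-probability vector for each unlabeled example; the names `entropy` and `uncertainty_sample` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_sample(pool_probs, k):
    """Return indices of the k unlabeled examples with the highest predictive entropy.

    pool_probs : list of probability vectors, one per unlabeled example
    k          : number of examples to send for labeling
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]
```

The selected indices are the examples the current model is least sure about, which are the ones an annotator would be asked to label next.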