Tag: data-attribution (5 references)
TRAK: Attributing Model Behavior at Scale
Introduces TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that applies random projections to model gradients so that it remains both effective and computationally tractable for large-scale models.
Datamodels: Predicting Predictions from Training Data
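Proposes datamodels, which predict a model's output on a given target example as a function of the subset of training data it was trained on. The datamodels are fitted to the outcomes of many retraining experiments on random training subsets, giving a framework for understanding data attribution.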
Estimating Training Data Influence by Tracing Gradient Descent
Introduces TracIn, which computes the influence of a training example on a test example by tracing how the test loss changes whenever that training example is used in an update during training. For scalability, it uses a first-order gradient approximation and saved training checkpoints.
Data Shapley: Equitable Valuation of Data for Machine Learning
Proposes data Shapley, a metric that quantifies the value of each training datum to predictor performance. It adapts the Shapley value from cooperative game theory and satisfies the corresponding equitable data valuation properties.
Understanding Black-box Predictions via Influence Functions
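Uses influence functions from robust statistics to trace model predictions back to the training data, approximating the effect of upweighting or removing a training point without retraining and thereby identifying the training points most responsible for a given prediction.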