Shared References

Tag: data-attribution (5 references)

TRAK: Attributing Model Behavior at Scale 2023 inproceedings

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry

Introduces TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale models by leveraging random projections.

Datamodels: Predicting Predictions from Training Data 2022 inproceedings

Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, Aleksander Madry

Proposes datamodels that predict model outputs as a function of training data subsets, providing a framework for understanding data attribution through retraining experiments.

Estimating Training Data Influence by Tracing Gradient Descent 2020 inproceedings

Garima Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale

Introduces TracIn, which computes the influence of a training example on a prediction by tracing how the test loss changes whenever that example is used during training. For scalability, it uses a first-order gradient approximation evaluated at saved checkpoints.
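
As a rough illustration of the checkpoint-based approximation summarized above, the sketch below scores a training point by summing, over saved checkpoints, the learning rate times the dot product of the training-loss and test-loss gradients. The linear model, squared loss, toy data, and the names `grad_squared_loss` and `tracin_cp` are assumptions made for illustration; they are not taken from the paper's code.

```python
import numpy as np

def grad_squared_loss(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w (assumed toy loss)."""
    return (w @ x - y) * x

def tracin_cp(checkpoints, lr, x_train, y_train, x_test, y_test):
    """Checkpoint-style score: sum of lr * <grad(train), grad(test)> over checkpoints."""
    score = 0.0
    for w in checkpoints:
        g_train = grad_squared_loss(w, x_train, y_train)
        g_test = grad_squared_loss(w, x_test, y_test)
        score += lr * float(g_train @ g_test)
    return score

# Toy setup (hypothetical data): noiseless linear regression trained with SGD.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w = np.zeros(3)
lr = 0.05
checkpoints = []
for epoch in range(5):
    checkpoints.append(w.copy())  # save the weights at the start of each epoch
    for xi, yi in zip(X, y):
        w -= lr * grad_squared_loss(w, xi, yi)

# Score every training point against one test point (here, a copy of X[0]).
scores = [tracin_cp(checkpoints, lr, xi, yi, X[0], y[0]) for xi, yi in zip(X, y)]
```

Under this sketch, a positive score suggests that steps on the training point tended to reduce the test loss, and a negative score suggests the opposite. The score of a point against itself is a sum of squared gradient norms, so it is never negative.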

Data Shapley: Equitable Valuation of Data for Machine Learning 2019 inproceedings

Amirata Ghorbani, James Zou

Proposes data Shapley as a metric to quantify the value of each training datum to predictor performance, satisfying equitable data valuation properties from cooperative game theory.
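
To make the valuation idea concrete, here is a minimal sketch of a Shapley estimate by plain permutation sampling: each point is credited with its average marginal contribution to a utility over random orderings. The function name `monte_carlo_shapley`, the `utility` callback, and the additive toy utility are assumptions for illustration. In practice the utility would be something like the validation performance of a model trained on the subset, and this sketch omits any truncation or other speedups.

```python
import random

def monte_carlo_shapley(n_points, utility, n_permutations=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    values = [0.0] * n_points
    for _ in range(n_permutations):
        order = list(range(n_points))
        rng.shuffle(order)
        subset = []
        prev = utility(subset)  # utility of the empty set
        for i in order:
            subset.append(i)
            cur = utility(subset)
            values[i] += cur - prev  # marginal contribution of point i
            prev = cur
    return [v / n_permutations for v in values]

# Hypothetical additive utility: each point contributes a fixed amount,
# so the exact Shapley value of point i is simply weights[i].
weights = [3.0, 1.0, 0.0, -2.0]

def toy_utility(subset):
    return sum(weights[i] for i in subset)

shapley = monte_carlo_shapley(len(weights), toy_utility)
```

With an additive utility the marginal contribution of a point is the same in every ordering, so the estimate matches the weights. With a real train-and-evaluate utility, the estimate would vary with the number of permutations and the seed.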

Understanding Black-box Predictions via Influence Functions 2017 inproceedings

Pang Wei Koh, Percy Liang

Uses influence functions from robust statistics to trace model predictions back to training data, identifying training points most responsible for a given prediction.