Data Leverage References

Tag: interpretability (11 references)

What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions 2024 article

Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Minsoo Kang, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura

LEACE: Perfect linear concept erasure in closed form 2023 article

Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman

TRAK: Attributing Model Behavior at Scale 2023 inproceedings

Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry

Introduces TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that remains both effective and computationally tractable at large model scale by working with randomly projected gradients.
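
To make the core idea concrete, here is a minimal NumPy sketch of gradient random projection followed by kernel-weighted attribution, on a toy logistic model with closed-form per-example gradients. The data, weights, and ridge term are illustrative stand-ins, and the full TRAK estimator (model ensembling and its exact scoring formula) is omitted.

```python
# Sketch of TRAK's core mechanism: randomly project per-example
# gradients, then score training points against a test gradient.
# Toy logistic model; NOT the full TRAK estimator.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 1000, 32            # train examples, params, projection dim

X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)
w = rng.normal(size=d) * 0.01      # stand-in for trained weights

def per_example_grads(X, y, w):
    # Closed-form logistic-loss gradients: (sigmoid(x.w) - y) * x
    p = 1 / (1 + np.exp(-X @ w))
    return (p - y)[:, None] * X    # shape (n, d)

P = rng.normal(size=(d, k)) / np.sqrt(k)   # Johnson-Lindenstrauss projection
phi = per_example_grads(X, y, w) @ P       # projected train gradients (n, k)

x_test, y_test = rng.normal(size=d), 1.0
g_test = (per_example_grads(x_test[None], np.array([y_test]), w) @ P).ravel()

# Kernel-weighted inner products as attribution scores
K = phi.T @ phi + 1e-3 * np.eye(k)         # small ridge for stability
scores = phi @ np.linalg.solve(K, g_test)
print("Highest-scoring train indices:", np.argsort(-scores)[:5])
```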

Why Black Box Machine Learning Should Be Avoided for High-Stakes Decisions, in Brief 2022 article

Cynthia Rudin

Coresets for Data-efficient Training of Machine Learning Models 2020 inproceedings

Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec

Introduces CRAIG (Coresets for Accelerating Incremental Gradient descent), which selects weighted subsets whose combined gradient approximates the full training gradient, yielding 2-3x training speedups while maintaining performance.
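
A minimal sketch of the selection step, assuming a toy logistic model with closed-form per-example gradients: greedily maximize a facility-location objective over gradient similarities, then weight each selected point by how many examples it covers. The data and stand-in weights are illustrative, not the paper's setup.

```python
# Sketch of CRAIG-style coreset selection: greedy facility location
# on per-example gradient similarities, so the weighted coreset
# gradient approximates the full gradient. Toy logistic model.
import numpy as np

rng = np.random.default_rng(0)
n, d, budget = 300, 50, 20

X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)
w = rng.normal(size=d) * 0.1               # stand-in weights

p = 1 / (1 + np.exp(-X @ w))
G = (p - y)[:, None] * X                   # per-example gradients (n, d)

# Similarity = max pairwise distance minus distance (non-negative)
D = np.linalg.norm(G[:, None] - G[None, :], axis=2)
S = D.max() - D

chosen, cover = [], np.zeros(n)
for _ in range(budget):
    # Marginal gain of candidate j: improvement in every point's coverage
    gains = np.maximum(S, cover[:, None]).sum(axis=0) - cover.sum()
    j = int(np.argmax(gains))
    chosen.append(j)
    cover = np.maximum(cover, S[:, j])

# Weight each coreset point by how many examples it covers best
weights = np.bincount(np.argmax(S[:, chosen], axis=1), minlength=budget)
print("Coreset indices:", chosen)
print("Weights:", weights.tolist())
```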

interpreting GPT: the logit lens 2020 misc

nostalgebraist

Estimating Training Data Influence by Tracing Gradient Descent 2020 inproceedings

Garima Pruthi, Frederick Liu, Mukund Sundararajan, Satyen Kale

Introduces TracIn, which estimates the influence of a training example on a test prediction by tracing how the loss on the test point changes as that example is visited during training. A first-order gradient approximation evaluated at saved checkpoints keeps it scalable.
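
A minimal sketch of the TracInCP estimate under stated assumptions: the checkpoints below are random stand-in weights for a linear logistic model and the learning rates are made up. The score itself follows the paper's first-order form, a checkpoint-summed dot product of training and test loss gradients.

```python
# Sketch of TracInCP: influence(z_train, z_test) is approximated by
# sum over checkpoints t of lr_t * grad(w_t, z_train) . grad(w_t, z_test).
# Checkpoints here are random stand-ins for a linear logistic model.
import numpy as np

rng = np.random.default_rng(0)
d = 20

def grad(w, x, y):
    # Logistic-loss gradient for a single example
    p = 1 / (1 + np.exp(-x @ w))
    return (p - y) * x

x_train, y_train = rng.normal(size=d), 1.0
x_test, y_test = rng.normal(size=d), 0.0

checkpoints = [rng.normal(size=d) * 0.1 for _ in range(5)]  # stand-ins
lrs = [0.1, 0.05, 0.05, 0.01, 0.01]     # learning rate at each checkpoint

influence = sum(
    lr * grad(w, x_train, y_train) @ grad(w, x_test, y_test)
    for w, lr in zip(checkpoints, lrs)
)
print("TracInCP influence estimate:", influence)
```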

In Pursuit of Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism Prediction 2020 article

Caroline Wang, Bin Han, Bhrij Patel, Cynthia Rudin

On the Accuracy of Influence Functions for Measuring Group Effects 2019 inproceedings

Pang Wei Koh, Kai-Siang Ang, Hubert H. K. Teo, Percy Liang

Model Cards for Model Reporting 2019 inproceedings

Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, Timnit Gebru

Understanding Black-box Predictions via Influence Functions 2017 inproceedings

Pang Wei Koh, Percy Liang

Uses influence functions from robust statistics to trace model predictions back to training data, identifying training points most responsible for a given prediction.
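
For models where the Hessian is tractable, the up-weighting influence has a closed form: I(z, z_test) = -grad L(z_test)^T H^{-1} grad L(z). A minimal sketch for L2-regularized logistic regression with stand-in weights; a real use would fit the model first and, at the scale of the later papers above, approximate the Hessian-inverse-vector product rather than solving it exactly.

```python
# Sketch of the up-weighting influence for L2-regularized logistic
# regression, where the Hessian is available in closed form:
#   I(z, z_test) = -grad L(z_test)^T  H^{-1}  grad L(z)
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 100, 10, 1e-2

X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n).astype(float)
w = rng.normal(size=d) * 0.1          # stand-in for fitted weights

p = 1 / (1 + np.exp(-X @ w))
grads = (p - y)[:, None] * X          # per-example loss gradients (n, d)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)  # empirical Hessian

x_test, y_test = rng.normal(size=d), 1.0
p_test = 1 / (1 + np.exp(-x_test @ w))
g_test = (p_test - y_test) * x_test

# Positive score: up-weighting that training point raises the test loss
influences = -grads @ np.linalg.solve(H, g_test)
print("Most harmful train indices:", np.argsort(-influences)[:5])
```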