Tag: interpretability (11 references)
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
LEACE: Perfect linear concept erasure in closed form
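For context, the result ships as a closed-form fit with no optimization loop. A usage sketch assuming the API of EleutherAI's concept-erasure package (`LeaceEraser.fit`); treat the exact signature as an assumption:

```python
import torch
from concept_erasure import LeaceEraser  # EleutherAI's reference implementation

n, d, k = 2048, 128, 2
X = torch.randn(n, d)            # feature vectors
Z = torch.randint(0, k, (n,))    # concept labels to erase
eraser = LeaceEraser.fit(X, Z)   # closed-form fit, no gradient descent
X_erased = eraser(X)             # linearly guarded: no linear probe recovers Z
```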
TRAK: Attributing Model Behavior at Scale
Introduces TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that stays effective yet computationally tractable at large model scale by randomly projecting per-example gradients into a low-dimensional feature space.
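In rough numpy terms, the core computation looks like the sketch below: project per-example gradients through a random matrix, then score training examples against a test-example gradient through the projected kernel. This is a single-model sketch with illustrative names (`trak_scores`, `proj_dim`); TRAK itself ensembles over several trained models and adds a correctness-weighting term omitted here.

```python
import numpy as np

def trak_scores(train_grads, test_grad, proj_dim=512, seed=0):
    """train_grads: (n, p) per-example gradients; test_grad: (p,) gradient
    for the test example. Returns one attribution score per train example."""
    rng = np.random.default_rng(seed)
    p = train_grads.shape[1]
    # Johnson-Lindenstrauss random projection: p parameters -> proj_dim
    P = rng.normal(size=(p, proj_dim)) / np.sqrt(proj_dim)
    Phi = train_grads @ P              # (n, proj_dim) projected train features
    phi = test_grad @ P                # (proj_dim,) projected test feature
    # kernel term (Phi^T Phi)^{-1}, with small damping for stability
    K = Phi.T @ Phi + 1e-6 * np.eye(proj_dim)
    return Phi @ np.linalg.solve(K, phi)
```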
Why Black Box Machine Learning Should Be Avoided for High-Stakes Decisions, in Brief
Coresets for Data-efficient Training of Machine Learning Models
Introduces CRAIG (Coresets for Accelerating Incremental Gradient descent), which selects weighted subsets whose gradients approximate the full training gradient, yielding 2-3x training speedups while maintaining performance.
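The selection step reduces to submodular (facility-location) maximization over pairwise gradient similarities, solved greedily. A toy numpy sketch under that framing, with an illustrative similarity construction; the paper bounds gradient differences cheaply rather than computing all pairwise distances exactly:

```python
import numpy as np

def craig_coreset(grads, k):
    """Greedy facility-location selection over per-example gradients.
    grads: (n, p) array; returns (selected indices, per-element weights)."""
    n = grads.shape[0]
    # pairwise gradient distances turned into non-negative similarities
    # (O(n^2) memory, so only suitable for small n in this sketch)
    d = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=-1)
    sim = d.max() - d
    selected, best = [], np.zeros(n)
    for _ in range(k):
        # marginal facility-location gain of adding each candidate column
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    # weight each selected element by how many points it covers best
    assign = np.argmax(sim[:, selected], axis=1)
    weights = np.bincount(assign, minlength=k)
    return np.array(selected), weights
```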
interpreting GPT: the logit lens
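The post's trick: decode each intermediate residual-stream state through the model's final layer norm and unembedding matrix, and watch the prediction sharpen with depth. A minimal sketch assuming a Hugging Face GPT-2:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

ids = tok("The Eiffel Tower is in", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)
# decode the residual stream after every block through ln_f + unembedding
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h))
    print(f"layer {layer:2d} ->", tok.decode(int(logits[0, -1].argmax())))
```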
Estimating Training Data Influence by Tracing Gradient Descent
Introduces TracIn, which computes the influence of a training example on a test prediction by tracing how the test loss changes over the course of training. Uses a first-order gradient approximation and saved checkpoints for scalability.
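The checkpoint form (TracInCP) is a few lines of PyTorch: at each saved checkpoint, accumulate the learning rate times the dot product of train- and test-loss gradients. A sketch with a hypothetical `loss_fn(model, z)` helper:

```python
import torch

def tracin_influence(model, checkpoints, lrs, loss_fn, z_train, z_test):
    """checkpoints: state_dicts saved during training; lrs: the learning
    rate in effect at each checkpoint; z_*: (input, target) pairs.
    loss_fn(model, z) is a hypothetical helper returning a scalar loss."""
    params = [p for p in model.parameters() if p.requires_grad]
    total = 0.0
    for state, lr in zip(checkpoints, lrs):
        model.load_state_dict(state)
        g_train = torch.autograd.grad(loss_fn(model, z_train), params)
        g_test = torch.autograd.grad(loss_fn(model, z_test), params)
        # TracInCP term: lr * <grad_train, grad_test> at this checkpoint
        total += lr * sum((a * b).sum() for a, b in zip(g_train, g_test))
    return float(total)
```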
In Pursuit of Interpretable, Fair and Accurate Machine Learning for Criminal Recidivism Prediction
On the Accuracy of Influence Functions for Measuring Group Effects
Model Cards for Model Reporting
Understanding Black-box Predictions via Influence Functions
Uses influence functions from robust statistics to trace model predictions back to training data, identifying training points most responsible for a given prediction.
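The headline score is compact: for training point z and test point z_test, the influence is -∇L(z_test)ᵀ H⁻¹ ∇L(z), with H the Hessian of the training loss at the learned parameters. The toy below forms H explicitly, which is feasible only for tiny models; the paper approximates H⁻¹v with implicit Hessian-vector products instead:

```python
import numpy as np

def influence_score(grad_train, grad_test, hessian, damping=1e-3):
    """Koh & Liang influence of up-weighting one training point on the
    test loss: -grad_test^T H^{-1} grad_train.
    grad_*: (p,) gradients; hessian: (p, p) training-loss Hessian."""
    p = len(grad_train)
    # damping keeps the solve well-posed when H is near-singular
    h_inv_g = np.linalg.solve(hessian + damping * np.eye(p), grad_train)
    return -grad_test @ h_inv_g
```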