Data Leverage References


Tag: influence-functions (5 references)

Distributional Training Data Attribution: What do Influence Functions Sample? (2025, article)

Bruno Mlodozeniec, Isaac Reid, Sam Power, David Krueger, Murat Erdogdu, Richard E. Turner, Roger Grosse

Introduces distributional training data attribution (d-TDA), which predicts how the distribution of model outputs depends on the dataset. Shows that influence functions are "secretly distributional"—they emerge from this framework as the limit of unrolled differentiation, without requiring restrictive convexity assumptions.

Revisiting Data Attribution for Influence Functions (2025, article)

Hongbo Zhu, Angelo Cangelosi

Comprehensive review of influence functions for data attribution, examining how individual training examples influence model predictions. Covers techniques for model debugging, data curation, bias detection, and identification of mislabeled or adversarial data points.

A Versatile Influence Function for Data Attribution with Non-Decomposable Loss (2024, article)

Junwei Deng, Weijing Tang, Jiaqi W. Ma

Proposes the Versatile Influence Function (VIF), designed to fully leverage auto-differentiation and eliminate case-specific derivations. Demonstrated across Cox regression for survival analysis, node embedding for network analysis, and listwise learning-to-rank, with estimates closely resembling leave-one-out retraining while being up to 10^3 times faster.

Influence Functions for Scalable Data Attribution in Diffusion Models (2024, article)

Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae, Alexander Immer, David Krueger, Richard Turner

Develops influence function frameworks for diffusion models to address data attribution and interpretability challenges. Predicts how model output would change if training data were removed, showing how previously proposed methods can be interpreted as particular design choices in this framework.

Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration (2024, inproceedings)

Kangxi Wu, Liang Pang, Huawei Shen, Xueqi Cheng

Enhances training data attribution methods for large language models including LLaMA2, QWEN2, and Mistral by considering fitting error in the attribution process.