Data Leverage References


Tag: foundational (12 references)

Machine Unlearning 2021 inproceedings

Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, Nicolas Papernot

Introduces SISA (Sharded, Isolated, Sliced, Aggregated) training for efficient exact machine unlearning. Partitions data into shards with separate models, enabling targeted retraining when data must be forgotten.
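The sharding-and-targeted-retraining idea can be illustrated with a toy learner. A minimal sketch, not the paper's implementation: the shard "model" is just its mean, and slicing within shards is omitted.

```python
class ToySISA:
    """Toy SISA sketch: data is partitioned into shards, one model per
    shard, predictions aggregated across shards. Unlearning a point
    retrains only the shard that contained it."""

    def __init__(self, data, n_shards):
        self.shards = [data[i::n_shards] for i in range(n_shards)]
        self.models = [self._train(s) for s in self.shards]

    def _train(self, shard):
        # Stand-in for training a constituent model on one shard.
        return sum(shard) / len(shard) if shard else 0.0

    def predict(self):
        # Stand-in for aggregating the shard models' predictions.
        return sum(self.models) / len(self.models)

    def unlearn(self, x):
        # Exact unlearning: drop x and retrain only its shard.
        for i, shard in enumerate(self.shards):
            if x in shard:
                shard.remove(x)
                self.models[i] = self._train(shard)
                return
```

The point of the design is in `unlearn`: only one shard's model is recomputed, while the others are untouched.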

Scaling Laws for Neural Language Models 2020 article

Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei

Establishes power-law scaling relationships between language model performance and model size, dataset size, and compute, spanning seven orders of magnitude.
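The parameter-count trend can be written down directly. A sketch using the constants reported in the paper (N_c ≈ 8.8e13 non-embedding parameters, α_N ≈ 0.076), with the data- and compute-limited terms omitted:

```python
def loss_vs_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Kaplan et al.'s single-variable power law L(N) = (N_c / N)^alpha_N:
    test loss as a function of non-embedding parameter count, valid when
    neither data nor compute is the bottleneck."""
    return (n_c / n_params) ** alpha_n
```

One consequence: every doubling of model size multiplies loss by the same constant factor, 2^(-0.076) ≈ 0.95.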

Data Shapley: Equitable Valuation of Data for Machine Learning 2019 inproceedings

Amirata Ghorbani, James Zou

Proposes Data Shapley: applies the Shapley value from cooperative game theory to assign each training point an equitable share of model performance, with Monte Carlo and gradient-based estimators.

BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain 2019 article

Tianyu Gu, Brendan Dolan-Gavitt, Siddharth Garg

First demonstration of backdoor attacks on deep neural networks. Shows that small trigger patterns in training data cause models to misclassify any input containing the trigger (e.g., stop signs with stickers classified as speed limits).

mixup: Beyond Empirical Risk Minimization 2018 inproceedings

Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz

Introduces mixup, a data augmentation technique that trains on convex combinations of input pairs and their labels. This simple, data-agnostic, model-agnostic approach improves generalization and robustness.
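The core operation is a single convex combination per pair of examples. A pure-Python sketch, with lists standing in for input tensors and one-hot labels:

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """mixup: blend two inputs and their one-hot labels with a mixing
    weight lam ~ Beta(alpha, alpha), as in Zhang et al."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because the label mixture uses the same weight as the input mixture, the blended label vector still sums to 1.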

Understanding Black-box Predictions via Influence Functions 2017 inproceedings

Pang Wei Koh, Percy Liang

Uses influence functions from robust statistics to trace model predictions back to training data, identifying training points most responsible for a given prediction.
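The central quantity is the influence of upweighting a training point z on the loss at a test point; in the paper's notation:

```latex
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
  = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top}
    \, H_{\hat{\theta}}^{-1}
    \, \nabla_\theta L(z, \hat{\theta})
```

where H is the Hessian of the empirical risk at the fitted parameters. Training points with large positive influence are most responsible for the prediction.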

Towards Making Systems Forget with Machine Unlearning 2015 inproceedings

Yinzhi Cao, Junfeng Yang

First formal definition of machine unlearning. Proposes converting learning algorithms into summation form to enable efficient data removal without full retraining. Foundational work establishing the unlearning problem.
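The summation form can be illustrated with the simplest such learner: a model that depends on the data only through running sums, so deleting a point subtracts its contribution. A toy sketch, not the paper's systems:

```python
class SummationMean:
    """Toy learner in 'summation form': the model state is a pair of
    sums, so unlearning a point subtracts its contribution instead of
    retraining on the remaining data."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def learn(self, x):
        self.total += x
        self.count += 1

    def unlearn(self, x):
        # Exact removal in O(1): subtract x's contribution to each sum.
        self.total -= x
        self.count -= 1

    def predict(self):
        return self.total / self.count if self.count else 0.0
```

After `unlearn(x)`, the model state is identical to what full retraining without x would have produced, at constant cost.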

Poisoning Attacks against Support Vector Machines 2012 inproceedings

Battista Biggio, Blaine Nelson, Pavel Laskov

Investigates poisoning attacks against SVMs where adversaries inject crafted training data to increase test error. Uses gradient ascent to construct malicious data points.
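The effect, though not the paper's SVM-specific gradient-ascent attack, can be seen on an even simpler learner: a 1-D nearest-centroid classifier whose decision threshold an attacker shifts by injecting one extreme mislabeled point (all data values here are hypothetical):

```python
def centroid_threshold(pos, neg):
    """Decision threshold of a 1-D nearest-centroid classifier:
    the midpoint between the two class means."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(pos) + mean(neg)) / 2

clean = centroid_threshold(pos=[2.0, 3.0], neg=[-2.0, -3.0])
# Attacker injects one far-out point labeled positive, dragging the
# positive centroid (and hence the threshold) past the true positives.
poisoned = centroid_threshold(pos=[2.0, 3.0, 25.0], neg=[-2.0, -3.0])
```

With the poison point, the threshold moves from 0.0 to 3.75, so both genuine positive examples are now misclassified, which is exactly the test-error increase the attack targets.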

Curriculum Learning 2009 inproceedings

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, Jason Weston

Introduces curriculum learning: training models on examples of increasing difficulty. Shows this acts as a continuation method for non-convex optimization, improving both convergence speed and final generalization.
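A minimal sketch of such a training schedule; the difficulty scores and stage count are assumptions here, since the paper's curricula are task-specific:

```python
def curriculum_stages(examples, difficulty, n_stages=3):
    """Curriculum learning schedule: sort examples easy-to-hard and yield
    progressively larger training pools, ending with the full dataset."""
    ordered = sorted(examples, key=difficulty)
    for stage in range(1, n_stages + 1):
        cutoff = round(len(ordered) * stage / n_stages)
        yield ordered[:cutoff]
```

Early stages expose the model only to easy examples; the final stage is ordinary training on everything, which is what makes the schedule a continuation method.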

Causality: Models, Reasoning, and Inference 2009 book

Judea Pearl

Foundational book on causal inference introducing structural causal models, do-calculus, and counterfactual reasoning. Unifies graphical models with the potential outcomes framework. Second edition with expanded coverage.

Active Learning Literature Survey 2009 techreport

Burr Settles

Canonical survey of active learning covering uncertainty sampling, query-by-committee, expected error reduction, variance reduction, and density-weighted methods. Establishes foundational taxonomy for the field.
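The simplest strategy in that taxonomy, least-confidence uncertainty sampling, fits in a few lines; `predict_proba` is a stand-in for whatever model is currently trained:

```python
def least_confidence_query(pool, predict_proba, k=1):
    """Uncertainty sampling (least confidence): query the k unlabeled
    points whose most probable predicted label has the lowest probability."""
    return sorted(pool, key=lambda x: max(predict_proba(x)))[:k]
```

Points the model is most unsure about (top-class probability closest to chance) are sent to the annotator first.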

Causal Inference Using Potential Outcomes: Design, Modeling, Decisions 2005 article

Donald B. Rubin

Comprehensive overview of the potential outcomes framework for causal inference. Covers experimental design, observational studies, propensity scores, and the fundamental problem of causal inference.
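Under randomization, the framework's basic estimator is the difference in observed group means; a toy sketch with hypothetical outcome values:

```python
def difference_in_means(treated, control):
    """Estimate the average treatment effect E[Y(1)] - E[Y(0)] under
    randomization via the difference in observed group means. Each unit
    reveals only one of its two potential outcomes (the fundamental
    problem of causal inference); randomized assignment makes the
    observed groups comparable."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(treated) - mean(control)
```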