Data Leverage References

Tag: format:survey (11 references)

Rethinking machine unlearning for large language models 2025 article

Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Yuguang Yao, Chris Yuhao Liu, Xiaojun Xu, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu

Comprehensive review of machine unlearning in LLMs, which aims to eliminate the influence of undesirable data (such as sensitive or illegal information) while preserving essential knowledge generation. Envisions LLM unlearning as a pivotal element in life-cycle management for developing safe, secure, trustworthy, and resource-efficient generative AI.

Data-centric Artificial Intelligence: A Survey 2025 article

Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Zhimeng Jiang, Shaochen Zhong, Xia Hu

Comprehensive survey on data-centric AI, providing a holistic view of three general data-centric goals (training data development, inference data development, and data maintenance) and representative methods. Covers the paradigm shift from model refinement to prioritizing data quality.

Revisiting Data Attribution for Influence Functions 2025 article

Hongbo Zhu, Angelo Cangelosi

Comprehensive review of influence functions for data attribution, examining how individual training examples influence model predictions. Covers techniques for model debugging, data curation, bias detection, and identification of mislabeled or adversarial data points.

A Systematic Review of NeurIPS Dataset Management Practices 2024 article

Yiwei Wu, Leah Ajmani, Shayne Longpre, Hanlin Li

Machine Unlearning: A Survey 2024 article

Heng Xu, Tianqing Zhu, Lefeng Zhang, Wanlei Zhou, Philip S. Yu

Comprehensive survey of machine unlearning covering definitions, scenarios, verification methods, and applications. The International AI Safety Report 2025 cites it when describing machine unlearning as a pioneering paradigm for removing sensitive information.

Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment 2023 article

Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li

Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses 2022 article

Micah Goldblum, Dimitris Tsipras, Chulin Xie, Xinyun Chen, Avi Schwarzschild, Dawn Song, Aleksander Madry, Bo Li, Tom Goldstein

Comprehensive survey systematically categorizing dataset vulnerabilities including poisoning and backdoor attacks, their threat models, and defense mechanisms.

Training Data Influence Analysis and Estimation: A Survey 2022 article

Zayd Hammoudeh, Daniel Lowd

Language (Technology) is Power: A Critical Survey of "Bias" in NLP 2020 inproceedings

Su Lin Blodgett, Solon Barocas, Hal Daumé III, Hanna Wallach

A Survey on Image Data Augmentation for Deep Learning 2019 article

Connor Shorten, Taghi M. Khoshgoftaar

Comprehensive survey of image data augmentation techniques for deep learning, covering geometric transformations, color space transforms, kernel filters, mixing images, random erasing, and neural style transfer approaches.

Active Learning Literature Survey 2009 techreport

Burr Settles

Canonical survey of active learning covering uncertainty sampling, query-by-committee, expected error reduction, variance reduction, and density-weighted methods. Establishes foundational taxonomy for the field.