Tag: format:survey (11 references)
Rethinking machine unlearning for large language models
Comprehensive review of machine unlearning in LLMs, which aims to eliminate undesirable data influence (e.g., sensitive or illegal information) while preserving the model's ability to generate essential knowledge. Envisions LLM unlearning as a pivotal element in the life-cycle management of LLMs, supporting the development of safe, secure, trustworthy, and resource-efficient generative AI.
Data-centric Artificial Intelligence: A Survey
Comprehensive survey on data-centric AI, providing a holistic view of three general data-centric goals (training data development, inference data development, and data maintenance) and representative methods. Covers the paradigm shift from model refinement to prioritizing data quality.
Revisiting Data Attribution for Influence Functions
Comprehensive review of influence functions for data attribution, examining how individual training examples influence model predictions. Covers techniques for model debugging, data curation, bias detection, and identification of mislabeled or adversarial data points.
A Systematic Review of NeurIPS Dataset Management Practices
Machine Unlearning: A Survey
Comprehensive survey of machine unlearning covering definitions, scenarios, verification methods, and applications. Cited in the International AI Safety Report 2025, where machine unlearning is discussed as a pioneering paradigm for removing sensitive information.
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
Comprehensive survey that systematically categorizes dataset vulnerabilities, including poisoning and backdoor attacks, along with their threat models and defense mechanisms.
Training Data Influence Analysis and Estimation: A Survey
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
A Survey on Image Data Augmentation for Deep Learning
Comprehensive survey of image data augmentation techniques for deep learning, covering geometric transformations, color space transforms, kernel filters, mixing images, random erasing, and neural style transfer approaches.
Active Learning Literature Survey
Canonical survey of active learning covering uncertainty sampling, query-by-committee, expected error reduction, variance reduction, and density-weighted methods. Establishes foundational taxonomy for the field.