Tag: adversarial (10 references)
Exploring the limits of strong membership inference attacks on large language models
Poisoning Web-Scale Training Datasets is Practical
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
A Watermark for Large Language Models
OWASP Top 10 for Large Language Model Applications
Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
Comprehensive survey categorizing dataset vulnerabilities, including poisoning and backdoor attacks, along with their threat models and defense mechanisms.
Robust Speech Recognition via Large-Scale Weak Supervision
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
First demonstration of backdoor attacks on deep neural networks. Shows that small trigger patterns in training data cause models to misclassify any input containing the trigger (e.g., stop signs with stickers classified as speed limits).
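Illustrative only: the short Python sketch below shows the data-poisoning step that BadNets describes, stamping a trigger patch onto a small fraction of training images and relabeling them to an attacker-chosen target class. The array shapes, patch placement, and function names (`add_trigger`, `poison_dataset`) are assumptions for the example, not code from the paper.

```python
# Minimal sketch of BadNets-style training-data poisoning (assumed setup:
# grayscale images as numpy arrays with pixel values in [0, 1]).
import numpy as np

def add_trigger(image: np.ndarray, patch_size: int = 3) -> np.ndarray:
    """Stamp a small white square (the backdoor trigger) into the bottom-right corner."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = 1.0  # assumes pixels scaled to [0, 1]
    return poisoned

def poison_dataset(images, labels, target_label, poison_fraction=0.05, seed=0):
    """Copy the dataset, stamp the trigger onto a small random fraction of images,
    and relabel those images to the attacker's target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = target_label  # e.g., "speed limit" in place of "stop sign"
    return images, labels, idx

if __name__ == "__main__":
    # Synthetic 28x28 data, just to exercise the mechanics.
    X = np.random.rand(1000, 28, 28)
    y = np.random.randint(0, 10, size=1000)
    X_poisoned, y_poisoned, poisoned_idx = poison_dataset(X, y, target_label=7)
    print(f"{len(poisoned_idx)} of {len(X)} training examples carry the trigger")
```

A model trained on the poisoned set behaves normally on clean inputs but predicts the target class whenever the trigger patch is present, which is what makes the attack hard to notice from test accuracy alone.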
Poisoning Attacks against Support Vector Machines
Investigates poisoning attacks against SVMs where adversaries inject crafted training data to increase test error. Uses gradient ascent to construct malicious data points.
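As a rough illustration of the attack idea, the sketch below moves a single malicious point by gradient ascent on the hinge loss measured over a clean validation set. The paper derives this gradient analytically through the SVM's KKT conditions; here a finite-difference approximation and scikit-learn's `SVC` stand in, and all names, parameters, and the 2-D toy data are assumptions for the example.

```python
# Rough sketch of a Biggio-style poisoning attack on a linear SVM.
import numpy as np
from sklearn.svm import SVC

def validation_hinge_loss(X_train, y_train, X_val, y_val):
    """Train an SVM on the (possibly poisoned) training set and score it on clean data."""
    clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
    margins = y_val * clf.decision_function(X_val)
    return np.maximum(0.0, 1.0 - margins).mean()

def poison_point_gradient_ascent(X_train, y_train, X_val, y_val,
                                 x_poison, y_poison, steps=20, lr=0.5, eps=1e-2):
    """Move one malicious point (with a fixed attacker-chosen label) to maximize
    validation hinge loss, using a finite-difference gradient estimate."""
    x = x_poison.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(x)
        for d in range(x.size):
            x_plus, x_minus = x.copy(), x.copy()
            x_plus[d] += eps
            x_minus[d] -= eps
            loss_plus = validation_hinge_loss(
                np.vstack([X_train, x_plus]), np.append(y_train, y_poison), X_val, y_val)
            loss_minus = validation_hinge_loss(
                np.vstack([X_train, x_minus]), np.append(y_train, y_poison), X_val, y_val)
            grad[d] = (loss_plus - loss_minus) / (2 * eps)
        x += lr * grad  # ascend: the attacker wants validation error to grow
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated Gaussian blobs as a toy binary problem.
    X_train = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
    y_train = np.array([-1] * 50 + [1] * 50)
    X_val = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
    y_val = np.array([-1] * 50 + [1] * 50)
    x_p = poison_point_gradient_ascent(X_train, y_train, X_val, y_val,
                                       x_poison=np.array([0.0, 0.0]), y_poison=1)
    print("optimized poison point:", x_p)
```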