Adversarial Training
This page is a stub.
Posts tagged Adversarial Training (sorted by most relevant)
Relevance | Karma | Title | Author(s) | Age | Comments
1 | 57 | Takeaways from our robust injury classifier project [Redwood Research] | dmz | 2y | 7
1 | 65 | High-stakes alignment via adversarial training [Redwood Research report] | dmz, Lawrence Chan, Nate Thomas | 3y | 15
1 | 57 | Deep Forgetting & Unlearning for Safely-Scoped LLMs | Stephen Casper | 1y | 6
2 | 50 | Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren't scheming | Buck Shlegeris | 4mo | 1
0 | 40 | Solving adversarial attacks in computer vision as a baby version of general AI alignment | Stanislav Fort | 5mo | 1
1 | 17 | Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing | Buck Shlegeris | 3y | 0
1 | 14 | Adversarial Robustness Could Help Prevent Catastrophic Misuse | Aidan O'Gara | 1y | 15
1 | 15 | Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals? | Stephen Casper | 6mo | 0
2 | 10 | AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler | DanielFilan | 2y | 0
1 | 6 | Some thoughts on why adversarial training might be useful | Beth Barnes | 3y | 4
1 | 24 | Latent Adversarial Training | Adam Jermyn | 3y | 4
1 | 23 | Beyond the Board: Exploring AI Robustness Through Go | AdamGleave | 7mo | 1
1 | 15 | EIS IX: Interpretability and Adversaries | Stephen Casper | 2y | 5
1 | 14 | Oversight Leagues: The Training Game as a Feature | Paul Bricman | 2y | 0
1 | 6 | EIS XI: Moving Forward | Stephen Casper | 2y | 2