Adversarial Examples (AI)
Contributors
2
Ruben Bloom
1
Multicore
Posts tagged Adversarial Examples (AI)
1
142
SolidGoldMagikarp (plus, prompt generation)
Jessica Rumbelow, mwatkins
2y
16
3
35
AI Safety in a World of Vulnerable Machine Learning Systems
AdamGleave, EuanMcLean
2y
27
1
57
Deep Forgetting & Unlearning for Safely-Scoped LLMs
Stephen Casper
1y
5
1
40
Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort
4mo
1
2
22
What progress have we made on automated auditing?
Q
Lawrence Chan
5mo
Q
0
1
15
If I were a well-intentioned AI... I: Image classifier
Stuart Armstrong
5y
4
1
14
Adversarial Robustness Could Help Prevent Catastrophic Misuse
Aidan O'Gara
1y
15
1
2
The Goodhart Game
John Maxwell
5y
3
1
7
AXRP Episode 1 - Adversarial Policies with Adam Gleave
DanielFilan
4y
3
1
65
High-stakes alignment via adversarial training [Redwood Research report]
dmz, Lawrence Chan, Nate Thomas
3y
15
1
35
Even Superhuman Go AIs Have Surprising Failure Modes
AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller, Michael Dennis
1y
9
0
23
SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4
Adam Yedidia
2y
1
1
25
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Scott Emmons, Luke Bailey, Euan Ong
1y
1
1
23
Beyond the Board: Exploring AI Robustness Through Go
AdamGleave
6mo
1
1
15
EIS IX: Interpretability and Adversaries
Stephen Casper
2y
5