AI ALIGNMENT FORUM
Adversarial Examples (AI)
Written by Multicore, Ruben Bloom; last updated 14th Dec 2024
This page is a stub.
Posts tagged Adversarial Examples (AI)
Most Relevant
Tag relevance | Karma | Title | Authors | Age | Comments
1 | 143 | SolidGoldMagikarp (plus, prompt generation) | Jessica Rumbelow, mwatkins | 2y | 17
3 | 35 | AI Safety in a World of Vulnerable Machine Learning Systems | AdamGleave, EuanMcLean | 2y | 27
1 | 57 | Deep Forgetting & Unlearning for Safely-Scoped LLMs | Stephen Casper | 1y | 6
1 | 40 | Solving adversarial attacks in computer vision as a baby version of general AI alignment | Stanislav Fort | 5mo | 1
2 | 22 | [Question] What progress have we made on automated auditing? | Lawrence Chan | 7mo | 0
1 | 15 | If I were a well-intentioned AI... I: Image classifier | Stuart Armstrong | 5y | 4
1 | 14 | Adversarial Robustness Could Help Prevent Catastrophic Misuse | Aidan O'Gara | 1y | 15
1 | 2 | The Goodhart Game | John Maxwell | 5y | 3
1 | 7 | AXRP Episode 1 - Adversarial Policies with Adam Gleave | DanielFilan | 4y | 3
1 | 65 | High-stakes alignment via adversarial training [Redwood Research report] | dmz, Lawrence Chan, Nate Thomas | 3y | 15
1 | 35 | Even Superhuman Go AIs Have Surprising Failure Modes | AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller, Michael Dennis | 2y | 9
0 | 23 | SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4 | Adam Yedidia | 2y | 1
1 | 25 | Image Hijacks: Adversarial Images can Control Generative Models at Runtime | Scott Emmons, Luke Bailey, Euan Ong | 1y | 1
1 | 23 | Beyond the Board: Exploring AI Robustness Through Go | AdamGleave | 7mo | 1
1 | 15 | EIS IX: Interpretability and Adversaries | Stephen Casper | 2y | 5