AI ALIGNMENT FORUM
Failure mode
Applied to "What rationality failure modes are there?" by Jacob G-W (1y ago)
Applied to "Paper: Identifying the Risks of LM Agents with an LM-Emulated Sandbox - University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases!" by Maik Zywitza (1y ago)
Applied to "The self-unalignment problem" by Jan_Kulveit (2y ago)
Applied to "Don't take bad options away from people" by Dumbledore's Army (2y ago)
Applied to "Is AI Safety dropping the ball on privacy?" by markovial (2y ago)
Applied to "SolidGoldMagikarp (plus, prompt generation)" by Jessica Rumbelow (2y ago)
Yoav Ravid, v1.7.0, Apr 8th 2021 GMT (+10/-334)
Oliver Habryka, v1.6.0, Dec 24th 2020 GMT: Fixed typo in title
Abram Demski, v1.5.0, Dec 23rd 2020 GMT (+333)
Yoav Ravid, v1.4.0, Dec 23rd 2020 GMT (+74): Added a 'Posts' section
Yoav Ravid, v1.3.0, Dec 23rd 2020 GMT
Yoav Ravid, v1.2.0, Dec 23rd 2020 GMT: Updated link types
Yoav Ravid, v1.1.0, Dec 23rd 2020 GMT (+479): Added a bunch of examples
Applied to "Guarding Against the Postmodernist Failure Mode" by Yoav Ravid (4y ago)
Applied to "A Voting Puzzle, Some Political Science, and a Nerd Failure Mode" by Yoav Ravid (4y ago)
Applied to "Beware the Nihilistic Failure Mode" by Yoav Ravid (4y ago)
Applied to "Failure Modes sometimes correspond to Game Mechanics" by Yoav Ravid (4y ago)
Yoav Ravid, v1.0.0, Dec 23rd 2020 GMT (+272)