AI ALIGNMENT FORUM

Failure mode


Applied to What rationality failure modes are there? by Jacob G-W 1y ago

Applied to Paper: Identifying the Risks of LM Agents with an LM-Emulated Sandbox - University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases! by Maik Zywitza 1y ago

Applied to The self-unalignment problem by Jan_Kulveit 2y ago

Applied to Don't take bad options away from people by Dumbledore's Army 2y ago

Applied to Is AI Safety dropping the ball on privacy? by markovial 2y ago

Applied to SolidGoldMagikarp (plus, prompt generation) by Jessica Rumbelow 2y ago

Yoav Ravid v1.7.0 Apr 8th 2021 GMT (+10/-334) LW2

Oliver Habryka v1.6.0 Dec 24th 2020 GMT Fixed typo in title LW2

Abram Demski v1.5.0 Dec 23rd 2020 GMT (+333) LW2

Yoav Ravid v1.4.0 Dec 23rd 2020 GMT (+74) Added a 'Posts' section LW2

Yoav Ravid v1.3.0 Dec 23rd 2020 GMT LW2

Yoav Ravid v1.2.0 Dec 23rd 2020 GMT Updated link types LW2

Yoav Ravid v1.1.0 Dec 23rd 2020 GMT (+479) Added a bunch of examples LW2

Applied to Guarding Against the Postmodernist Failure Mode by Yoav Ravid 4y ago

Applied to A Voting Puzzle, Some Political Science, and a Nerd Failure Mode by Yoav Ravid 4y ago

Applied to Beware the Nihilistic Failure Mode by Yoav Ravid 4y ago

Applied to Failure Modes sometimes correspond to Game Mechanics by Yoav Ravid 4y ago

Yoav Ravid v1.0.0 Dec 23rd 2020 GMT (+272) LW2