AI ALIGNMENT FORUM

Corrigibility

• Applied to Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility by Raymond Arnold 3mo ago
• Applied to A Shutdown Problem Proposal by Mateusz Bagiński 5mo ago
• Applied to Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural by RobertM 5mo ago
• Applied to Towards shutdownable agents via stochastic choice by Elliott Thornley 6mo ago
• Applied to Corrigibility = Tool-ness? by Tobias D. 6mo ago
• Applied to 4. Existing Writing on Corrigibility by Max Harms 6mo ago
• Applied to 3b. Formal (Faux) Corrigibility by Max Harms 6mo ago
• Applied to 3a. Towards Formal Corrigibility by Max Harms 6mo ago
• Applied to 2. Corrigibility Intuition by Max Harms 6mo ago
• Applied to Corrigibility could make things worse by ThomasCederborg 6mo ago
• Applied to 5. Open Corrigibility Questions by Ruben Bloom 6mo ago
• Applied to 0. CAST: Corrigibility as Singular Target by Max Harms 7mo ago
• Applied to 1. The CAST Strategy by Max Harms 7mo ago
• Applied to The Shutdown Problem: Incomplete Preferences as a Solution by Elliott Thornley 10mo ago
• Applied to Requirements for a Basin of Attraction to Alignment by Roger Dearnaley 11mo ago
• Applied to Nash Bargaining between Subagents doesn't solve the Shutdown Problem by A.H. 11mo ago
• Applied to Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom) by Roger Dearnaley 1y ago
• Applied to A Pedagogical Guide to Corrigibility by A.H. 1y ago