AI ALIGNMENT FORUM
Corrigibility
• Applied to Extending the Off-Switch Game: Toward a Robust Framework for AI Corrigibility by Raymond Arnold 3mo ago
• Applied to A Shutdown Problem Proposal by Mateusz Bagiński 5mo ago
• Applied to Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural by RobertM 5mo ago
• Applied to Towards shutdownable agents via stochastic choice by Elliott Thornley 6mo ago
• Applied to Corrigibility = Tool-ness? by Tobias D. 6mo ago
• Applied to 4. Existing Writing on Corrigibility by Max Harms 6mo ago
• Applied to 3b. Formal (Faux) Corrigibility by Max Harms 6mo ago
• Applied to 3a. Towards Formal Corrigibility by Max Harms 6mo ago
• Applied to 2. Corrigibility Intuition by Max Harms 6mo ago
• Applied to Corrigibility could make things worse by ThomasCederborg 6mo ago
• Applied to 5. Open Corrigibility Questions by Ruben Bloom 6mo ago
• Applied to 0. CAST: Corrigibility as Singular Target by Max Harms 7mo ago
• Applied to 1. The CAST Strategy by Max Harms 7mo ago
• Applied to The Shutdown Problem: Incomplete Preferences as a Solution by Elliott Thornley 10mo ago
• Applied to Requirements for a Basin of Attraction to Alignment by Roger Dearnaley 11mo ago
• Applied to Nash Bargaining between Subagents doesn't solve the Shutdown Problem by A.H. 11mo ago
• Applied to Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom) by Roger Dearnaley 1y ago
• Applied to A Pedagogical Guide to Corrigibility by A.H. 1y ago