AI ALIGNMENT FORUM
Mesa-Optimization
• Applied to Finding Backward Chaining Circuits in Transformers Trained on Tree Search by abhayesian 1d ago
• Applied to Inner Optimization Mechanisms in Neural Nets by ProgramCrafter 17d ago
• Applied to The Human's Role in Mesa Optimization by silentbob 20d ago
• Applied to Visualizing neural network planning by Nevan Wichers 20d ago
• Applied to Measuring Learned Optimization in Small Transformer Models by Jonathan Bostock 2mo ago
• Applied to Understanding mesa-optimization using toy models by tilmanr 2mo ago
• Applied to Counting arguments provide no evidence for AI doom by Quintin Pope 3mo ago
• Applied to The Inner Alignment Problem by Jakub Halmeš 3mo ago
• Applied to Satisficers want to become maximisers by JenniferRM 9mo ago
• Applied to Mesa-Optimization: Explain it like I'm 10 Edition by brook 9mo ago
• Applied to Runaway Optimizers in Mind Space by silentbob 10mo ago
• Applied to Disincentivizing deception in mesa optimizers with Model Tampering by martinkunev 11mo ago
• Applied to Challenge proposal: smallest possible self-hardening backdoor for RLHF by Christopher King 11mo ago
• Applied to Simple experiments with deceptive alignment by Andreas_Moe 1y ago
• Applied to Consequentialism is in the Stars not Ourselves by Cinera Verinia 1y ago