The Library

Curated Sequences

AGI safety from first principles
Embedded Agency
2022 MIRI Alignment Discussion
2021 MIRI Conversations
Infra-Bayesianism
Conditioning Predictive Models
Cyborgism
The Engineer’s Interpretability Sequence
Iterated Amplification
Value Learning
Risks from Learned Optimization
Cartesian Frames

Community Sequences

The AI Alignment and Deployment Problems
CAST: Corrigibility As Singular Target
AI Control
Formalising Catastrophic Goodhart
The Ethicophysics
Game Theory without Argmax
The Value Change Problem (sequence)
Monthly Algorithmic Problems in Mech Interp
An Opinionated Guide to Computability and Complexity
Developmental Interpretability
Catastrophic Risks From AI
Distilling Singular Learning Theory