Power Seeking (AI)
- Applied to "Steering Llama-2 with contrastive activation additions" by Alex Turner, 5 months ago
- Applied to "Natural Abstraction: Convergent Preferences Over Information Structures" by paulom, 7 months ago
- Applied to "You can't fetch the coffee if you're dead: an AI dilemma" by hennyge, 9 months ago
- Applied to "The Game of Dominance" by Karl von Wendt, 9 months ago
- Applied to "Incentives from a causal perspective" by Tom Everitt, 10 months ago
- Applied to "Instrumental Convergence? [Draft]" by Dan H, 1 year ago
- Applied to "Categorical-measure-theoretic approach to optimal policies tending to seek power" by Victoria Krakovna, 1 year ago
- Applied to "My Overview of the AI Alignment Landscape: Threat Models" by Michelle Viotti, 1 year ago
- Applied to "Ideas for studies on AGI risk" by dr_s, 1 year ago
- Applied to "Instrumental convergence in single-agent systems" by Jacob Pfau, 1 year ago
- Applied to "Risks from GPT-4 Byproduct of Recursively Optimizing AIs" by Ben Hayum, 1 year ago
- Applied to "[Linkpost] Shorter version of report on existential risk from power-seeking AI" by Ruben Bloom, 1 year ago
- Applied to "The Waluigi Effect (mega-post)" by Cleo Nardo, 1 year ago
- Applied to "Power-seeking can be probable and predictive for trained agents" by Victoria Krakovna, 1 year ago
- Applied to "Power-Seeking = Minimising free energy" by Jonas Hallgren, 1 year ago