Deconfusion
• Applied to "1. A Sense of Fairness: Deconfusing Ethics" by Roger Dearnaley 1y ago
• Applied to "Interpreting the Learning of Deceit" by Roger Dearnaley 1y ago
• Applied to "Reality and reality-boxes" by Mo Putera 1y ago
• Applied to "My Central Alignment Priority (2 July 2023)" by Nicholas Kross 1y ago
• Applied to "My research agenda in agent foundations" by Alex_Altair 2y ago
• Applied to "Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’" by lukemarks 2y ago
• Applied to "Reward is the optimization target (of capabilities researchers)" by Max H 2y ago
• Applied to "How should we think about the decision relevance of models estimating p(doom)?" by Mo Putera 2y ago
• Applied to "Deconfusing Direct vs Amortised Optimization" by Cinera Verinia 2y ago
• Applied to "Trying to isolate objectives: approaches toward high-level interpretability" by Arun Jose 2y ago
• Applied to "Reward is not the optimization target" by Euterpe 2y ago
• Applied to "Builder/Breaker for Deconfusion" by Raymond Arnold 2y ago
• Applied to "Why Do AI researchers Rate the Probability of Doom So Low?" by Aorou 2y ago
• Applied to "Simulators" by janus 2y ago
• Applied to "My summary of the alignment problem" by Peter Hroššo 2y ago
• Applied to "Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios" by Evan R. Murphy 3y ago