AI ALIGNMENT FORUM

Goal-Directedness

Applied to Creating Complex Goals: A Model to Create Autonomous Agents by Raymond Arnold 1mo ago

Applied to ParaScopes: Do Language Models Plan the Upcoming Paragraph? by Nicky Pochinkov 2mo ago

Dakara v1.4.0 Dec 30th 2024 GMT (+12/-12) LW1

Applied to Locally optimal psychology by Chipmonk 5mo ago

Applied to Don't want Goodhart? — Specify the variables more by Yan 5mo ago

Applied to Don't want Goodhart? — Specify the damn variables 5mo ago

Applied to [Interim research report] Evaluating the Goal-Directedness of Language Models by Rauno Arike 9mo ago

Applied to A "Bitter Lesson" Approach to Aligning AGI and ASI by Roger Dearnaley 10mo ago

Applied to Emotional issues often have an immediate payoff by Chipmonk 10mo ago

Applied to Measuring Coherence and Goal-Directedness in RL Policies by Dylan Xu 1y ago

Applied to Understanding mesa-optimization using toy models by tilmanr 1y ago

Applied to Measuring Coherence of Policies in Toy Environments by Dylan Xu 1y ago

Applied to Refinement of Active Inference agency ontology by Roman Leventov 1y ago

Applied to Quick thoughts on the implications of multi-agent views of mind on AI takeover by Kaj Sotala 1y ago

Applied to Towards an Ethics Calculator for Use by an AGI by Sean Sweeney 1y ago

Applied to “Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”) by RobertM 1y ago