AI ALIGNMENT FORUM
Goal-Directedness
Applied to
Creating Complex Goals: A Model to Create Autonomous Agents
by
Raymond Arnold
1mo
ago
Applied to
ParaScopes: Do Language Models Plan the Upcoming Paragraph?
by
Nicky Pochinkov
2mo
ago
Dakara
v1.4.0
Dec 30th 2024 GMT
(+12/-12)
Applied to
Locally optimal psychology
by
Chipmonk
5mo
ago
Applied to
Don't want Goodhart? — Specify the variables more
by
Yan
5mo
ago
Applied to
Don't want Goodhart? — Specify the damn variables
5mo
ago
Applied to
[Interim research report] Evaluating the Goal-Directedness of Language Models
by
Rauno Arike
9mo
ago
Applied to
A "Bitter Lesson" Approach to Aligning AGI and ASI
by
Roger Dearnaley
10mo
ago
Applied to
Emotional issues often have an immediate payoff
by
Chipmonk
10mo
ago
Applied to
Measuring Coherence and Goal-Directedness in RL Policies
by
Dylan Xu
1y
ago
Applied to
Understanding mesa-optimization using toy models
by
tilmanr
1y
ago
Applied to
Measuring Coherence of Policies in Toy Environments
by
Dylan Xu
1y
ago
Applied to
Refinement of Active Inference agency ontology
by
Roman Leventov
1y
ago
Applied to
Quick thoughts on the implications of multi-agent views of mind on AI takeover
by
Kaj Sotala
1y
ago
Applied to
Towards an Ethics Calculator for Use by an AGI
by
Sean Sweeney
1y
ago
Applied to
“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
by
RobertM
1y
ago