AI ALIGNMENT FORUM
Top Questions
42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3 answers
51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 15d · 19 answers
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 11mo · 2 answers
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13 answers
69 · Why is o1 so deceptive? · Abram Demski, Sahil · 5mo · 14 answers
Recent Activity
42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3 answers
51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 15d · 19 answers
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 11mo · 2 answers
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13 answers
14 · Is weak-to-strong generalization an alignment technique? · cloud · 1mo · 1 answer
9 · What is the most impressive game LLMs can play well? · Cole Wyeth · 2mo · 8 answers
4 · How counterfactual are logical counterfactuals? · Donald Hobson · 3mo · 9 answers
16 · Are You More Real If You're Really Forgetful? · Thane Ruthenis, Charlie Steiner · 3mo · 4 answers
6 · Why not tool AI? · smithee, Ben Pace · 6y · 2 answers
69 · Why is o1 so deceptive? · Abram Demski, Sahil · 5mo · 14 answers
7 · Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception? · David Scott Krueger · 6mo · 5 answers
22 · What progress have we made on automated auditing? · Lawrence Chan · 8mo · 0 answers