AI ALIGNMENT FORUM
Top Questions
42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3 answers
51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 15d · 19 answers
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 11mo · 2 answers
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13 answers
69 · Why is o1 so deceptive? · Abram Demski, Sahil · 5mo · 14 answers
Recent Activity
42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3 answers
51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 15d · 19 answers
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 11mo · 2 answers
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13 answers
14 · Is weak-to-strong generalization an alignment technique? · cloud · 1mo · 1 answer
9 · What is the most impressive game LLMs can play well? · Cole Wyeth · 2mo · 8 answers
4 · How counterfactual are logical counterfactuals? · Donald Hobson · 3mo · 9 answers
16 · Are You More Real If You're Really Forgetful? · Thane Ruthenis, Charlie Steiner · 3mo · 4 answers
6 · Why not tool AI? · smithee, Ben Pace · 6y · 2 answers
69 · Why is o1 so deceptive? · Abram Demski, Sahil · 5mo · 14 answers
7 · Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception? · David Scott Krueger · 6mo · 5 answers
22 · What progress have we made on automated auditing? · Lawrence Chan · 8mo · 0 answers