AI ALIGNMENT FORUM
janos

Posts

22 · On scalable oversight with weak LLMs judging strong LLMs

9mo

18

28 · Power-seeking can be probable and predictive for trained agents

2y

20

Comments