AI ALIGNMENT FORUM
Meg
Posts
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (118 karma, 10mo, 69 comments)
Steering Llama-2 with contrastive activation additions (49 karma, 11mo, 23 comments)
Towards Understanding Sycophancy in Language Models (39 karma, 1y, 0 comments)
Paper: LLMs trained on “A is B” fail to learn “B is A” (57 karma, 1y, 0 comments)
Paper: On measuring situational awareness in LLMs (44 karma, 1y, 13 comments)