AI ALIGNMENT FORUM
Mikita Balesni
Posts (sorted by new)

| Karma | Title | Posted | Comments |
|------:|-------|-------:|---------:|
| 18 | How to evaluate control measures for LLM agents? A trajectory from today to superintelligence | 11d | 0 |
| 70 | Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations | 1mo | 1 |
| 84 | Frontier Models are Capable of In-context Scheming | 5mo | 9 |
| 29 | Toward Safety Cases For AI Scheming | 6mo | 0 |
| 42 | Apollo Research 1-year update | 11mo | 0 |
| 14 | How I select alignment research projects | 1y | 4 |
| 26 | A starter guide for evals | 1y | 0 |
| 33 | Understanding strategic deception and deceptive alignment | 2y | 0 |
| 56 | Paper: LLMs trained on “A is B” fail to learn “B is A” | 2y | 0 |
| 44 | Paper: On measuring situational awareness in LLMs | 2y | 13 |