AI ALIGNMENT FORUM
Charlie Griffin

Posts

22 · Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

24d

0

20 · Games for AI Control

9mo

0

23 · Scenario Forecasting Workshop: Materials and Learnings

1y

1

9 · Five projects from AI Safety Hub Labs 2023

1y

0

48 · Goodhart's Law in Reinforcement Learning

2y

5

Comments
