Following up on our previous work on verbalized eval awareness,
we are sharing a post investigating the emergence of metagaming reasoning in a frontier training run.
We also share some quantitative analyses and qualitative examples, and outline upcoming work.