[Paper] Does Self-Evaluation Enable Wireheading in Language Models? — AI Alignment Forum