Towards Deconfusing Gradient Hacking — AI Alignment Forum