Why almost every RL agent does learned optimization — AI Alignment Forum