Harmless reward hacks can generalize to misalignment in LLMs