New safety research agenda: scalable agent alignment via reward modeling