Reward splintering as reverse of interpretability — AI Alignment Forum