Predicted corrigibility: Pareto improvements - AI Alignment Forum