Robustness of Model-Graded Evaluations and Automated Interpretability — AI Alignment Forum