EIS IX: Interpretability and Adversaries — AI Alignment Forum