Compact Proofs of Model Performance via Mechanistic Interpretability — AI Alignment Forum