Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals?