High-level interpretability: detecting an AI's objectives — AI Alignment Forum