Analyzing DeepMind's Probabilistic Methods for Evaluating Agent Capabilities — AI Alignment Forum