On scalable oversight with weak LLMs judging strong LLMs — AI Alignment Forum