“embedded self-justification,” or something like that — AI Alignment Forum