Counterfactual do-what-I-mean — AI Alignment Forum