interpreting GPT: the logit lens — AI Alignment Forum