AI ALIGNMENT FORUM
Miles Turpin
Research scientist at NYU Alignment Research Group
Posts
Sorted by: New

Reward hacking behavior can generalize across tasks (7mo; 41 karma; 1 comment)
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought (9mo; 13 karma; 0 comments)
Some Quick Follow-Up Experiments to "Taken out of context: On measuring situational awareness in LLMs" (1y; 17 karma; 0 comments)
Unfaithful Explanations in Chain-of-Thought Prompting (2y; 22 karma; 0 comments)