AI ALIGNMENT FORUM
Miles Turpin
Research scientist at NYU Alignment Research Group
Posts (sorted by new)
- Reward hacking behavior can generalize across tasks (41 points, 9mo, 1 comment)
- Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought (13 points, 1y, 0 comments)
- Some Quick Follow-Up Experiments to "Taken out of context: On measuring situational awareness in LLMs" (17 points, 1y, 0 comments)
- Unfaithful Explanations in Chain-of-Thought Prompting (22 points, 2y, 0 comments)