AI ALIGNMENT FORUM
Sandbagging (AI)
Written by Raymond Arnold, last updated 27th Mar 2025
Sandbagging is when an AI system pretends to be less capable than it actually is during training or evaluation.
Posts tagged Sandbagging (AI)
The “no sandbagging on checkable tasks” hypothesis, by Joe Carlsmith
Automated Researchers Can Subtly Sandbag, by Johannes Gasteiger, Akbir Khan, Sam Bowman, Vladimir Mikulik, Ethan Perez, Fabien Roger