I think it’s possible that an AI will decide not to sandbag (e.g. on alignment research tasks), even if all of the following are true:
The reason is as follows:
A framing I wrote up for a debate about "alignment tax":
A person whose mainline is {1a --> 1b --> 2b or 2c} might say "alignment is unsolved, solving it is mostly a discrete thing, and alignment taxes and multipolar incentives aren't central"
Whereas someone who thinks we're already in 2a might say "alignment isn't hard; the problem is incentives and competitiveness"
Someone whose mainline is {1a --> 2a} might say "We need to both 'solve alignment at all' AND either get the tax to be really low or do coordination. Both are hard, and both are necessary."