All of Mikita Balesni's Comments + Replies

I think one practical difference is whether filtering pre-training data to exclude cases of scheming is a useful intervention.

(Thanks to Bronson for privately pointing this out.)

I think that, directionally, removing parts of the training data would probably make a difference, though potentially less than we might naively assume (see, e.g., Evan's argument on the AXRP podcast).

Also, I think you're right, and my statement "I think for most practical considerations, it makes almost zero difference" was too strong.