Thanks Neel, keep this coming - even if only once every few years :) You helped me clarify lots of confusion I had about the existing techniques.
I am a huge fan of steering vectors / control vectors, and I would love to see future research showing if they can be linearly combined together to achieve multiple behaviours simultaneously (I made a post about this). I don't think it's just "internal work" - I think it's a hint to the fact that language semantics can be linearised as vector spaces (I hope I will be able to formalise mathematically this intuition... (read more)
Glad you liked the post!
I'm also pretty interested in combining steering vectors. I think a particularly promising direction is using SAE decoder vectors for this, as SAEs are designed to find feature vectors that independently vary and can be added.
I agree steering vectors are important as evidence for the linear representation hypothesis (though at this point I consider SAEs to be much superior as evidence, and think they're more interesting to focus on)
Thanks Neel, keep this coming - even if only once every few years :) You helped me clarify lots of confusion I had about the existing techniques.
I am a huge fan of steering vectors / control vectors, and I would love to see future research showing if they can be linearly combined together to achieve multiple behaviours simultaneously (I made a post about this). I don't think it's just "internal work" - I think it's a hint to the fact that language semantics can be linearised as vector spaces (I hope I will be able to formalise mathematically this intuition... (read more)