Thanks for running these experiments and writing this up! I’m very excited to see this sort of followup work, and I think there are a lot of useful results here. I agree with most of this, and mostly just have a few nitpicks about how you interpret some things.
Reactions to the summary of your experimental results:
CCS does better than random, but not by a huge margin: on average, random linear probes have a 75% accuracy on some “easy” datasets;
I think it’s cool that random directions sometimes do so well; this provides a bit of additional evide
There were a number of iterations with major tweaks. It went something like:
I spent a while thinking about the problem conceptually, and developed a pretty strong intuition that something like this should be possible.
I tried to show it experimentally. There were no signs of life for a while (it turns out you need to get a bunch of details right to see any real signal -- a regime that I think is likely my comparative advantage) but I eventually got it to sometimes work using a PCA-based method. I think it took some work to make that more reliable, whi