Ian Johnson

Comments

Are the datasets used to train the meta-SAEs the same as those used to train the original SAEs? If atomicity within a subdomain were the goal, would it be interesting to train a meta-SAE on a domain-specific dataset?

It seems useful to be able to target how atomic certain kinds of features are, especially if you are focused on something like identifying functionality or structure rather than knowledge. A specific example would be training on a large code dataset along with code QA. Would we find more atomic "bug features" like those in Scaling Monosemanticity?