AI ALIGNMENT FORUM

Connor Kissane

Posts

SAEs are highly dataset dependent: a case study on the refusal direction (34 karma, 5mo, 0 comments)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing (22 karma, 5mo, 3 comments)

Base LLMs refuse too (35 karma, 6mo, 10 comments)

SAEs (usually) Transfer Between Base and Chat Models (28 karma, 9mo, 0 comments)

Attention Output SAEs Improve Circuit Analysis (18 karma, 10mo, 0 comments)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To (34 karma, 1y, 0 comments)

Attention SAEs Scale to GPT-2 Small (28 karma, 1y, 0 comments)

Sparse Autoencoders Work on Attention Layer Outputs (35 karma, 1y, 3 comments)
