Connor Kissane — AI Alignment Forum

Connor Kissane

Posts

White Box Control at UK AISI - Update on Sandbagging Investigations (score 42, 5mo ago, 5 comments)

SAEs are highly dataset dependent: a case study on the refusal direction (score 34, 1y ago, 0 comments)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing (score 23, 1y ago, 3 comments)

Base LLMs refuse too (score 36, 1y ago, 10 comments)

SAEs (usually) Transfer Between Base and Chat Models (score 28, 1y ago, 0 comments)

Attention Output SAEs Improve Circuit Analysis (score 18, 1y ago, 0 comments)

We Inspected Every Head In GPT-2 Small using SAEs So You Don’t Have To (score 34, 2y ago, 0 comments)

Attention SAEs Scale to GPT-2 Small (score 28, 2y ago, 0 comments)

Sparse Autoencoders Work on Attention Layer Outputs (score 35, 2y ago, 3 comments)