AI ALIGNMENT FORUM
lewis smith

Posts

Sorted by New

8 lewis smith's Shortform

8mo

0

59 Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

5d

6

30 A Problem to Solve Before Building a Deception Detector

2mo

1

98 The ‘strong’ feature hypothesis could be wrong

8mo

0

39 Improving Dictionary Learning with Gated Sparse Autoencoders

1y

32

40 [Full Post] Progress Update #1 from the GDM Mech Interp Team

1y

3

36 [Summary] Progress Update #1 from the GDM Mech Interp Team

1y

0

Wikitag Contributions

Comments
