AI ALIGNMENT FORUM

lewis smith

Posts


lewis smith's Shortform (8 karma, 7mo, 0 comments)

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (44 karma, 6d, 4 comments)

A Problem to Solve Before Building a Deception Detector (29 karma, 2mo, 0 comments)

The ‘strong’ feature hypothesis could be wrong (92 karma, 8mo, 0 comments)

Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, 1y, 32 comments)

[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, 1y, 3 comments)

[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, 1y, 0 comments)
