lewis smith
Posts
lewis smith's Shortform (8 karma, 7mo, 0 comments)
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) (44 karma, 6d, 4 comments)
A Problem to Solve Before Building a Deception Detector (29 karma, 2mo, 0 comments)
The ‘strong’ feature hypothesis could be wrong (92 karma, 8mo, 0 comments)
Improving Dictionary Learning with Gated Sparse Autoencoders (39 karma, 1y, 32 comments)
[Full Post] Progress Update #1 from the GDM Mech Interp Team (40 karma, 1y, 3 comments)
[Summary] Progress Update #1 from the GDM Mech Interp Team (36 karma, 1y, 0 comments)