AI ALIGNMENT FORUM
Sparse Autoencoders (SAEs)
• Applied to Learning Multi-Level Features with Matryoshka SAEs by Bart Bussmann 7d ago
• Applied to Compositionality and Ambiguity: Latent Co-occurrence and Interpretable Subspaces by Matthew A. Clarke 8d ago
• Applied to Matryoshka Sparse Autoencoders by Noa Nabeshima 10d ago
• Applied to SAEBench: A Comprehensive Benchmark for Sparse Autoencoders by Can 15d ago
• Applied to Are SAE features from the Base Model still meaningful to LLaVA? by Shan Chen 21d ago
• Applied to Mechanistic Interpretability of Llama 3.2 with Sparse Autoencoders by PaulPauls 1mo ago
• Applied to Analyzing how SAE features evolve across a forward pass by bensenberner 2mo ago
• Applied to SAEs are highly dataset dependent: a case study on the refusal direction by Connor Kissane 2mo ago
• Applied to Evolutionary prompt optimization for SAE feature visualization by neverix 2mo ago
• Applied to SAE Probing: What is it good for? Absolutely something! by Subhash Kantamneni 2mo ago
• Applied to A suite of Vision Sparse Autoencoders by Louka Ewington-Pitsos 2mo ago
• Applied to SAEs you can See: Applying Sparse Autoencoders to Clustering by Robert_AIZI 2mo ago
• Applied to On the Practical Applications of Interpretability by Ruben Bloom 2mo ago
• Applied to It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation by Gerard Boxo 2mo ago
• Applied to Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution by Kola Ayonrinde 2mo ago
• Applied to SAE features for refusal and sycophancy steering vectors by neverix 2mo ago
• Applied to HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix by Jaehyuk Lim 3mo ago
• Applied to Domain-specific SAEs by jacob_drori 3mo ago