AI ALIGNMENT FORUM
Sparse Autoencoders (SAEs)
• Applied to "Exploring SAE features in LLMs with definition trees and token lists" by mwatkins, 1d ago
• Applied to "Interpretability of SAE Features Representing Check in ChessGPT" by Jonathan Kutasov, 2d ago
• Applied to "Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models?" by Raymond Arnold, 7d ago
• Applied to "LLMs are likely not conscious" by research_prime_space, 7d ago
• Applied to "Toy Models of Superposition: Simplified by Hand" by Axel Sorensen, 7d ago
• Applied to "[Linkpost] Play with SAEs on Llama 3" by Raymond Arnold, 11d ago
• Applied to "Evaluating Synthetic Activations composed of SAE Latents in GPT-2" by Giorgi Giglemiani, 11d ago
• Applied to "[Paper] A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders" by Joseph Isaac Bloom, 12d ago
• Applied to "Food, Prison & Exotic Animals: Sparse Autoencoders Detect 6.5x Performing Youtube Thumbnails" by Louka Ewington-Pitsos, 20d ago
• Applied to "[Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)" by Fernando Avalos, 1mo ago
• Applied to "Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs" by Daniel Lee, 1mo ago
• Applied to "Massive Activations and why <bos> is important in Tokenized SAE Unigrams" by Louka Ewington-Pitsos, 1mo ago
• Applied to "A gentle introduction to sparse autoencoders" by Raymond Arnold, 1mo ago
• Applied to "Training a Sparse Autoencoder in < 30 minutes on 16GB of VRAM using an S3 cache" by Louka Ewington-Pitsos, 1mo ago
• Applied to "Showing SAE Latents Are Not Atomic Using Meta-SAEs" by Bart Bussmann, 1mo ago
• Applied to "Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs" by Kola Ayonrinde, 1mo ago
• Applied to "Calendar feature geometry in GPT-2 layer 8 residual stream SAEs" by Patrick Leask, 2mo ago
• Applied to "Extracting SAE task features for in-context learning" by Dmitrii Kharlapenko, 2mo ago
• Applied to "Self-explaining SAE features" by RobertM, 2mo ago