AI ALIGNMENT FORUM
Interpretability (ML & AI)
• Applied to Effects of Non-Uniform Sparsity on Superposition in Toy Models by Shreyans Jain 6d ago
• Applied to Antonym Heads Predict Semantic Opposites in Language Models by Jake Ward 7d ago
• Applied to Analyzing how SAE features evolve across a forward pass by bensenberner 14d ago
• Applied to SAEs are highly dataset dependent: a case study on the refusal direction by Connor Kissane 16d ago
• Applied to Evolutionary prompt optimization for SAE feature visualization by neverix 18d ago
• Applied to Testing "True" Language Understanding in LLMs: A Simple Proposal by MtryaSam 20d ago
• Applied to Composition Circuits in Vision Transformers (Hypothesis) by phenomanon 20d ago
• Applied to SAE Probing: What is it good for? Absolutely something! by Subhash Kantamneni 21d ago
• Applied to Bridging the VLM and mech interp communities for multimodal interpretability by Sonia Joseph 25d ago
• Applied to Open Source Replication of Anthropic’s Crosscoder paper for model-diffing by Connor Kissane 26d ago
• Applied to Enabling New Applications with Today's Mechanistic Interpretability Toolkit by ananya_joshi 1mo ago
• Applied to SAEs you can See: Applying Sparse Autoencoders to Clustering by Robert_AIZI 1mo ago
• Applied to Monosemanticity & Quantization by Rahul Chand 1mo ago
• Applied to A Rocket–Interpretability Analogy by plex 1mo ago
• Applied to A short project on Mamba: grokking & interpretability by Alejandro Tlaie Boria 1mo ago
• Applied to The Computational Complexity of Circuit Discovery for Inner Interpretability by Bogdan Ionut Cirstea 1mo ago
• Applied to Circuits in Superposition: Compressing many small neural networks into one by Lucius Bushnaq 1mo ago
• Applied to It's important to know when to stop: Mechanistic Exploration of Gemma 2 List Generation by Gerard Boxo 1mo ago