AI ALIGNMENT FORUM
200 Concrete Open Problems in Mechanistic Interpretability
Posts in this sequence (all by Neel Nanda):

1. Concrete Steps to Get Started in Transformer Mechanistic Interpretability
2. 200 Concrete Open Problems in Mechanistic Interpretability: Introduction
3. 200 COP in MI: The Case for Analysing Toy Language Models
4. 200 COP in MI: Looking for Circuits in the Wild
5. 200 COP in MI: Interpreting Algorithmic Problems
6. 200 COP in MI: Exploring Polysemanticity and Superposition
7. 200 COP in MI: Analysing Training Dynamics
8. 200 COP in MI: Techniques, Tooling and Automation
9. 200 COP in MI: Image Model Interpretability
10. 200 COP in MI: Interpreting Reinforcement Learning
11. 200 COP in MI: Studying Learned Features in Language Models