AI ALIGNMENT FORUM
200 Concrete Open Problems in Mechanistic Interpretability
Posts in this sequence (all by Neel Nanda):

1. Concrete Steps to Get Started in Transformer Mechanistic Interpretability
2. 200 Concrete Open Problems in Mechanistic Interpretability: Introduction
3. 200 COP in MI: The Case for Analysing Toy Language Models
4. 200 COP in MI: Looking for Circuits in the Wild
5. 200 COP in MI: Interpreting Algorithmic Problems
6. 200 COP in MI: Exploring Polysemanticity and Superposition
7. 200 COP in MI: Analysing Training Dynamics
8. 200 COP in MI: Techniques, Tooling and Automation
9. 200 COP in MI: Image Model Interpretability
10. 200 COP in MI: Interpreting Reinforcement Learning
11. 200 COP in MI: Studying Learned Features in Language Models