Transformer Circuits
This page is a stub.
Posts tagged Transformer Circuits
1
19
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda
2y
1
2
53
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Neel Nanda
8mo
10
1
39
200 Concrete Open Problems in Mechanistic Interpretability: Introduction
Neel Nanda
2y
0
1
27
Finding Sparse Linear Connections between Features in LLMs
Logan Riggs Smith, Sam Mitchell, Adam Kaufman
1y
2
2
23
How to Think About Activation Patching
Neel Nanda
2y
3
2
24
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Neel Nanda, Tom Lieberum, Matthew Rahtz, János Kramár, Geoffrey Irving, Rohin Shah, Vladimir Mikulik
2y
0
2
16
Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy
Neel Nanda
2y
1
1
18
200 COP in MI: Exploring Polysemanticity and Superposition
Neel Nanda
2y
1
1
17
200 COP in MI: Interpreting Algorithmic Problems
Neel Nanda
2y
0
1
16
A Walkthrough of Interpretability in the Wild (w/ authors Kevin Wang, Arthur Conmy & Alexandre Variengien)
Neel Nanda
2y
15
1
11
A Walkthrough of In-Context Learning and Induction Heads (w/ Charles Frye) Part 1 of 2
Neel Nanda
2y
0
1
11
200 COP in MI: Analysing Training Dynamics
Neel Nanda
2y
0
1
8
200 COP in MI: Looking for Circuits in the Wild
Neel Nanda
2y
3
1
7
200 COP in MI: Techniques, Tooling and Automation
Neel Nanda
2y
0
1
35
Sparse Autoencoders Work on Attention Layer Outputs
Connor Kissane, Robert Krzyzanowski, Arthur Conmy, Neel Nanda
1y
3