AI ALIGNMENT FORUM

Education, Interpretability (ML & AI), Transformer Circuits, AI
Frontpage

Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy

by Neel Nanda
29th Aug 2023

This is a linkpost for https://www.youtube.com/watch?v=dn4GqR0DCx8&list=PL7m7hLIqA0hogxAaYtzlNolYAMr65NY45&index=1
Comment by Charlie Steiner (2y ago):

Awesome, thanks for all of these videos.


Arthur Conmy's Automated Circuit Discovery is a great paper that makes initial forays into automating parts of mechanistic interpretability (specifically, automatically finding a sparse subgraph for a circuit). In this three-part series of YouTube videos, I interview him about the paper, and we walk through it and discuss the key results and takeaways. We discuss the high-level point of the paper and what researchers should take away from it, the ACDC algorithm and its key nuances, existing baselines and how the authors adapted them to be relevant to circuit discovery, how well the algorithm works, and how you can even evaluate how well an interpretability method works.