All of Jacob Dunefsky's Comments + Replies

Do you have any plans of doing something similar for attention layers?

I'm pretty sure that there's at least one other MATS group (unrelated to us) currently working on this, although I'm not certain about any of the details. Hopefully they release their research soon!

Also, do you have any plans to train sparse MLPs at multiple layers in parallel, and penalise them to have sparsely activating connections between each other, in addition to having sparse activations?

I did try something similar at one point, but it didn't quite work out. In particular: gi...
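For concreteness, here's a minimal sketch (in PyTorch, with hypothetical dimensions and penalty coefficients) of one way the proposal above could be implemented: train a transcoder at each of two layers, and add an L1 penalty on the "virtual weights" connecting the first transcoder's decoder to the second one's encoder. This is just an illustration of the idea, not necessarily the exact variant I tried; among other simplifications, it treats the residual stream between the two MLP layers as an identity map.

```python
import torch
import torch.nn.functional as F
from torch import nn

class Transcoder(nn.Module):
    """Sparse approximation of an MLP layer: MLP input -> sparse latent -> MLP output."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        z = F.relu(self.enc(x))  # sparse feature activations
        return self.dec(z), z

tc0 = Transcoder(d_model=512, d_hidden=4096)  # hypothetical sizes
tc1 = Transcoder(d_model=512, d_hidden=4096)

def joint_loss(in0, out0, in1, out1, l1_coef=1e-3, conn_coef=1e-4):
    """in0/out0, in1/out1: cached MLP inputs/outputs at the two layers."""
    recon0, z0 = tc0(in0)
    recon1, z1 = tc1(in1)
    fidelity = F.mse_loss(recon0, out0) + F.mse_loss(recon1, out1)
    sparsity = z0.abs().sum(-1).mean() + z1.abs().sum(-1).mean()
    # "Virtual weights" from layer-0 features to layer-1 features, ignoring
    # everything in the model between the two MLPs (a big simplification).
    virtual_w = tc1.enc.weight @ tc0.dec.weight  # (d_hidden1, d_hidden0)
    connection = virtual_w.abs().mean()          # penalise dense cross-layer connections
    return fidelity + l1_coef * sparsity + conn_coef * connection
```

Whether an L1 on virtual weights is the right notion of "sparsely activating connections" is itself unclear; one could instead penalise the per-example co-activation products between z0 and z1.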

Lee Sharkey
There's recent work published on this here by Chris Mathwin, Dennis Akar, and me. The gated attention block is a kind of transcoder adapted for attention blocks.

Nice work, by the way! I think this is a promising direction. Note also the similar, but substantially different, use of the term "transcoder" here, whose problems were pointed out to me by Lucius. Addressing those problems helped to motivate our interest in the kind of transcoders that you've trained in your work!
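To anchor the term for readers following along: the most naive reading of "a transcoder adapted for attention blocks" would be a sparse per-token map from the attention block's input to its output, sketched below. This is deliberately not the gated attention block architecture linked above; the point of the sketch is the caveat in its final comment, which is part of why the attention case needs a modified architecture in the first place.

```python
import torch
import torch.nn.functional as F
from torch import nn

class NaiveAttnTranscoder(nn.Module):
    """Per-token sparse map: attention-block input -> attention-block output.
    Naive baseline only; NOT the gated attention block architecture."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, attn_in: torch.Tensor):  # (batch, seq, d_model)
        z = F.relu(self.enc(attn_in))
        return self.dec(z), z

# Caveat: attn_out[t] depends on attn_in at all positions <= t, so a purely
# per-token map like this cannot be faithful in general -- an attention
# transcoder has to account for cross-position mixing (e.g. by taking the
# attention pattern as given) somehow.
```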