Hey Joseph (and coauthors),
Your directions are really fantastic. I hope you don't mind, but I generated the activation data for the first 3000+ directions for each of the 12 layers and uploaded your directions to Neuronpedia:
https://www.neuronpedia.org/gpt2-small/res-jb
Your directions are also linked on the home page and the model page.
They're also accessible by layer (sorted by top activation), eg layer 6: https://neuronpedia.org/gpt2-small/6-res-jb
I added the "Anthropic dashboard" to Neuronpedia for your dataset.
Explanations, comments, and autointe... (read more)
Hey Joseph (and coauthors),
Your directions are really fantastic. I hope you don't mind, but I generated the activation data for the first 3000+ directions for each of the 12 layers and uploaded your directions to Neuronpedia:
https://www.neuronpedia.org/gpt2-small/res-jb
Your directions are also linked on the home page and the model page.
They're also accessible by layer (sorted by top activation), eg layer 6: https://neuronpedia.org/gpt2-small/6-res-jb
I added the "Anthropic dashboard" to Neuronpedia for your dataset.
Explanations, comments, and autointe... (read more)