Thanks for this write-up! In case it’s of interest, we have also performed some exploratory interpretability work using the SVD of model weights.
We examine convolutional layers in models trained on a couple of common vision tasks (CIFAR-10, ImageNet). In short, we similarly take the SVD of the weights in a CNN layer, $W_l = USV^T$, and project the hidden-layer activations $x_l$ onto the $i$th right singular vector, $V^T[i,:]\,x_l$. These singular direction “neurons” can then be studied with interpretability methods: we use hypergraphs, feature visualizati...
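For readers who want to try the projection, here is a minimal NumPy sketch. It is not our actual code. It assumes the conv kernel of shape `(out_channels, in_channels, kh, kw)` is flattened to a 2D matrix `(out_channels, in_channels*kh*kw)`, and that activations arrive as flattened receptive-field patches (im2col-style) in the same order. The function name `singular_direction_activations` is hypothetical. Column $i$ of the result is $V^T[i,:]\,x_l$ for each input row.

```python
import numpy as np


def singular_direction_activations(weight, x):
    """Project inputs onto the right singular vectors of a layer's weights.

    weight: (out_features, in_features) matrix, or a conv kernel of shape
            (out_channels, in_channels, kh, kw). A kernel is flattened to
            (out_channels, in_channels*kh*kw) (an assumption of this sketch).
    x:      (n, in_features) array, one input vector or flattened patch per
            row. Patches must be flattened in the same (C-order) layout as
            the kernel.
    Returns (proj, u, s), where proj[:, i] = V^T[i, :] @ x_row.
    """
    w2d = weight.reshape(weight.shape[0], -1)
    # Thin SVD: w2d = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(w2d, full_matrices=False)
    # Rows of vt are the right singular vectors, so x @ vt.T gives the
    # activation of each singular-direction "neuron" for each input.
    proj = x @ vt.T
    return proj, u, s


rng = np.random.default_rng(0)
kernel = rng.standard_normal((8, 3, 3, 3))  # toy 3x3 conv: 3 in, 8 out channels
patches = rng.standard_normal((5, 27))      # 5 flattened 3x3x3 input patches
proj, u, s = singular_direction_activations(kernel, patches)
print(proj.shape)
```

Because $W_l x_l = U S (V^T x_l)$, the layer's pre-activation output can be rebuilt from these projections, so the singular directions lose no information relative to the original layer.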