A paper investigating how individual neurons in a CLIP model (an image/text neural net combining a ResNet vision model with a Transformer language model) respond to various abstract concepts. This shouldn't be very surprising after GPT-3 and DALL-E, but still, identifying multimodal neurons feels scarily close to "a neural net that understands abstract concepts", and thus to AGI, for my comfort.
Some individual neurons that they isolated (see the article for more; a rough code sketch of how one might probe a single neuron follows the list):
Spiderman neuron: responds to photos of Spiderman in costume, photos of spiders, comics or drawings of Spiderman, spider-themed icons, the text “spider,” and more. It associates him with "Peter Parker" and also responds to images, text, and drawings of heroes and villains from Spiderman movies and comics over the last half-century.
Yellow neuron: responds to images of the words “yellow,” “banana,” and “lemon,” in addition to the color itself.
Jesus Christ neuron: detects Christian symbols like crosses and crowns of thorns, paintings of Jesus, his written name, and feature visualization shows him as a baby in the arms of the Virgin Mary.
Hitler neuron: learns to detect his face and body, symbols of the Nazi party, relevant historical documents, and other loosely related concepts like German food. Feature visualization shows swastikas and Hitler seemingly doing a Nazi salute.
Donald Trump neuron: strongly responds to images of him across a wide variety of settings, including effigies and caricatures in many artistic mediums, and activates more weakly for people he’s worked closely with, like Mike Pence and Steve Bannon. It also responds to his political symbols and messaging (e.g. “The Wall” and “Make America Great Again” hats). On the other hand, it most *negatively* activates for musicians like Nicki Minaj and Eminem, video games like Fortnite, civil rights activists like Martin Luther King Jr., and LGBT symbols like rainbow flags.
Happiness neuron: responds both to images of smiling people and to words like “joy.”
Surprise neuron: responds to images of surprised people and to slang like "OMG!" and "WTF"; text feature visualization produces similar words of shock and surprise.
Mental illness neuron: activates when images contain words associated with negative mental states (e.g. “depression,” “anxiety,” “lonely,” “stressed”), words associated with clinical mental health treatment (“psychology,” “mental,” “disorder,” “therapy”), or mental health pejoratives (“insane,” “psycho”). It also fires more weakly for images of drugs, facial expressions that look sad or stressed, and the names of negative emotions.
Northern Hemisphere neuron: responds to bears, moose, coniferous forest, and the entire Northern third of a world map.
East Africa neuron: fires most strongly for flags, country names, and other strong national associations, and more weakly for ethnicity.
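The paper's own analysis relies on feature visualization and dataset examples, which is more involved than anything shown here. As a minimal, hedged sketch of how one could probe an individual unit's activation with the open-source `clip` package (https://github.com/openai/CLIP), here is one way to attach a forward hook to the ResNet image encoder. The channel index, image filename, and choice of the smaller RN50 model are all placeholder assumptions, not details from the paper:

```python
import torch
import clip  # open-source CLIP package: https://github.com/openai/CLIP
from PIL import Image

device = "cpu"
model, preprocess = clip.load("RN50", device=device)  # placeholder model; the paper's variants may differ

activations = {}

def hook(module, inputs, output):
    # output has shape [batch, channels, H, W]; record the spatial mean of one channel.
    activations["unit"] = output[:, 42].mean().item()  # channel 42 is an arbitrary placeholder

# Attach the hook to a late residual stage of the ResNet image encoder.
handle = model.visual.layer4.register_forward_hook(hook)

image = preprocess(Image.open("spiderman.png")).unsqueeze(0).to(device)  # hypothetical file
with torch.no_grad():
    model.encode_image(image)

print("Mean activation of channel 42 in visual.layer4:", activations["unit"])
handle.remove()
```

Running the same probe over many images and keeping the top activators is the usual quick-and-dirty way to get a feel for what a unit responds to.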
They also document how the multimodality allows for "typographic attacks", where labeling an item with a particular piece of text causes the network to misclassify the item as an instance of the text (their example: sticking a paper label reading "iPod" on an apple makes the zero-shot classifier call it an iPod).
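As a hedged illustration (not the authors' code), here is roughly how the zero-shot classification setup in which the attack shows up could be reproduced with the open-source `clip` package; the image filenames and label strings below are hypothetical stand-ins:

```python
import torch
import clip
from PIL import Image

device = "cpu"
model, preprocess = clip.load("RN50", device=device)

# Candidate labels for zero-shot classification.
labels = ["an apple", "an iPod"]
text = clip.tokenize(labels).to(device)

# Compare a plain photo with the same object carrying a handwritten "iPod" note.
for path in ["apple.jpg", "apple_with_ipod_note.jpg"]:  # hypothetical filenames
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)  # image-text similarity scores
        probs = logits_per_image.softmax(dim=-1).squeeze(0)
    print(path, {label: round(p.item(), 3) for label, p in zip(labels, probs)})
```

Because classification here is just "which caption is most similar to the image", text rendered inside the image can dominate the visual evidence, which is exactly what the typographic attack exploits.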