AI ALIGNMENT FORUM
Transformers
• Applied to "Analyzing how SAE features evolve across a forward pass" by bensenberner, 2mo ago
• Applied to "Transformers Explained (Again)" by Rohan Subramani, 2mo ago
• Applied to "Characterizing stable regions in the residual stream of LLMs" by Jett Janiak, 3mo ago
• Applied to "If I ask an LLM to think step by step, how big are the steps?" by Ryan Blough, 4mo ago
• Applied to "Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream" by Diego Caples, 4mo ago
• Applied to "Visualizing small Attention-only Transformers" by Léo Dana, 4mo ago
• Applied to "How Big a Deal are MatMul-Free Transformers?" by JustisMills, 6mo ago
• Applied to "Week One of Studying Transformers Architecture" by JustisMills, 6mo ago
• Applied to "Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability" by ntt123, 7mo ago
• Applied to "Exploring Llama-3-8B MLP Neurons" by ntt123, 7mo ago
• Applied to "Finding Backward Chaining Circuits in Transformers Trained on Tree Search" by abhayesian, 7mo ago
• Applied to "If language is for communication, what does that imply about LLMs?" by Bill Benzon, 8mo ago
• Applied to "An interesting mathematical model of how LLMs work" by Bill Benzon, 8mo ago
• Applied to "Transformers Represent Belief State Geometry in their Residual Stream" by Adam Shai, 9mo ago
• Applied to "Barcoding LLM Training Data Subsets. Anyone trying this for interpretability?" by right..enough?, 9mo ago
• Applied to "Understanding mesa-optimization using toy models" by tilmanr, 9mo ago
• Applied to "Decompiling Tracr Transformers - An interpretability experiment" by Hannes Thurnherr, 9mo ago
• Applied to "Modern Transformers are AGI, and Human-Level" by jacobjacob, 9mo ago
• Applied to "Deconfusing In-Context Learning" by Arjun Panickssery, 10mo ago