AI ALIGNMENT FORUM
Transformers
• Applied to "Analyzing how SAE features evolve across a forward pass" by bensenberner, 2mo ago
• Applied to "Transformers Explained (Again)" by Rohan Subramani, 2mo ago
• Applied to "Characterizing stable regions in the residual stream of LLMs" by Jett Janiak, 3mo ago
• Applied to "If I ask an LLM to think step by step, how big are the steps?" by Ryan Blough, 4mo ago
• Applied to "Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream" by Diego Caples, 4mo ago
• Applied to "Visualizing small Attention-only Transformers" by Léo Dana, 4mo ago
• Applied to "How Big a Deal are MatMul-Free Transformers?" by JustisMills, 6mo ago
• Applied to "Week One of Studying Transformers Architecture" by JustisMills, 6mo ago
• Applied to "Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability" by ntt123, 7mo ago
• Applied to "Exploring Llama-3-8B MLP Neurons" by ntt123, 7mo ago
• Applied to "Finding Backward Chaining Circuits in Transformers Trained on Tree Search" by abhayesian, 7mo ago
• Applied to "If language is for communication, what does that imply about LLMs?" by Bill Benzon, 8mo ago
• Applied to "An interesting mathematical model of how LLMs work" by Bill Benzon, 8mo ago
• Applied to "Transformers Represent Belief State Geometry in their Residual Stream" by Adam Shai, 9mo ago
• Applied to "Barcoding LLM Training Data Subsets. Anyone trying this for interpretability?" by right..enough?, 9mo ago
• Applied to "Understanding mesa-optimization using toy models" by tilmanr, 9mo ago
• Applied to "Decompiling Tracr Transformers - An interpretability experiment" by Hannes Thurnherr, 9mo ago
• Applied to "Modern Transformers are AGI, and Human-Level" by jacobjacob, 9mo ago
• Applied to "Deconfusing In-Context Learning" by Arjun Panickssery, 10mo ago