Here are some example latents taken from the residual stream SAEs for Gemma V3 27B IT.
This release contains SAEs trained on 3 different sites (residual stream, MLP output and attention output) as well as MLP transcoders (both with and without affine skip connections), for every layer of each of the 10 models in the Gemma 3 family (i.e. sizes 270m, 1b, 4b, 12b and 27b, both the PT and IT versions of each). For every layer, we provide 4 models (widths 16k and 262k, and two different target L0 values). Rather than giving the exact L0s, we label them "small" (10-20), "medium" (30-60) and "big" (90-150).
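To make the per-layer grid concrete, here's a short sketch of how the release is organised. All of the strings below are illustrative placeholders rather than the actual repository naming.

```python
# Illustrative sketch of the release grid; strings are placeholders, not real ids.
sizes       = ["270m", "1b", "4b", "12b", "27b"]
variants    = ["pt", "it"]                      # 5 sizes x 2 variants = 10 base models
sae_sites   = ["resid", "mlp_out", "attn_out"]  # SAE training sites
transcoders = ["mlp_transcoder", "mlp_transcoder_affine_skip"]
widths      = ["16k", "262k"]
l0_buckets  = {"small": (10, 20), "medium": (30, 60), "big": (90, 150)}

# For every layer: 2 widths x 2 target L0s = 4 models, each labelled by the
# bucket its target L0 falls in rather than by its exact value.
```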
Additionally, for 4 layers in each model (at depths 25%, 50%, 65% and 85%), we provide a larger hyperparameter sweep over widths and L0 values for each of these single-layer model types, including residual stream SAEs with widths up to 1m for every model.
Lastly, we've also included several multi-layer models: CLTs (cross-layer transcoders) on 270m & 1b, and weakly causal crosscoders trained on the concatenation of 4 layers (at the same 4 depths mentioned above) for every base model size & type.
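For readers unfamiliar with crosscoders, here's a minimal sketch of the concatenation setup, using generic names and shapes of our own choosing; the weak-causality constraints on which layers each latent can read from and write to are deliberately omitted.

```python
import torch
import torch.nn as nn

class ConcatCrosscoderSketch(nn.Module):
    """Toy crosscoder: one shared sparse dictionary over 4 concatenated layers.

    Illustrative only; the released weakly causal crosscoders additionally
    constrain which layers each latent may read from and write to.
    """

    def __init__(self, d_model: int = 640, n_layers: int = 4, d_sae: int = 16_384):
        super().__init__()
        self.W_enc = nn.Linear(n_layers * d_model, d_sae)   # reads the concatenation
        self.W_dec = nn.Linear(d_sae, n_layers * d_model)   # writes to every layer
        self.threshold = nn.Parameter(torch.full((d_sae,), 0.1))

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        # acts: (batch, n_layers, d_model) residual activations at the chosen depths
        x = acts.flatten(start_dim=1)              # concatenate the layers per token
        pre = self.W_enc(x)
        latents = pre * (pre > self.threshold)     # JumpReLU gate
        return self.W_dec(latents).view_as(acts)   # per-layer reconstructions
```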
All models are JumpReLU, trained using a quadratic L0 penalty along with an additional frequency penalty that suppresses the formation of high-frequency features. We also used a version of Matryoshka loss during training, which has been documented to help reduce the incidence of feature absorption.
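To make the objective concrete, here's a rough sketch of the loss under some assumptions of ours: we read "quadratic L0 penalty" as penalising the square of the per-token L0, and we model the frequency penalty as a simple cost on latents that fire too often within a batch. The Matryoshka loss and the gradient estimator used for the JumpReLU threshold are omitted.

```python
import torch
import torch.nn.functional as F

def jumprelu_sae_loss(x, W_enc, b_enc, W_dec, b_dec, theta,
                      l0_coeff=1e-3, freq_coeff=1e-2, max_freq=0.05):
    """Sketch of a JumpReLU SAE loss; penalty forms are our assumptions, not the release's."""
    pre = x @ W_enc + b_enc                     # (batch, d_sae) pre-activations
    active = (pre > theta).float()              # hard JumpReLU gate
    latents = pre * active
    recon = latents @ W_dec + b_dec             # (batch, d_model) reconstruction

    recon_loss = F.mse_loss(recon, x)
    l0_per_token = active.sum(dim=-1)                        # latents firing per token
    l0_loss = l0_coeff * (l0_per_token ** 2).mean()          # assumed "quadratic in L0" form
    firing_freq = active.mean(dim=0)                         # per-latent firing rate in batch
    freq_loss = freq_coeff * torch.relu(firing_freq - max_freq).sum()  # hypothetical frequency cap

    return recon_loss + l0_loss + freq_loss
```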
If you're interested in finding features connected to certain behavioural traits (to perform steering, to better attribute certain model behaviours, or to analyze directions you've found inside the model using supervised methods, etc.), we recommend using the residual stream models trained on a subset of the model layers (e.g. here). The 262k-width models with medium L0 values (in the 30-60 range) should be suitable for most people, although the 16k and 65k widths may also prove useful. All the examples in the screenshots above were from 262k-width medium-L0 SAEs finetuned on Gemma V3 270m IT.
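As a concrete example of the steering workflow, here's a minimal sketch of adding a latent's decoder direction into the residual stream via a forward hook. The shapes, layer index and latent index below are all placeholders; in practice the decoder matrix comes from one of the released residual-stream SAEs.

```python
import torch
import torch.nn as nn

# Stand-in shapes; replace W_dec with the decoder matrix of a released SAE.
d_model, d_sae = 640, 262_144
W_dec = torch.randn(d_sae, d_model)   # stand-in for the SAE decoder matrix
latent_idx = 1234                     # hypothetical latent chosen for steering
alpha = 8.0                           # steering strength, typically tuned by hand

steering_vector = W_dec[latent_idx] / W_dec[latent_idx].norm()

def steering_hook(module: nn.Module, inputs, output):
    # `output` is the residual stream at this layer: (batch, seq, d_model).
    resid = output[0] if isinstance(output, tuple) else output
    resid = resid + alpha * steering_vector.to(resid.dtype)
    return (resid, *output[1:]) if isinstance(output, tuple) else resid

# Usage (assuming a HuggingFace-style list of decoder layers):
#   handle = model.model.layers[layer_idx].register_forward_hook(steering_hook)
#   ...generate as usual...
#   handle.remove()
```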
If you're interested in doing circuit-style analysis, e.g. with attribution graphs, we recommend using the suite of transcoders we've trained on all layers of the model, e.g. here. Affine skip connections were strictly beneficial, so we recommend using them. Larger widths lead to richer analysis, but the computational cost of circuit-style work can grow very large, especially for bigger base models, so you may wish to use the 16k width rather than 262k. Neuronpedia will shortly be hosting an interactive page which allows you to generate and explore your own attribution graphs using these transcoders.
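For reference, here's a minimal sketch of what an MLP transcoder with an affine skip connection computes: the MLP's output is predicted from its input through a sparse JumpReLU bottleneck, plus a learned affine map of the input. Names and shapes are illustrative.

```python
import torch
import torch.nn as nn

class AffineSkipTranscoderSketch(nn.Module):
    """Toy MLP transcoder with an affine skip connection (illustrative shapes/names)."""

    def __init__(self, d_model: int = 640, d_sae: int = 16_384):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)
        self.skip = nn.Linear(d_model, d_model)           # the affine skip connection
        self.threshold = nn.Parameter(torch.full((d_sae,), 0.1))

    def forward(self, mlp_in: torch.Tensor) -> torch.Tensor:
        pre = self.enc(mlp_in)
        latents = pre * (pre > self.threshold)            # JumpReLU gate
        return self.dec(latents) + self.skip(mlp_in)      # sparse part + affine skip term

# Trained to minimise ||transcoder(mlp_in) - mlp_out||^2, so the skip term can absorb
# the dense, roughly linear component while the latents capture the sparse part.
```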
Here are all the relevant links to go along with this release:
The ARENA material will also be updated to use this new suite of models, in place of the models from the 2024 Gemma Scope release.