wuthejeff

Comments

ProLU: A Nonlinearity for Sparse Autoencoders

This is great! We were working on very similar things concurrently at OpenAI but ended up going a slightly different route.

A few questions:
- What does the distribution of learned biases look like?
- For the STE variant, did you find it better to use the STE approximation for the activation gradient, even though the approximation is only needed for the bias?