Interpreting Preference Models w/ Sparse Autoencoders — AI Alignment Forum