AI ALIGNMENT FORUM
Simon Lermen
Twitter: @SimonLermenAI
Posts
52 | unRLHF - Efficiently undoing LLM safeguards | 1y | 2
51 | LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B | 1y | 0
14 | Robustness of Model-Graded Evaluations and Automated Interpretability | 1y | 2