AI ALIGNMENT FORUM
Simon Lermen

Twitter: @SimonLermenAI

Posts

52 unRLHF - Efficiently undoing LLM safeguards

1y

2

51 LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

1y

0

14 Robustness of Model-Graded Evaluations and Automated Interpretability

2y

2

Comments