AI ALIGNMENT FORUM
Brendan Murphy

Posts

10 · Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

9d

0

8 · GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning

4mo

0
