AI ALIGNMENT FORUM
Kellin Pelrine

Posts

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google (10 points, 2mo, 0 comments)

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning (8 points, 5mo, 0 comments)

Even Superhuman Go AIs Have Surprising Failure Modes (35 points, 2y, 9 comments)

Comments

Sorted by