AI ALIGNMENT FORUM

Diogo de Lucena

Chief Scientist at AE Studio

Posts


Mistral Large 2 (123B) exhibits alignment faking (28 karma, 1mo, 0 comments)

Reducing LLM deception at scale with self-other overlap fine-tuning (31 karma, 1mo, 9 comments)

Self-prediction acts as an emergent regularizer (20 karma, 6mo, 0 comments)

Self-Other Overlap: A Neglected Approach to AI Alignment (56 karma, 9mo, 7 comments)


Comments
