AI ALIGNMENT FORUM

Diogo de Lucena

Chief Scientist at AE Studio

Posts


Mistral Large 2 (123B) exhibits alignment faking

Reducing LLM deception at scale with self-other overlap fine-tuning

Self-prediction acts as an emergent regularizer

Self-Other Overlap: A Neglected Approach to AI Alignment
