AI ALIGNMENT FORUM

AI Capabilities

• Applied to Playing Dixit with AI: Can AI Systems Identify Misalignments in My Personalized Statements? by Mariia Koroliuk 8d ago
• Applied to o3, Oh My by Tobias D. 9d ago
• Applied to INTELLECT-1 Release: The First Globally Trained 10B Parameter Model by Matrice Jacobine 1mo ago
• Applied to Have we seen any "ReLU instead of sigmoid-type improvements" recently by KvmanThinking 2mo ago
• Applied to A short project on Mamba: grokking & interpretability by Alejandro Tlaie Boria 3mo ago
• Applied to The case for a negative alignment tax by Cameron Berg 4mo ago
• Applied to On agentic generalist models: we're essentially using existing technology the weakest and worst way you can use it by Yuli_Ban 4mo ago
• Applied to Molecular dynamics data will be essential for the next generation of ML protein models by Lauren (often wrong) 4mo ago
• Applied to Lifelogging for Alignment & Immortality by Roland 5mo ago
• Applied to Diffusion Guided NLP: better steering, mostly a good thing by Nathan Helm-Burger 5mo ago
• Applied to "AI achieves silver-medal standard solving International Mathematical Olympiad problems" by Lauren (often wrong) 5mo ago
• Applied to How bad would AI progress need to be for us to think general technological progress is also bad? by Jim Buhler 6mo ago
• Applied to [Paper] Stress-testing capability elicitation with password-locked models by Lauren (often wrong) 7mo ago
• Applied to Memorizing weak examples can elicit strong behavior out of password-locked models by Lauren (often wrong) 7mo ago
• Applied to The thing I don't understand about AGI by Jeremy Kalfus 7mo ago
• Applied to Getting 50% (SoTA) on ARC-AGI with GPT-4o by Ryan Greenblatt 7mo ago
• Applied to What’s the future of AI hardware? by Itay Dreyfus 7mo ago
• Applied to [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations by Teun van der Weij 7mo ago
• Applied to An Introduction to AI Sandbagging by Teun van der Weij 8mo ago