Academic Papers
Applied to Text First, Evidence Later? Managing Quality and Trust in an Era of AI-Augmented Research by Thehumanproject.ai, 15d ago
Applied to Habermas Machine by Nicholas Kees Dupuis, 1mo ago
Applied to New AI safety treaty paper out! by otto.barten, 1mo ago
Applied to Distillation of Meta's Large Concept Models Paper by Nicky Pochinkov, 2mo ago
Applied to Shallow review of technical AI safety, 2024 by jordine, 2mo ago
lesswrong-internal, v1.2.0, Feb 8th 2025 GMT: Convert editor type to CkEditor
Applied to Monet: Mixture of Monosemantic Experts for Transformers Explained by CalebMaresca, 3mo ago
Applied to Paper club: He et al. on modular arithmetic (part I) by Dmitry Vaintrob, 3mo ago
Applied to 'Chat with impactful research & evaluations' (Unjournal NotebookLMs) by david reinstein, 7mo ago
Applied to Searching for Impossibility Results or No-Go Theorems for provable safety. by Maelstrom, 7mo ago
Applied to Secret Collusion: Will We Know When to Unplug AI? by schroederdewitt, 7mo ago
Applied to Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs by Owain Evans, 10mo ago
Applied to How Big a Deal are MatMul-Free Transformers? by JustisMills, 10mo ago
Applied to Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller by Henry Cai, 10mo ago
Applied to Evidence of Learned Look-Ahead in a Chess-Playing Neural Network by Erik Jenner, 11mo ago
Applied to Rawls's Veil of Ignorance Doesn't Make Any Sense by Arjun Panickssery, 1y ago
Applied to Skepticism About DeepMind's "Grandmaster-Level" Chess Without Search by Arjun Panickssery, 1y ago
Applied to How to Control an LLM's Behavior (why my P(DOOM) went down) by Roger Dearnaley, 1y ago
Applied to Striking Implications for Learning Theory, Interpretability — and Safety? by Roger Dearnaley, 1y ago