AI ALIGNMENT FORUM

Academic Papers

Applied to Text First, Evidence Later? Managing Quality and Trust in an Era of AI-Augmented Research by Thehumanproject.ai 15d ago

Applied to Habermas Machine by Nicholas Kees Dupuis 1mo ago

Applied to New AI safety treaty paper out! by otto.barten 1mo ago

Applied to Distillation of Meta's Large Concept Models Paper by Nicky Pochinkov 2mo ago

Applied to Shallow review of technical AI safety, 2024 by jordine 2mo ago

lesswrong-internal v1.2.0 Feb 8th 2025 GMT Convert editor type to CkEditor LW1

Applied to Monet: Mixture of Monosemantic Experts for Transformers Explained by CalebMaresca 3mo ago

Applied to Paper club: He et al. on modular arithmetic (part I) by Dmitry Vaintrob 3mo ago

Applied to 'Chat with impactful research & evaluations' (Unjournal NotebookLMs) by david reinstein 7mo ago

Applied to Searching for Impossibility Results or No-Go Theorems for provable safety. by Maelstrom 7mo ago

Applied to Secret Collusion: Will We Know When to Unplug AI? by schroederdewitt 7mo ago

Applied to Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs by Owain Evans 10mo ago

Applied to How Big a Deal are MatMul-Free Transformers? by JustisMills 10mo ago

Applied to Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller by Henry Cai 10mo ago

Applied to Evidence of Learned Look-Ahead in a Chess-Playing Neural Network by Erik Jenner 11mo ago

Applied to Rawls's Veil of Ignorance Doesn't Make Any Sense by Arjun Panickssery 1y ago

Applied to Skepticism About DeepMind's "Grandmaster-Level" Chess Without Search by Arjun Panickssery 1y ago

Applied to How to Control an LLM's Behavior (why my P(DOOM) went down) by Roger Dearnaley 1y ago

Applied to Striking Implications for Learning Theory, Interpretability — and Safety? by Roger Dearnaley 1y ago