AI ALIGNMENT FORUM
Scalable Oversight
• Applied to "Gradient Routing: Masking Gradients to Localize Computation in Neural Networks" by Alex Turner, 12d ago
• Applied to "Automated monitoring systems" by Hikaru Tsujimura, 20d ago
• Applied to "Ways to think about alignment" by Abhimanyu Pallavi Sudhir, 2mo ago
• Applied to "Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets" by Abhimanyu Pallavi Sudhir, 3mo ago
• Applied to "AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization" by DanielFilan, 4mo ago
• Applied to "Inference-Only Debate Experiments Using Math Problems" by Arjun Panickssery, 4mo ago
• Applied to "Scalable oversight as a quantitative rather than qualitative problem" by Ruben Bloom, 5mo ago
• Applied to "On scalable oversight with weak LLMs judging strong LLMs" by Zachary Kenton, 5mo ago
• Applied to "NYU Code Debates Update/Postmortem" by David Rein, 7mo ago
• Applied to "Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight" by Raymond Arnold, 8mo ago
• Created by Raymond Arnold, 8mo ago