AI ALIGNMENT FORUM
Scalable Oversight
• Applied to Ways to think about alignment by Abhimanyu Pallavi Sudhir 22d ago
• Applied to Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets by Abhimanyu Pallavi Sudhir 2mo ago
• Applied to AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization by DanielFilan 3mo ago
• Applied to Inference-Only Debate Experiments Using Math Problems by Arjun Panickssery 3mo ago
• Applied to Scalable oversight as a quantitative rather than qualitative problem by Ruben Bloom 4mo ago
• Applied to On scalable oversight with weak LLMs judging strong LLMs by Zachary Kenton 4mo ago
• Applied to NYU Code Debates Update/Postmortem by David Rein 6mo ago
• Applied to Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight by Raymond Arnold 7mo ago
• Created by Raymond Arnold at 7mo