AI ALIGNMENT FORUM

Scalable Oversight

• Applied to Gradient Routing: Masking Gradients to Localize Computation in Neural Networks by Alex Turner 12d ago
• Applied to Automated monitoring systems by Hikaru Tsujimura 20d ago
• Applied to Ways to think about alignment by Abhimanyu Pallavi Sudhir 2mo ago
• Applied to Reinforcement Learning from Information Bazaar Feedback, and other uses of information markets by Abhimanyu Pallavi Sudhir 3mo ago
• Applied to AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization by DanielFilan 4mo ago
• Applied to Inference-Only Debate Experiments Using Math Problems by Arjun Panickssery 4mo ago
• Applied to Scalable oversight as a quantitative rather than qualitative problem by Ruben Bloom 5mo ago
• Applied to On scalable oversight with weak LLMs judging strong LLMs by Zachary Kenton 5mo ago
• Applied to NYU Code Debates Update/Postmortem by David Rein 7mo ago
• Applied to Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight by Raymond Arnold 8mo ago
• Created by Raymond Arnold at 8mo