AI ALIGNMENT FORUM
Scalable Oversight
This page is a stub.
Posts tagged Scalable Oversight
1
57
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks
1y
7
1
43
Scalable oversight as a quantitative rather than qualitative problem
Buck Shlegeris
9mo
8
1
15
Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery, Abhimanyu Pallavi Sudhir, JacksonKaunismaa
8mo
0
2
13
AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan
7mo
0
2
62
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud, Jacob G-W, Evžen Wybitul, Joseph Miller, Alex Turner
4mo
3
1
22
On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner, Rohin Shah
9mo
18
0
17
Human-AI Complementarity: A Goal for Amplified Oversight
Rishub Jain, Sophie Bridgers
3mo
3
1
14
Is weak-to-strong generalization an alignment technique?
Q
cloud
2mo
Q
1