Scalable Oversight
Posts tagged Scalable Oversight
2
60
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud, Jacob G-W, Evžen Wybitul, Joseph Miller, Alex Turner
11d
2
1
57
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks
8mo
7
1
43
Scalable oversight as a quantitative rather than qualitative problem
Buck Shlegeris
5mo
8
1
15
Inference-Only Debate Experiments Using Math Problems
Arjun Panickssery, Abhimanyu Pallavi Sudhir, JacksonKaunismaa
4mo
0
2
13
AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
DanielFilan
4mo
0
1
22
On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner, Rohin Shah
5mo
18