AI ALIGNMENT FORUM
Sycophancy
Posts tagged Sycophancy
1
90
Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison, Evan Hubinger
6mo
21
2
49
Steering Llama-2 with contrastive activation additions
Nina Panickssery, Wuschel Schulz, NickGabs, Meg, Evan Hubinger, Alex Turner
1y
23
0
52
Reducing sycophancy and improving honesty via activation steering
Nina Panickssery
1y
4
Review
1
12
SAE features for refusal and sycophancy steering vectors
neverix, Dmitrii Kharlapenko, Arthur Conmy, Neel Nanda
2mo
0
0
3
Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga
3mo
0