AI ALIGNMENT FORUM
Sycophancy
This page is a stub.
Posts tagged Sycophancy
90
Sycophancy to subterfuge: Investigating reward tampering in large language models
Carson Denison, evhub
1y
21
49
Steering Llama-2 with contrastive activation additions
Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub, TurnTrout
2y
23
52
Reducing sycophancy and improving honesty via activation steering
Nina Panickssery
2y
4
13
SAE features for refusal and sycophancy steering vectors
neverix, Dmitrii Kharlapenko, Arthur Conmy, Neel Nanda
11mo
0
3
Two new datasets for evaluating political sycophancy in LLMs
alma.liezenga
1y
0