AI ALIGNMENT FORUM
Sycophancy
• Applied to "SAE features for refusal and sycophancy steering vectors" by neverix, 2mo ago
• Applied to "Evaluating LLaMA 3 for political sycophancy" by alma.liezenga, 3mo ago
• Applied to "Two new datasets for evaluating political sycophancy in LLMs" by alma.liezenga, 3mo ago
• Applied to "Sycophancy to subterfuge: Investigating reward tampering in large language models" by Raymond Arnold, 6mo ago
• Applied to "Antagonistic AI" by Xybermancer, 10mo ago
• Applied to "Steering Llama-2 with contrastive activation additions" by Alex Turner, 1y ago
• Applied to "Reducing sycophancy and improving honesty via activation steering" by Maxime Riché, 1y ago
• Created by Maxime Riché at 1y